Tan, Xuan; Ricker, Kathryn L.; Puhan, Gautam – Educational Testing Service, 2010
This study examines the differences in equating outcomes between two trend score equating designs resulting from two different scoring strategies for trend scoring when operational constructed-response (CR) items are double-scored--the single group (SG) design, where each trend CR item is double-scored, and the nonequivalent groups with anchor…
Descriptors: Equated Scores, Scoring, Responses, Test Items
Kim, Sooyeon; Livingston, Samuel A. – ETS Research Report Series, 2009
A series of resampling studies was conducted to compare the accuracy of equating in a common item design using four different methods: chained equipercentile equating of smoothed distributions, chained linear equating, chained mean equating, and the circle-arc method. Four operational test forms, each containing more than 100 items, were used for…
Descriptors: Sampling, Sample Size, Accuracy, Test Items

Brennan, Robert L.; And Others – Educational and Psychological Measurement, 1995
Generalizability theory is used to examine the psychometric characteristics of the Listening and Writing Tests developed by American College Testing for its Work Keys program. Results with samples of 50 suggest the desirability of a minimum number of the tests' tape-recorded messages and the use of at least 2 raters. (SLD)
Descriptors: Audiotape Recordings, Error of Measurement, Generalizability Theory, Interaction