Publication Date
In 2025: 1
Since 2024: 1
Since 2021 (last 5 years): 1
Since 2016 (last 10 years): 3
Since 2006 (last 20 years): 4

Descriptor
Evaluators: 4
Models: 4
Error of Measurement: 2
Evaluation Methods: 2
Item Response Theory: 2
Bias: 1
Computation: 1
Correlation: 1
Data Analysis: 1
Data Collection: 1
Equated Scores: 1

Source
Journal of Educational…: 4

Author
Wind, Stefanie A.: 2
Carl Westine: 1
Jones, Eli: 1
Michelle Boyer: 1
Qiu, Xue-Lan: 1
Sebok-Syer, Stefanie S.: 1
Stella Y. Kim: 1
Su, Chi-Ming: 1
Tong Wu: 1
Wang, Wen-Chung: 1

Publication Type
Journal Articles: 4
Reports - Research: 4

Tong Wu; Stella Y. Kim; Carl Westine; Michelle Boyer – Journal of Educational Measurement, 2025
While significant attention has been given to test equating to ensure score comparability, limited research has explored equating methods for rater-mediated assessments, where human raters inherently introduce error. If not properly addressed, these errors can undermine score interchangeability and test validity. This study proposes an equating…
Descriptors: Item Response Theory, Evaluators, Error of Measurement, Test Validity
Wind, Stefanie A.; Sebok-Syer, Stefanie S. – Journal of Educational Measurement, 2019
When practitioners use modern measurement models to evaluate rating quality, they commonly examine rater fit statistics that summarize how well each rater's ratings fit the expectations of the measurement model. Essentially, this approach involves examining the unexpected ratings that each misfitting rater assigned (i.e., carrying out analyses of…
Descriptors: Measurement, Models, Evaluators, Simulation
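The Wind and Sebok-Syer abstract refers to rater fit statistics that summarize how well each rater's ratings match the expectations of the measurement model. As a minimal sketch, assuming a dichotomous many-facet Rasch model whose person, item, and rater parameters are already known (real analyses estimate them and usually work with polytomous rating scales), the conventional infit and outfit mean-square statistics can be computed from model residuals as shown below. The function names and example data are hypothetical and are not taken from the study.

```python
import math
from collections import defaultdict


def rasch_prob(theta, delta, severity):
    """P(rating = 1) under a dichotomous many-facet Rasch model (assumed here)."""
    return 1.0 / (1.0 + math.exp(-(theta - delta - severity)))


def rater_fit(ratings, theta, delta, severity):
    """Return {rater: (infit_ms, outfit_ms)} computed from model residuals.

    ratings  -- list of (person, item, rater, x) tuples with x in {0, 1}
    theta    -- person locations; delta -- item difficulties
    severity -- rater severities (treated as known here; in practice estimated)
    """
    sum_sq_resid = defaultdict(float)  # sum of (x - E)^2 per rater
    sum_var = defaultdict(float)       # sum of model variances W per rater
    sum_z2 = defaultdict(float)        # sum of squared standardized residuals per rater
    n_obs = defaultdict(int)           # number of ratings per rater
    for person, item, rater, x in ratings:
        p = rasch_prob(theta[person], delta[item], severity[rater])
        expected, variance = p, p * (1.0 - p)  # Bernoulli mean and variance
        sq_resid = (x - expected) ** 2
        sum_sq_resid[rater] += sq_resid
        sum_var[rater] += variance
        sum_z2[rater] += sq_resid / variance
        n_obs[rater] += 1
    # Infit weights residuals by model variance; outfit is the plain mean of z^2.
    return {
        r: (sum_sq_resid[r] / sum_var[r], sum_z2[r] / n_obs[r])
        for r in n_obs
    }


# Hypothetical example: two raters score the same two essays.
theta = {"A": 2.0, "B": -2.0}
delta = {"essay": 0.0}
severity = {"consistent": 0.0, "erratic": 0.0}
ratings = [
    ("A", "essay", "consistent", 1), ("B", "essay", "consistent", 0),
    ("A", "essay", "erratic", 0), ("B", "essay", "erratic", 1),
]
fit = rater_fit(ratings, theta, delta, severity)
for rater, (infit, outfit) in fit.items():
    print(rater, round(infit, 3), round(outfit, 3))
# consistent 0.135 0.135
# erratic 7.389 7.389
```

Mean squares near 1 indicate ratings about as predictable as the model expects; values well above 1 flag raters whose ratings are unexpected (here, the rater who gives the stronger writer the lower score), and values well below 1 indicate ratings more deterministic than the model predicts.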
Wind, Stefanie A.; Jones, Eli – Journal of Educational Measurement, 2019
Researchers have explored a variety of topics related to identifying and distinguishing among specific types of rater effects, as well as the implications of different types of incomplete data collection designs for rater-mediated assessments. In this study, we used simulated data to examine the sensitivity of latent trait model indicators of…
Descriptors: Rating Scales, Models, Evaluators, Data Collection
Wang, Wen-Chung; Su, Chi-Ming; Qiu, Xue-Lan – Journal of Educational Measurement, 2014
Ratings given to the same item response may have a stronger correlation than those given to different item responses, especially when raters interact with one another before giving ratings. The rater bundle model was developed to account for such local dependence by forming multiple ratings given to an item response as a bundle and assigning…
Descriptors: Item Response Theory, Interrater Reliability, Models, Correlation
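As the Wang, Su, and Qiu abstract describes it, the rater bundle model handles local dependence by treating the multiple ratings given to one item response as a bundle. The sketch below shows only that data-restructuring step, assuming a complete design in which every rater scores every response; how the model then parameterizes each bundle is cut off in the abstract and is not reproduced here. All names and data are hypothetical.

```python
from collections import Counter, defaultdict


def form_rater_bundles(ratings, raters):
    """Group all ratings of one (person, item) response into a single bundle.

    ratings -- iterable of (person, item, rater, score) tuples
    raters  -- fixed rater order; assumes every rater scored every response
    Returns {(person, item): tuple of scores in `raters` order}.
    """
    by_response = defaultdict(dict)
    for person, item, rater, score in ratings:
        by_response[(person, item)][rater] = score
    return {
        # Raises KeyError if a rater is missing for a response (incomplete design).
        response: tuple(scores[r] for r in raters)
        for response, scores in by_response.items()
    }


# Hypothetical data: two raters score three persons' responses to one prompt.
ratings = [
    ("A", "essay1", "R1", 2), ("A", "essay1", "R2", 2),
    ("B", "essay1", "R1", 1), ("B", "essay1", "R2", 1),
    ("C", "essay1", "R1", 0), ("C", "essay1", "R2", 1),
]
bundles = form_rater_bundles(ratings, ["R1", "R2"])
print(bundles[("C", "essay1")])  # (0, 1)

# Each distinct joint rating pattern is one possible outcome of a bundle.
pattern_counts = Counter(bundles.values())
print(pattern_counts[(2, 2)])  # 1
```

The design choice is that the joint pattern, such as (0, 1), rather than each separate rating becomes the unit of analysis, so a bundle-based model does not have to assume that two ratings of the same response are conditionally independent.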