Publication Date
In 2025: 1
Since 2024: 1
Since 2021 (last 5 years): 2
Since 2016 (last 10 years): 5
Since 2006 (last 20 years): 5
Descriptor
Error of Measurement: 8
Evaluators: 8
Models: 8
Data Analysis: 3
Item Response Theory: 3
Accuracy: 2
Classification: 2
Interrater Reliability: 2
Measurement: 2
Scores: 2
Simulation: 2
Source
Journal of Educational…: 2
Applied Measurement in…: 1
Canadian Journal of Program…: 1
Educational and Psychological…: 1
Journal of Educational and…: 1
Psychometrika: 1
Society for Research on…: 1
Author
Batchelder, William H.: 1
Carl Westine: 1
Conger, Anthony J.: 1
Cox, Kyle: 1
Evans, Brian: 1
Hoskens, Machteld: 1
Kelcey, Ben: 1
Klauer, Karl Christoph: 1
Lee, Won-Chan: 1
Michelle Boyer: 1
Sebok-Syer, Stefanie S.: 1
Publication Type
Journal Articles: 7
Reports - Research: 5
Reports - Evaluative: 2
Reports - Descriptive: 1
Education Level
Elementary Education: 1
Tong Wu; Stella Y. Kim; Carl Westine; Michelle Boyer – Journal of Educational Measurement, 2025
While significant attention has been given to test equating to ensure score comparability, limited research has explored equating methods for rater-mediated assessments, where human raters inherently introduce error. If not properly addressed, these errors can undermine score interchangeability and test validity. This study proposes an equating…
Descriptors: Item Response Theory, Evaluators, Error of Measurement, Test Validity
Song, Yoon Ah; Lee, Won-Chan – Applied Measurement in Education, 2022
This article examines the performance of item response theory (IRT) models when double ratings, rather than single ratings, are used as item scores in the presence of rater effects. Study 1 examined the influence of the number of ratings on the accuracy of proficiency estimation in the generalized partial credit model (GPCM). Study 2 compared the accuracy of…
Descriptors: Item Response Theory, Item Analysis, Scores, Accuracy
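The intuition behind double ratings can be sketched with a simplified classical model (not the GPCM itself, and not the article's design): averaging two independent raters' scores halves the rater-error variance. The error standard deviation and sample size here are arbitrary choices for illustration.

```python
# Simplified illustration of why double ratings improve accuracy:
# averaging two independent ratings halves rater-error variance.
import random

random.seed(7)
N = 20000
SIGMA_RATER = 1.0  # SD of each rater's error around the true score

def mse(estimates, truths):
    """Mean squared error of score estimates against true scores."""
    return sum((e - t) ** 2 for e, t in zip(estimates, truths)) / len(truths)

truths = [random.gauss(0, 1) for _ in range(N)]
# Single rating: one rater's error added to each true score.
single = [t + random.gauss(0, SIGMA_RATER) for t in truths]
# Double rating: the average of two independent raters' scores.
double = [t + (random.gauss(0, SIGMA_RATER) + random.gauss(0, SIGMA_RATER)) / 2
          for t in truths]

print(mse(single, truths))  # ≈ 1.0  (sigma^2)
print(mse(double, truths))  # ≈ 0.5  (sigma^2 / 2)
```

The same averaging logic carries over, less transparently, to IRT proficiency estimation, which is what the two studies quantify.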
Wind, Stefanie A.; Sebok-Syer, Stefanie S. – Journal of Educational Measurement, 2019
When practitioners use modern measurement models to evaluate rating quality, they commonly examine rater fit statistics that summarize how well each rater's ratings fit the expectations of the measurement model. Essentially, this approach involves examining the unexpected ratings that each misfitting rater assigned (i.e., carrying out analyses of…
Descriptors: Measurement, Models, Evaluators, Simulation
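A minimal sketch of the kind of rater fit statistic the abstract describes — an outfit-style mean squared standardized residual of a rater's observed ratings against model expectations. The category probabilities and rating patterns below are invented for illustration, not taken from the study.

```python
# Outfit-style fit statistic: mean squared standardized residual of a
# rater's ratings against model-expected ratings (illustrative sketch).

def outfit(observed, probs_per_item):
    """observed: one rating per item; probs_per_item: per item, a dict
    mapping rating category -> model probability of that category."""
    total = 0.0
    for x, probs in zip(observed, probs_per_item):
        e = sum(c * p for c, p in probs.items())             # expected rating
        v = sum((c - e) ** 2 * p for c, p in probs.items())  # rating variance
        total += (x - e) ** 2 / v
    return total / len(observed)

# The model says category 1 is most likely on every item.
probs = [{0: 0.1, 1: 0.8, 2: 0.1}] * 10
consistent = [1] * 10    # always assigns the expected rating
erratic    = [0, 2] * 5  # always assigns an unexpected rating

print(outfit(consistent, probs))  # 0.0 (no misfit)
print(outfit(erratic, probs))     # 5.0 (large misfit: many unexpected ratings)
```

Values near 1 indicate ratings about as variable as the model expects; large values flag the misfitting raters whose unexpected ratings the authors propose examining further.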
Conger, Anthony J. – Educational and Psychological Measurement, 2017
Drawing parallels to classical test theory, this article clarifies the difference between rater accuracy and reliability and demonstrates how category marginal frequencies affect rater agreement and Cohen's kappa. Category assignment paradigms are developed: comparing raters to a standard (index) versus comparing two raters to one another…
Descriptors: Interrater Reliability, Evaluators, Accuracy, Statistical Analysis
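The marginal-frequency effect the abstract describes can be shown numerically (the two agreement tables below are invented for illustration): with observed agreement held fixed, skewed category marginals inflate chance agreement and depress kappa.

```python
# How category marginal frequencies affect Cohen's kappa even when
# observed agreement is identical (illustrative, not the article's data).

def cohens_kappa(table):
    """Cohen's kappa for a square rater-agreement table (list of lists)."""
    n = sum(sum(row) for row in table)
    k = len(table)
    p_obs = sum(table[i][i] for i in range(k)) / n          # observed agreement
    row_marg = [sum(row) / n for row in table]
    col_marg = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    p_exp = sum(row_marg[i] * col_marg[i] for i in range(k))  # chance agreement
    return (p_obs - p_exp) / (1 - p_exp)

# Both tables show 90% observed agreement (90 of 100 cases on the diagonal),
# but the skewed marginals raise chance agreement and lower kappa.
balanced = [[45, 5], [5, 45]]   # marginals 50/50
skewed   = [[85, 5], [5, 5]]    # marginals 90/10

print(cohens_kappa(balanced))  # 0.8
print(cohens_kappa(skewed))    # ≈ 0.444
```

This is why two rater pairs with identical percent agreement can report very different kappas, the core distinction the article draws between accuracy and chance-corrected agreement.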
Kelcey, Ben; Wang, Shanshan; Cox, Kyle – Society for Research on Educational Effectiveness, 2016
Valid and reliable measurement of unobserved latent variables is essential to understanding and improving education. A common and persistent approach to assessing latent constructs in education is the use of rater inferential judgment. The purpose of this study is to develop high-dimensional explanatory random item effects models designed for…
Descriptors: Test Items, Models, Evaluators, Longitudinal Studies

Wilson, Mark; Hoskens, Machteld – Journal of Educational and Behavioral Statistics, 2001
Introduces the Rater Bundle Model, an item response model for repeated ratings of student work. Applies the model to real and simulated data to illustrate the approach, which was motivated by the observation that when repeated ratings occur, the assumption of conditional independence is violated, and current item response models can then…
Descriptors: Error of Measurement, Evaluators, Item Response Theory, Models

Klauer, Karl Christoph; Batchelder, William H. – Psychometrika, 1996
A general approach to the analysis of nominal-scale ratings is discussed that is based on a simple measurement error model for a rater's judgments. The basic measurement error model gives rise to an agreement model for the agreement matrix of two or more raters. (SLD)
Descriptors: Classification, Data Analysis, Equations (Mathematics), Error of Measurement

Evans, Brian – Canadian Journal of Program Evaluation/La Revue canadienne d'evaluation de programme, 1995
The distinction between two models of reliability is clarified. Reliability may be conceived of and estimated from a true score model or from the perspective of sampling precision. Basic models are developed and illustrated for each approach using data from the author's work on measuring organizational climate. (SLD)
Descriptors: Data Analysis, Error of Measurement, Evaluators, Models