Publication Date
In 2025: 1
Since 2024: 1
Since 2021 (last 5 years): 2
Since 2016 (last 10 years): 5
Since 2006 (last 20 years): 5
Descriptor
Error of Measurement: 8
Evaluators: 8
Models: 8
Data Analysis: 3
Item Response Theory: 3
Accuracy: 2
Classification: 2
Interrater Reliability: 2
Measurement: 2
Scores: 2
Simulation: 2
Source
Journal of Educational…: 2
Applied Measurement in…: 1
Canadian Journal of Program…: 1
Educational and Psychological…: 1
Journal of Educational and…: 1
Psychometrika: 1
Society for Research on…: 1
Author
Batchelder, William H.: 1
Carl Westine: 1
Conger, Anthony J.: 1
Cox, Kyle: 1
Evans, Brian: 1
Hoskens, Machteld: 1
Kelcey, Ben: 1
Klauer, Karl Christoph: 1
Lee, Won-Chan: 1
Michelle Boyer: 1
Sebok-Syer, Stefanie S.: 1
Publication Type
Journal Articles: 7
Reports - Research: 5
Reports - Evaluative: 2
Reports - Descriptive: 1
Education Level
Elementary Education: 1
Tong Wu; Stella Y. Kim; Carl Westine; Michelle Boyer – Journal of Educational Measurement, 2025
While significant attention has been given to test equating to ensure score comparability, limited research has explored equating methods for rater-mediated assessments, where human raters inherently introduce error. If not properly addressed, these errors can undermine score interchangeability and test validity. This study proposes an equating…
Descriptors: Item Response Theory, Evaluators, Error of Measurement, Test Validity
Song, Yoon Ah; Lee, Won-Chan – Applied Measurement in Education, 2022
This article examines the performance of item response theory (IRT) models when double ratings, rather than single ratings, are used as item scores in the presence of rater effects. Study 1 examined the influence of the number of ratings on the accuracy of proficiency estimation in the generalized partial credit model (GPCM). Study 2 compared the accuracy of…
Descriptors: Item Response Theory, Item Analysis, Scores, Accuracy
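The intuition behind double ratings can be sketched with a simplified classical model (not the GPCM itself, and not the article's design): averaging two independent raters' scores halves the rater-error variance. The error standard deviation and sample size here are arbitrary choices for illustration.

```python
# Simplified illustration of why double ratings improve accuracy:
# averaging two independent ratings halves rater-error variance.
import random

random.seed(7)
N = 20000
SIGMA_RATER = 1.0  # SD of each rater's error around the true score

def mse(estimates, truths):
    """Mean squared error of score estimates against true scores."""
    return sum((e - t) ** 2 for e, t in zip(estimates, truths)) / len(truths)

truths = [random.gauss(0, 1) for _ in range(N)]
# Single rating: one rater's error added to each true score.
single = [t + random.gauss(0, SIGMA_RATER) for t in truths]
# Double rating: the average of two independent raters' scores.
double = [t + (random.gauss(0, SIGMA_RATER) + random.gauss(0, SIGMA_RATER)) / 2
          for t in truths]

print(mse(single, truths))  # ≈ 1.0  (sigma^2)
print(mse(double, truths))  # ≈ 0.5  (sigma^2 / 2)
```

The same averaging logic carries over, less transparently, to IRT proficiency estimation, which is what the two studies quantify.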
Wind, Stefanie A.; Sebok-Syer, Stefanie S. – Journal of Educational Measurement, 2019
When practitioners use modern measurement models to evaluate rating quality, they commonly examine rater fit statistics that summarize how well each rater's ratings fit the expectations of the measurement model. Essentially, this approach involves examining the unexpected ratings that each misfitting rater assigned (i.e., carrying out analyses of…
Descriptors: Measurement, Models, Evaluators, Simulation
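A minimal sketch of the kind of rater fit statistic the abstract describes — an outfit-style mean squared standardized residual of a rater's observed ratings against model expectations. The category probabilities and rating patterns below are invented for illustration, not taken from the study.

```python
# Outfit-style fit statistic: mean squared standardized residual of a
# rater's ratings against model-expected ratings (illustrative sketch).

def outfit(observed, probs_per_item):
    """observed: one rating per item; probs_per_item: per item, a dict
    mapping rating category -> model probability of that category."""
    total = 0.0
    for x, probs in zip(observed, probs_per_item):
        e = sum(c * p for c, p in probs.items())             # expected rating
        v = sum((c - e) ** 2 * p for c, p in probs.items())  # rating variance
        total += (x - e) ** 2 / v
    return total / len(observed)

# The model says category 1 is most likely on every item.
probs = [{0: 0.1, 1: 0.8, 2: 0.1}] * 10
consistent = [1] * 10    # always assigns the expected rating
erratic    = [0, 2] * 5  # always assigns an unexpected rating

print(outfit(consistent, probs))  # 0.0 (no misfit)
print(outfit(erratic, probs))     # 5.0 (large misfit: many unexpected ratings)
```

Values near 1 indicate ratings about as variable as the model expects; large values flag the misfitting raters whose unexpected ratings the authors propose examining further.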
Conger, Anthony J. – Educational and Psychological Measurement, 2017
Drawing parallels to classical test theory, this article clarifies the difference between rater accuracy and reliability and demonstrates how category marginal frequencies affect rater agreement and Cohen's kappa. Category assignment paradigms are developed: comparing raters to a standard (index) versus comparing two raters to one another…
Descriptors: Interrater Reliability, Evaluators, Accuracy, Statistical Analysis
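The marginal-frequency effect the abstract describes can be shown numerically (the two agreement tables below are invented for illustration): with observed agreement held fixed, skewed category marginals inflate chance agreement and depress kappa.

```python
# How category marginal frequencies affect Cohen's kappa even when
# observed agreement is identical (illustrative, not the article's data).

def cohens_kappa(table):
    """Cohen's kappa for a square rater-agreement table (list of lists)."""
    n = sum(sum(row) for row in table)
    k = len(table)
    p_obs = sum(table[i][i] for i in range(k)) / n          # observed agreement
    row_marg = [sum(row) / n for row in table]
    col_marg = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    p_exp = sum(row_marg[i] * col_marg[i] for i in range(k))  # chance agreement
    return (p_obs - p_exp) / (1 - p_exp)

# Both tables show 90% observed agreement (90 of 100 cases on the diagonal),
# but the skewed marginals raise chance agreement and lower kappa.
balanced = [[45, 5], [5, 45]]   # marginals 50/50
skewed   = [[85, 5], [5, 5]]    # marginals 90/10

print(cohens_kappa(balanced))  # 0.8
print(cohens_kappa(skewed))    # ≈ 0.444
```

This is why two rater pairs with identical percent agreement can report very different kappas, the core distinction the article draws between accuracy and chance-corrected agreement.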
Kelcey, Ben; Wang, Shanshan; Cox, Kyle – Society for Research on Educational Effectiveness, 2016
Valid and reliable measurement of unobserved latent variables is essential to understanding and improving education. A common and persistent approach to assessing latent constructs in education is the use of rater inferential judgment. The purpose of this study is to develop high-dimensional explanatory random item effects models designed for…
Descriptors: Test Items, Models, Evaluators, Longitudinal Studies

Wilson, Mark; Hoskens, Machteld – Journal of Educational and Behavioral Statistics, 2001
Introduces the Rater Bundle Model, an item response model for repeated ratings of student work. Applies the model to real and simulated data to illustrate the approach, which was motivated by the observation that when repeated ratings occur, the assumption of conditional independence is violated, and current item response models can then…
Descriptors: Error of Measurement, Evaluators, Item Response Theory, Models

Klauer, Karl Christoph; Batchelder, William H. – Psychometrika, 1996
A general approach to the analysis of nominal-scale ratings is discussed that is based on a simple measurement error model for a rater's judgments. The basic measurement error model gives rise to an agreement model for the agreement matrix of two or more raters. (SLD)
Descriptors: Classification, Data Analysis, Equations (Mathematics), Error of Measurement

Evans, Brian – Canadian Journal of Program Evaluation/La Revue canadienne d'evaluation de programme, 1995
The distinction between two models of reliability is clarified. Reliability may be conceived of and estimated from a true score model or from the perspective of sampling precision. Basic models are developed and illustrated for each approach using data from the author's work on measuring organizational climate. (SLD)
Descriptors: Data Analysis, Error of Measurement, Evaluators, Models