ERIC - Search Results

Showing all 4 results

IRT Observed-Score Equating for Rater-Mediated Assessments Using a Hierarchical Rater Model

Peer reviewed

Wu, Tong; Kim, Stella Y.; Westine, Carl; Boyer, Michelle – Journal of Educational Measurement, 2025

While significant attention has been given to test equating to ensure score comparability, limited research has explored equating methods for rater-mediated assessments, where human raters inherently introduce error. If not properly addressed, these errors can undermine score interchangeability and test validity. This study proposes an equating…

Descriptors: Item Response Theory, Evaluators, Error of Measurement, Test Validity

Examining Differential Rater Functioning Using a Between-Subgroup Outfit Approach

Peer reviewed

Wind, Stefanie A.; Sebok-Syer, Stefanie S. – Journal of Educational Measurement, 2019

When practitioners use modern measurement models to evaluate rating quality, they commonly examine rater fit statistics that summarize how well each rater's ratings fit the expectations of the measurement model. Essentially, this approach involves examining the unexpected ratings that each misfitting rater assigned (i.e., carrying out analyses of…

Descriptors: Measurement, Models, Evaluators, Simulation

The Effects of Incomplete Rating Designs in Combination with Rater Effects

Peer reviewed

Wind, Stefanie A.; Jones, Eli – Journal of Educational Measurement, 2019

Researchers have explored a variety of topics related to identifying and distinguishing among specific types of rater effects, as well as the implications of different types of incomplete data collection designs for rater-mediated assessments. In this study, we used simulated data to examine the sensitivity of latent trait model indicators of…

Descriptors: Rating Scales, Models, Evaluators, Data Collection

Item Response Models for Local Dependence among Multiple Ratings

Peer reviewed

Wang, Wen-Chung; Su, Chi-Ming; Qiu, Xue-Lan – Journal of Educational Measurement, 2014

Ratings given to the same item response may have a stronger correlation than those given to different item responses, especially when raters interact with one another before giving ratings. The rater bundle model was developed to account for such local dependence by forming multiple ratings given to an item response as a bundle and assigning…

Descriptors: Item Response Theory, Interrater Reliability, Models, Correlation