ERIC - Search Results

Showing all 4 results

IRT Observed-Score Equating for Rater-Mediated Assessments Using a Hierarchical Rater Model

Peer reviewed

Tong Wu; Stella Y. Kim; Carl Westine; Michelle Boyer – Journal of Educational Measurement, 2025

While significant attention has been given to test equating to ensure score comparability, limited research has explored equating methods for rater-mediated assessments, where human raters inherently introduce error. If not properly addressed, these errors can undermine score interchangeability and test validity. This study proposes an equating…

Descriptors: Item Response Theory, Evaluators, Error of Measurement, Test Validity

Examining Differential Rater Functioning Using a Between-Subgroup Outfit Approach

Peer reviewed

Wind, Stefanie A.; Sebok-Syer, Stefanie S. – Journal of Educational Measurement, 2019

When practitioners use modern measurement models to evaluate rating quality, they commonly examine rater fit statistics that summarize how well each rater's ratings fit the expectations of the measurement model. Essentially, this approach involves examining the unexpected ratings that each misfitting rater assigned (i.e., carrying out analyses of…

Descriptors: Measurement, Models, Evaluators, Simulation

Examining the Precision of Cut Scores within a Generalizability Theory Framework: A Closer Look at the Item Effect

Peer reviewed

Clauser, Brian E.; Kane, Michael; Clauser, Jerome C. – Journal of Educational Measurement, 2020

An Angoff standard setting study generally yields judgments on a number of items by a number of judges (who may or may not be nested in panels). Variability associated with judges (and possibly panels) contributes error to the resulting cut score. The variability associated with items plays a more complicated role. To the extent that the mean item…

Descriptors: Cutting Scores, Generalization, Decision Making, Standard Setting

The Answer Key as a Source of Error in Examinations for Professionals

Peer reviewed

Norcini, John J. – Journal of Educational Measurement, 1987

Answer keys for physician and teacher licensing examinations were studied. The impact of variability on total errors of measurement was examined for answer keys constructed using the aggregate method. Results indicated that, in some cases, scorers contributed to a sizable reduction in measurement error. (Author/GDC)

Descriptors: Adults, Answer Keys, Error of Measurement, Evaluators