Showing all 11 results
Peer reviewed
Wu, Tong; Kim, Stella Y.; Westine, Carl; Boyer, Michelle – Journal of Educational Measurement, 2025
While significant attention has been given to test equating to ensure score comparability, limited research has explored equating methods for rater-mediated assessments, where human raters inherently introduce error. If not properly addressed, these errors can undermine score interchangeability and test validity. This study proposes an equating…
Descriptors: Item Response Theory, Evaluators, Error of Measurement, Test Validity
Peer reviewed
Kim, Stella Y.; Lee, Won-Chan – Journal of Educational Measurement, 2020
The current study aims to evaluate the performance of three non-IRT procedures (i.e., normal approximation, Livingston-Lewis, and compound multinomial) for estimating classification indices when the observed score distribution shows atypical patterns: (a) bimodality, (b) structural (i.e., systematic) bumpiness, or (c) structural zeros (i.e., no…
Descriptors: Classification, Accuracy, Scores, Cutting Scores
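The normal approximation procedure named in this entry treats scores on two parallel administrations as bivariate normal with correlation equal to the test reliability, and reads classification consistency off that distribution. A minimal sketch of that general idea, with hypothetical parameter values rather than the study's data or implementation:

```python
# Sketch: classification consistency under a normal approximation.
# Assumes scores on two parallel forms are bivariate normal with
# correlation equal to the reliability (all values hypothetical).
from scipy.stats import norm, multivariate_normal

mean, sd, reliability, cut = 70.0, 10.0, 0.85, 65.0  # hypothetical

z_cut = (cut - mean) / sd
bvn = multivariate_normal(mean=[0.0, 0.0],
                          cov=[[1.0, reliability], [reliability, 1.0]])

p_both_below = bvn.cdf([z_cut, z_cut])
p_below = norm.cdf(z_cut)                     # same marginal on both forms
p_both_above = 1.0 - 2.0 * p_below + p_both_below

consistency = p_both_below + p_both_above     # P(same classification twice)
print(f"estimated classification consistency: {consistency:.3f}")
```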
Peer reviewed
Barrett, Michelle D.; van der Linden, Wim J. – Journal of Educational Measurement, 2017
Linking functions adjust for differences between identifiability restrictions used in different instances of the estimation of item response model parameters. These adjustments are necessary when results from those instances are to be compared. As linking functions are derived from estimated item response model parameters, parameter estimation…
Descriptors: Item Response Theory, Error of Measurement, Programming, Evaluation Methods
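Linking functions of the kind discussed here are typically linear transformations of the latent scale estimated from common items. A minimal mean/sigma sketch with hypothetical anchor-item estimates; it shows only the basic transformation, not the authors' treatment of the estimation error carried into the link:

```python
# Sketch: mean/sigma linking of 2PL item parameters from two calibrations.
# b_old and b_new are difficulty estimates for the same anchor items,
# a_new the corresponding discriminations (hypothetical values).
import numpy as np

b_old = np.array([-1.2, -0.3, 0.4, 1.1])
b_new = np.array([-1.5, -0.6, 0.1, 0.9])
a_new = np.array([0.8, 1.1, 1.3, 0.9])

A = b_old.std(ddof=1) / b_new.std(ddof=1)   # slope of the scale transformation
B = b_old.mean() - A * b_new.mean()         # intercept

b_linked = A * b_new + B                    # new difficulties on the old scale
a_linked = a_new / A                        # discriminations transform inversely
print(A, B, b_linked, a_linked)
```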
Peer reviewed
Dwyer, Andrew C. – Journal of Educational Measurement, 2016
This study examines the effectiveness of three approaches for maintaining equivalent performance standards across test forms with small samples: (1) common-item equating, (2) resetting the standard, and (3) rescaling the standard. Rescaling the standard (i.e., applying common-item equating methodology to standard setting ratings to account for…
Descriptors: Cutting Scores, Equivalency Tests, Test Format, Academic Standards
Peer reviewed
Sachse, Karoline A.; Roppelt, Alexander; Haag, Nicole – Journal of Educational Measurement, 2016
Trend estimation in international comparative large-scale assessments relies on measurement invariance between countries. However, cross-national differential item functioning (DIF) has been repeatedly documented. We ran a simulation study using national item parameters, which required trends to be computed separately for each country, to compare…
Descriptors: Comparative Analysis, Measurement, Test Bias, Simulation
Peer reviewed
Kamata, Akihito; Tate, Richard – Journal of Educational Measurement, 2005
The goal of this study was to develop a procedure for predicting the equating error associated with the long-term equating method of Tate (2003) for mixed-format tests. An expression for the error of an equating based on multiple links, in terms of the errors of the component links, was derived and illustrated with simulated data.…
Descriptors: Computer Simulation, Item Response Theory, Test Format, Evaluation Methods
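When an equating is built from a chain of successive links and the link errors are assumed independent, the standard errors combine in quadrature. A tiny illustrative sketch of that general principle, with hypothetical standard errors; this is not Tate's derived expression:

```python
# Sketch: standard error of a chained equating from its component links,
# assuming the link errors are independent (hypothetical values).
import math

link_se = [0.12, 0.09, 0.15]                        # SE of each component link
chain_se = math.sqrt(sum(se ** 2 for se in link_se))
print(f"approximate SE of the composite equating: {chain_se:.3f}")
```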
Peer reviewed
Monahan, Patrick O.; Ankenmann, Robert D. – Journal of Educational Measurement, 2005
Empirical studies demonstrated Type-I error (TIE) inflation (especially for highly discriminating easy items) of the Mantel-Haenszel chi-square test for differential item functioning (DIF), when data conformed to item response theory (IRT) models more complex than Rasch, and when IRT proficiency distributions differed only in means. However, no…
Descriptors: Sample Size, Item Response Theory, Test Items, Test Bias
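The Mantel-Haenszel DIF test referenced in this entry aggregates 2×2 tables (group by correct/incorrect on the studied item) across total-score strata. A compact sketch of the continuity-corrected chi-square statistic with hypothetical counts; it illustrates the statistic itself, not the simulation design of the study:

```python
# Sketch: Mantel-Haenszel chi-square for DIF across score strata.
# Each stratum is a 2x2 table: rows = reference/focal group,
# columns = correct/incorrect on the studied item (hypothetical counts).
import numpy as np
from scipy.stats import chi2

tables = [np.array([[30, 20], [25, 25]]),   # one 2x2 table per score stratum
          np.array([[40, 10], [32, 18]]),
          np.array([[15, 35], [10, 40]])]

A = E = V = 0.0
for t in tables:
    n_r, n_f = t[0].sum(), t[1].sum()        # row (group) totals
    m_1, m_0 = t[:, 0].sum(), t[:, 1].sum()  # column (correct/incorrect) totals
    T = t.sum()
    A += t[0, 0]                             # reference-group correct count
    E += n_r * m_1 / T                       # its expectation under no DIF
    V += n_r * n_f * m_1 * m_0 / (T ** 2 * (T - 1))

mh_chi2 = (abs(A - E) - 0.5) ** 2 / V        # continuity-corrected statistic
p_value = chi2.sf(mh_chi2, df=1)
print(f"MH chi-square = {mh_chi2:.3f}, p = {p_value:.3f}")
```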
Peer reviewed
Roussos, Louis A.; Ozbek, Ozlem Yesim – Journal of Educational Measurement, 2006
The development of the DETECT procedure marked an important advancement in nonparametric dimensionality analysis. DETECT is the first nonparametric technique to estimate the number of dimensions in a data set, estimate an effect size for multidimensionality, and identify which dimension is predominantly measured by each item. The efficacy of…
Descriptors: Evaluation Methods, Effect Size, Test Bias, Item Response Theory
Peer reviewed
Emrick, John A. – Journal of Educational Measurement, 1971
Descriptors: Criterion Referenced Tests, Error of Measurement, Evaluation Methods, Item Analysis
Peer reviewed
Willms, J. Douglas; Raudenbush, Stephen W. – Journal of Educational Measurement, 1989
A general longitudinal model is presented for estimating school effects and their stability. The model, capable of separating true changes from sampling and measurement error, controls statistically for effects of factors exogenous to the school system. The model is illustrated with data from large cohorts of students in Scotland. (SLD)
Descriptors: Elementary Secondary Education, Equations (Mathematics), Error of Measurement, Estimation (Mathematics)
Peer reviewed
Raymond, Mark R.; Viswesvaran, Chockalingam – Journal of Educational Measurement, 1993
Three variations of a least squares regression model are presented that are suitable for determining and correcting for rating error in designs in which examinees are evaluated by a subset of possible raters. Models are applied to ratings from 4 administrations of a medical certification examination in which 40 raters and approximately 115…
Descriptors: Error of Measurement, Evaluation Methods, Higher Education, Interrater Reliability
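One common least squares formulation of the problem described in this entry regresses ratings on examinee and rater indicator variables, so each rater's estimated severity or leniency can be removed from the observed ratings. A minimal sketch with hypothetical data; this is a generic dummy-coded OLS fit, not the specific models presented in the article:

```python
# Sketch: least squares adjustment for rater severity/leniency.
# ratings[i] is the score rater_ids[i] gave to examinee_ids[i] (hypothetical).
import numpy as np

examinee_ids = np.array([0, 0, 1, 1, 2, 2, 3, 3])
rater_ids    = np.array([0, 1, 1, 2, 0, 2, 0, 1])
ratings      = np.array([4.0, 3.0, 5.0, 4.5, 3.5, 4.0, 2.5, 2.0])

n_exam, n_rater = examinee_ids.max() + 1, rater_ids.max() + 1

# Design matrix: examinee dummies plus rater dummies (first rater as baseline).
X = np.zeros((len(ratings), n_exam + n_rater - 1))
X[np.arange(len(ratings)), examinee_ids] = 1.0
nonbase = rater_ids > 0
X[np.nonzero(nonbase)[0], n_exam + rater_ids[nonbase] - 1] = 1.0

beta, *_ = np.linalg.lstsq(X, ratings, rcond=None)
severity = np.concatenate([[0.0], beta[n_exam:]])  # rater effects vs. baseline

adjusted = ratings - severity[rater_ids]           # remove estimated severity
print("rater effects:", np.round(severity, 2))
print("adjusted ratings:", np.round(adjusted, 2))
```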