Showing all 11 results
Peer reviewed
Wu, Tong; Kim, Stella Y.; Westine, Carl; Boyer, Michelle – Journal of Educational Measurement, 2025
While significant attention has been given to test equating to ensure score comparability, limited research has explored equating methods for rater-mediated assessments, where human raters inherently introduce error. If not properly addressed, these errors can undermine score interchangeability and test validity. This study proposes an equating…
Descriptors: Item Response Theory, Evaluators, Error of Measurement, Test Validity
Peer reviewed
Kim, Stella Y.; Lee, Won-Chan – Journal of Educational Measurement, 2020
The current study aims to evaluate the performance of three non-IRT procedures (i.e., normal approximation, Livingston-Lewis, and compound multinomial) for estimating classification indices when the observed score distribution shows atypical patterns: (a) bimodality, (b) structural (i.e., systematic) bumpiness, or (c) structural zeros (i.e., no…
Descriptors: Classification, Accuracy, Scores, Cutting Scores
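The normal approximation procedure named in this entry treats scores on two parallel administrations as bivariate normal with correlation equal to the test reliability, and reads classification consistency off that distribution. A minimal sketch of that general idea, with hypothetical parameter values rather than the study's data or implementation:

```python
# Sketch: classification consistency under a normal approximation.
# Assumes scores on two parallel forms are bivariate normal with
# correlation equal to the reliability (all values hypothetical).
from scipy.stats import norm, multivariate_normal

mean, sd, reliability, cut = 70.0, 10.0, 0.85, 65.0  # hypothetical

z_cut = (cut - mean) / sd
bvn = multivariate_normal(mean=[0.0, 0.0],
                          cov=[[1.0, reliability], [reliability, 1.0]])

p_both_below = bvn.cdf([z_cut, z_cut])
p_below = norm.cdf(z_cut)                     # same marginal on both forms
p_both_above = 1.0 - 2.0 * p_below + p_both_below

consistency = p_both_below + p_both_above     # P(same classification twice)
print(f"estimated classification consistency: {consistency:.3f}")
```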
Peer reviewed
Barrett, Michelle D.; van der Linden, Wim J. – Journal of Educational Measurement, 2017
Linking functions adjust for differences between identifiability restrictions used in different instances of the estimation of item response model parameters. These adjustments are necessary when results from those instances are to be compared. As linking functions are derived from estimated item response model parameters, parameter estimation…
Descriptors: Item Response Theory, Error of Measurement, Programming, Evaluation Methods
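Linking functions of the kind discussed here are typically linear transformations of the latent scale estimated from common items. A minimal mean/sigma sketch with hypothetical anchor-item estimates; it shows only the basic transformation, not the authors' treatment of the estimation error carried into the link:

```python
# Sketch: mean/sigma linking of 2PL item parameters from two calibrations.
# b_old and b_new are difficulty estimates for the same anchor items,
# a_new the corresponding discriminations (hypothetical values).
import numpy as np

b_old = np.array([-1.2, -0.3, 0.4, 1.1])
b_new = np.array([-1.5, -0.6, 0.1, 0.9])
a_new = np.array([0.8, 1.1, 1.3, 0.9])

A = b_old.std(ddof=1) / b_new.std(ddof=1)   # slope of the scale transformation
B = b_old.mean() - A * b_new.mean()         # intercept

b_linked = A * b_new + B                    # new difficulties on the old scale
a_linked = a_new / A                        # discriminations transform inversely
print(A, B, b_linked, a_linked)
```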
Peer reviewed
Dwyer, Andrew C. – Journal of Educational Measurement, 2016
This study examines the effectiveness of three approaches for maintaining equivalent performance standards across test forms with small samples: (1) common-item equating, (2) resetting the standard, and (3) rescaling the standard. Rescaling the standard (i.e., applying common-item equating methodology to standard setting ratings to account for…
Descriptors: Cutting Scores, Equivalency Tests, Test Format, Academic Standards
Peer reviewed
Sachse, Karoline A.; Roppelt, Alexander; Haag, Nicole – Journal of Educational Measurement, 2016
Trend estimation in international comparative large-scale assessments relies on measurement invariance between countries. However, cross-national differential item functioning (DIF) has been repeatedly documented. We ran a simulation study using national item parameters, which required trends to be computed separately for each country, to compare…
Descriptors: Comparative Analysis, Measurement, Test Bias, Simulation
Peer reviewed
Kamata, Akihito; Tate, Richard – Journal of Educational Measurement, 2005
The goal of this study was to develop a procedure for predicting the equating error associated with the long-term equating method of Tate (2003) for mixed-format tests. An expression for the error of an equating based on multiple links, in terms of the errors of the component links, was derived and illustrated with simulated data.…
Descriptors: Computer Simulation, Item Response Theory, Test Format, Evaluation Methods
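When an equating is built from a chain of successive links and the link errors are assumed independent, the standard errors combine in quadrature. A tiny illustrative sketch of that general principle, with hypothetical standard errors; this is not Tate's derived expression:

```python
# Sketch: standard error of a chained equating from its component links,
# assuming the link errors are independent (hypothetical values).
import math

link_se = [0.12, 0.09, 0.15]                        # SE of each component link
chain_se = math.sqrt(sum(se ** 2 for se in link_se))
print(f"approximate SE of the composite equating: {chain_se:.3f}")
```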
Peer reviewed
Monahan, Patrick O.; Ankenmann, Robert D. – Journal of Educational Measurement, 2005
Empirical studies demonstrated Type-I error (TIE) inflation (especially for highly discriminating easy items) of the Mantel-Haenszel chi-square test for differential item functioning (DIF), when data conformed to item response theory (IRT) models more complex than Rasch, and when IRT proficiency distributions differed only in means. However, no…
Descriptors: Sample Size, Item Response Theory, Test Items, Test Bias
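The Mantel-Haenszel DIF test referenced in this entry aggregates 2×2 tables (group by correct/incorrect on the studied item) across total-score strata. A compact sketch of the continuity-corrected chi-square statistic with hypothetical counts; it illustrates the statistic itself, not the simulation design of the study:

```python
# Sketch: Mantel-Haenszel chi-square for DIF across score strata.
# Each stratum is a 2x2 table: rows = reference/focal group,
# columns = correct/incorrect on the studied item (hypothetical counts).
import numpy as np
from scipy.stats import chi2

tables = [np.array([[30, 20], [25, 25]]),   # one 2x2 table per score stratum
          np.array([[40, 10], [32, 18]]),
          np.array([[15, 35], [10, 40]])]

A = E = V = 0.0
for t in tables:
    n_r, n_f = t[0].sum(), t[1].sum()        # row (group) totals
    m_1, m_0 = t[:, 0].sum(), t[:, 1].sum()  # column (correct/incorrect) totals
    T = t.sum()
    A += t[0, 0]                             # reference-group correct count
    E += n_r * m_1 / T                       # its expectation under no DIF
    V += n_r * n_f * m_1 * m_0 / (T ** 2 * (T - 1))

mh_chi2 = (abs(A - E) - 0.5) ** 2 / V        # continuity-corrected statistic
p_value = chi2.sf(mh_chi2, df=1)
print(f"MH chi-square = {mh_chi2:.3f}, p = {p_value:.3f}")
```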
Peer reviewed
Roussos, Louis A.; Ozbek, Ozlem Yesim – Journal of Educational Measurement, 2006
The development of the DETECT procedure marked an important advancement in nonparametric dimensionality analysis. DETECT is the first nonparametric technique to estimate the number of dimensions in a data set, estimate an effect size for multidimensionality, and identify which dimension is predominantly measured by each item. The efficacy of…
Descriptors: Evaluation Methods, Effect Size, Test Bias, Item Response Theory
Peer reviewed
Emrick, John A. – Journal of Educational Measurement, 1971
Descriptors: Criterion Referenced Tests, Error of Measurement, Evaluation Methods, Item Analysis
Peer reviewed
Willms, J. Douglas; Raudenbush, Stephen W. – Journal of Educational Measurement, 1989
A general longitudinal model is presented for estimating school effects and their stability. The model, capable of separating true changes from sampling and measurement error, controls statistically for effects of factors exogenous to the school system. The model is illustrated with data from large cohorts of students in Scotland. (SLD)
Descriptors: Elementary Secondary Education, Equations (Mathematics), Error of Measurement, Estimation (Mathematics)
Peer reviewed
Raymond, Mark R.; Viswesvaran, Chockalingam – Journal of Educational Measurement, 1993
Three variations of a least squares regression model are presented that are suitable for determining and correcting for rating error in designs in which examinees are evaluated by a subset of possible raters. Models are applied to ratings from 4 administrations of a medical certification examination in which 40 raters and approximately 115…
Descriptors: Error of Measurement, Evaluation Methods, Higher Education, Interrater Reliability
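One common least squares formulation of the problem described in this entry regresses ratings on examinee and rater indicator variables, so each rater's estimated severity or leniency can be removed from the observed ratings. A minimal sketch with hypothetical data; this is a generic dummy-coded OLS fit, not the specific models presented in the article:

```python
# Sketch: least squares adjustment for rater severity/leniency.
# ratings[i] is the score rater_ids[i] gave to examinee_ids[i] (hypothetical).
import numpy as np

examinee_ids = np.array([0, 0, 1, 1, 2, 2, 3, 3])
rater_ids    = np.array([0, 1, 1, 2, 0, 2, 0, 1])
ratings      = np.array([4.0, 3.0, 5.0, 4.5, 3.5, 4.0, 2.5, 2.0])

n_exam, n_rater = examinee_ids.max() + 1, rater_ids.max() + 1

# Design matrix: examinee dummies plus rater dummies (first rater as baseline).
X = np.zeros((len(ratings), n_exam + n_rater - 1))
X[np.arange(len(ratings)), examinee_ids] = 1.0
nonbase = rater_ids > 0
X[np.nonzero(nonbase)[0], n_exam + rater_ids[nonbase] - 1] = 1.0

beta, *_ = np.linalg.lstsq(X, ratings, rcond=None)
severity = np.concatenate([[0.0], beta[n_exam:]])  # rater effects vs. baseline

adjusted = ratings - severity[rater_ids]           # remove estimated severity
print("rater effects:", np.round(severity, 2))
print("adjusted ratings:", np.round(adjusted, 2))
```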