Showing 1 to 15 of 115 results
Peer reviewed
Direct link
Guo, Jinxin; Xu, Xin; Xin, Tao – Journal of Educational Measurement, 2023
Missingness due to not-reached items and omitted items has received much attention in the recent psychometric literature. Such missingness, if not handled properly, can lead to biased parameter estimation and inaccurate inferences about examinees, further eroding the validity of the test. This paper reviews some commonly used IRT-based…
Descriptors: Psychometrics, Bias, Error of Measurement, Test Validity
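The truncated abstract stops before naming the approaches, but model-based treatments in this literature commonly pair the measurement model with an IRT model for the missingness indicators (e.g., Holman & Glas, 2005). A sketch of that idea, not necessarily this paper's own taxonomy:

    P(d_{ij} = 1 \mid \xi_j) = \frac{\exp(\xi_j - \delta_i)}{1 + \exp(\xi_j - \delta_i)}

where d_ij indicates whether person j responded to item i, xi_j is a latent response propensity allowed to correlate with ability theta_j, and delta_i is an item-level missingness parameter; a nonzero correlation makes the missingness nonignorable.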
Peer reviewed
Direct link
Augustin Mutak; Robert Krause; Esther Ulitzsch; Sören Much; Jochen Ranger; Steffi Pohl – Journal of Educational Measurement, 2024
Understanding the intraindividual relation between an individual's speed and ability in testing scenarios is essential to ensuring a fair assessment. Different approaches exist for estimating this relationship, each relying either on specific study designs or on specific assumptions. This paper aims to add to the toolbox of approaches for estimating…
Descriptors: Testing, Academic Ability, Time on Task, Correlation
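For context, the hierarchical framework most commonly used to model speed and ability jointly (van der Linden, 2007) pairs an IRT model for responses with a lognormal model for response times; whether this paper adopts it cannot be read off the truncated abstract, so the sketch is illustrative only:

    \ln t_{ij} = \beta_i - \tau_j + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0, \alpha_i^{-2})

where t_ij is person j's response time on item i, beta_i is the item's time intensity, tau_j is person speed, and alpha_i is a time-discrimination parameter; the speed-ability relation is then the person-level correlation rho(theta_j, tau_j).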
Peer reviewed
Direct link
Hwanggyu Lim; Danqi Zhu; Edison M. Choe; Kyung T. Han – Journal of Educational Measurement, 2024
This study presents a generalized version of the residual differential item functioning (RDIF) detection framework in item response theory, named GRDIF, to analyze differential item functioning (DIF) in multiple groups. The GRDIF framework retains the advantages of the original RDIF framework, such as computational efficiency and ease of…
Descriptors: Item Response Theory, Test Bias, Test Reliability, Test Construction
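The RDIF statistics that GRDIF generalizes are built on raw item-response residuals. As a rough sketch, with the exact standardization in the RDIF/GRDIF papers omitted:

    e_{ij} = u_{ij} - P_i(\hat{\theta}_j), \qquad \mathrm{RDIF}_R \propto \bar{e}_{iF} - \bar{e}_{iR}

where u_ij is the scored response, P_i the model-implied probability, and e-bar_iF, e-bar_iR the mean residuals of item i in the focal and reference groups; GRDIF extends the two-group contrast to simultaneous comparisons across multiple groups.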
Peer reviewed
Direct link
Gorney, Kylie; Wollack, James A. – Journal of Educational Measurement, 2023
In order to detect a wide range of aberrant behaviors, it can be useful to incorporate information beyond the dichotomous item scores. In this paper, we extend the l_z and l*_z person-fit statistics so that unusual behavior in item scores and unusual behavior in item distractors can be used as indicators of aberrance. Through…
Descriptors: Test Items, Scores, Goodness of Fit, Statistics
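For reference, the classical l_z statistic (Drasgow, Levine, & Williams, 1985) that this paper extends standardizes the log-likelihood of a response pattern under the fitted IRT model:

    l_0 = \sum_i \left[ u_i \ln P_i + (1 - u_i) \ln(1 - P_i) \right], \qquad
    l_z = \frac{l_0 - E(l_0)}{\sqrt{\mathrm{Var}(l_0)}}

    E(l_0) = \sum_i \left[ P_i \ln P_i + (1 - P_i) \ln(1 - P_i) \right], \qquad
    \mathrm{Var}(l_0) = \sum_i P_i (1 - P_i) \left[ \ln \frac{P_i}{1 - P_i} \right]^2

with P_i = P_i(theta); l*_z is Snijders's (2001) version corrected for using an estimated theta. Large negative values flag response patterns less likely than the model expects.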
Peer reviewed
Direct link
Güler Yavuz Temel – Journal of Educational Measurement, 2024
The purpose of this study was to investigate multidimensional DIF with a simple and nonsimple structure in the context of multidimensional Graded Response Model (MGRM). This study examined and compared the performance of the IRT-LR and Wald test using MML-EM and MHRM estimation approaches with different test factors and test structures in…
Descriptors: Computation, Multidimensional Scaling, Item Response Theory, Models
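The MGRM at issue specifies cumulative category probabilities through a logistic function of a weighted composite of latent traits; in its common slope-intercept form:

    P(X_{ij} \ge k \mid \boldsymbol{\theta}_j) = \frac{1}{1 + \exp[-(\mathbf{a}_i^{\top} \boldsymbol{\theta}_j + d_{ik})]}, \qquad
    P(X_{ij} = k) = P(X_{ij} \ge k) - P(X_{ij} \ge k + 1)

A simple structure constrains each item's slope vector a_i to load on a single dimension, while a nonsimple structure permits cross-loadings, which is the distinction the simulation conditions vary.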
Peer reviewed
Direct link
Tong Wu; Stella Y. Kim; Carl Westine; Michelle Boyer – Journal of Educational Measurement, 2025
While significant attention has been given to test equating to ensure score comparability, limited research has explored equating methods for rater-mediated assessments, where human raters inherently introduce error. If not properly addressed, these errors can undermine score interchangeability and test validity. This study proposes an equating…
Descriptors: Item Response Theory, Evaluators, Error of Measurement, Test Validity
Peer reviewed
Direct link
Cornelis Potgieter; Xin Qiao; Akihito Kamata; Yusuf Kara – Journal of Educational Measurement, 2024
As part of the effort to develop an improved oral reading fluency (ORF) assessment system, Kara et al. estimated the ORF scores based on a latent variable psychometric model of accuracy and speed for ORF data via a fully Bayesian approach. This study further investigates likelihood-based estimators for the model-derived ORF scores, including…
Descriptors: Oral Reading, Reading Fluency, Scores, Psychometrics
Peer reviewed
Direct link
Almehrizi, Rashid S. – Journal of Educational Measurement, 2021
Estimates of various variance components, universe score variance, measurement error variances, and generalizability coefficients, like all statistics, are subject to sampling variability, particularly in small samples. Such variability is quantified traditionally through estimated standard errors and/or confidence intervals. The paper derived new…
Descriptors: Error of Measurement, Statistics, Design, Generalizability Theory
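For context, in a single-facet p x i design the generalizability coefficient whose sampling variability the paper quantifies is the ratio of universe score variance to universe score variance plus relative error variance:

    E\rho^2 = \frac{\sigma_p^2}{\sigma_p^2 + \sigma_\delta^2}, \qquad \sigma_\delta^2 = \frac{\sigma_{pi}^2}{n_i}

where sigma^2_p is the universe (person) score variance, sigma^2_pi the person-by-item interaction plus residual variance, and n_i the number of items; each variance component is itself an estimate, which is why standard errors and confidence intervals for these quantities matter.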
Peer reviewed
Direct link
Lee, Sunbok – Journal of Educational Measurement, 2020
In the logistic regression (LR) procedure for differential item functioning (DIF), the parameters of LR have often been estimated using maximum likelihood (ML) estimation. However, ML estimation suffers from finite-sample bias. Furthermore, it can be substantially biased in the presence of rare event data. The bias of ML…
Descriptors: Regression (Statistics), Test Bias, Maximum Likelihood Statistics, Simulation
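The standard LR DIF procedure (Swaminathan & Rogers, 1990) that this paper revisits tests group effects after conditioning on a matching variable such as the total score. A minimal runnable sketch with simulated, illustrative data; the variable names and the simulation are assumptions, not the paper's:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 2000
    total = rng.normal(0, 1, n)                    # matching variable (e.g., total score)
    group = rng.integers(0, 2, n)                  # 0 = reference, 1 = focal
    p = 1 / (1 + np.exp(-(0.5 + 1.2 * total)))     # DIF-free item simulated here
    y = rng.binomial(1, p)                         # dichotomous item response

    # Nested models: M0 (total), M1 (+ group), M2 (+ total x group)
    X0 = sm.add_constant(total)
    X1 = sm.add_constant(np.column_stack([total, group]))
    X2 = sm.add_constant(np.column_stack([total, group, total * group]))
    m0 = sm.Logit(y, X0).fit(disp=0)
    m1 = sm.Logit(y, X1).fit(disp=0)
    m2 = sm.Logit(y, X2).fit(disp=0)

    lr_uniform = 2 * (m1.llf - m0.llf)             # ~ chi2(1): uniform DIF
    lr_nonuniform = 2 * (m2.llf - m1.llf)          # ~ chi2(1): nonuniform DIF

The bias concern the abstract raises arises because these ML estimates are biased in small samples and with rare events (very few correct or incorrect responses); penalized-likelihood corrections such as Firth's are the usual remedy in the LR literature, though the truncated abstract does not say which remedy this paper evaluates.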
Peer reviewed
Direct link
Liu, Chunyan; Kolen, Michael J. – Journal of Educational Measurement, 2020
Smoothing is designed to yield smoother equating results that reduce random equating error without introducing much systematic error. The main objective of this study is to propose a new statistic and to compare its performance with that of the Akaike information criterion and likelihood ratio chi-square difference statistics in…
Descriptors: Equated Scores, Statistical Analysis, Error of Measurement, Criteria
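Polynomial log-linear presmoothing, the setting in which such selection statistics are usually compared, amounts to fitting Poisson regressions of score frequencies on powers of the score and choosing a degree. A runnable sketch, with data simulated purely for illustration:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    raw = rng.binomial(40, 0.6, size=2000)        # simulated raw scores, 0..40
    scores = np.arange(41)
    freqs = np.bincount(raw, minlength=41)        # observed score frequencies

    z = (scores - scores.mean()) / scores.std()   # standardize to stabilize high powers
    for degree in (2, 3, 4, 5):
        X = sm.add_constant(np.column_stack([z**k for k in range(1, degree + 1)]))
        fit = sm.GLM(freqs, X, family=sm.families.Poisson()).fit()
        print(degree, fit.aic)                    # smaller AIC = preferred degree

A degree-d fit preserves the first d moments of the observed distribution, so the selection statistic is effectively trading off moment preservation against smoothness.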
Peer reviewed
Direct link
Kim, Hyung Jin; Brennan, Robert L.; Lee, Won-Chan – Journal of Educational Measurement, 2020
In equating, smoothing techniques are frequently used to diminish sampling error. There are typically two types of smoothing: presmoothing and postsmoothing. For polynomial log-linear presmoothing, an optimum smoothing degree can be determined statistically based on the Akaike information criterion or chi-square difference criterion. For…
Descriptors: Equated Scores, Sampling, Error of Measurement, Statistical Analysis
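The two criteria named here compare nested log-linear models of adjacent degree; in the usual notation:

    \mathrm{AIC}(d) = -2 \ln L_d + 2(d + 1), \qquad
    \Delta G^2 = G^2(d) - G^2(d + 1) \sim \chi^2(1)

where L_d is the likelihood of the degree-d model and G^2 its likelihood-ratio fit statistic; the degree is increased until Delta-G^2 is nonsignificant or AIC stops decreasing. Postsmoothing parameters, by contrast, are traditionally chosen by visual inspection, which is what motivates statistical selection procedures for that side as well.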
Peer reviewed
Direct link
Wind, Stefanie A.; Sebok-Syer, Stefanie S. – Journal of Educational Measurement, 2019
When practitioners use modern measurement models to evaluate rating quality, they commonly examine rater fit statistics that summarize how well each rater's ratings fit the expectations of the measurement model. Essentially, this approach involves examining the unexpected ratings that each misfitting rater assigned (i.e., carrying out analyses of…
Descriptors: Measurement, Models, Evaluators, Simulation
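The rater fit statistics in question are typically the infit and outfit mean squares from Rasch-family models; with a standardized residual for each rating:

    z_{ni} = \frac{x_{ni} - E_{ni}}{\sqrt{W_{ni}}}, \qquad
    \mathrm{outfit} = \frac{1}{N} \sum_n z_{ni}^2, \qquad
    \mathrm{infit} = \frac{\sum_n W_{ni} z_{ni}^2}{\sum_n W_{ni}}

where E_ni and W_ni are the model-implied mean and variance of rating x_ni and the sums run over the N ratings a rater assigned. Values near 1 indicate ratings consistent with the model, and the follow-up analyses the abstract describes drill into the individual unexpected ratings of misfitting raters.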
Peer reviewed
Direct link
Wang, Shaojie; Zhang, Minqiang; Lee, Won-Chan; Huang, Feifei; Li, Zonglong; Li, Yixing; Yu, Sufang – Journal of Educational Measurement, 2022
Traditional IRT characteristic curve linking methods ignore parameter estimation errors, which may undermine the accuracy of estimated linking constants. Two new linking methods that take parameter estimation errors into account are proposed: the item-information-weighted (IWCC) and test-information-weighted (TWCC) characteristic curve methods employ weighting…
Descriptors: Item Response Theory, Error of Measurement, Accuracy, Monte Carlo Methods
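Characteristic curve linking estimates the transformation constants A and B by minimizing a loss over the ability scale. In the Haebara-type form that IWCC/TWCC build on, the proposal amounts to replacing the implicit equal weighting with weights w_i(theta_q) derived from item or test information (the exact weights are defined in the paper):

    F(A, B) = \sum_q \sum_i w_i(\theta_q) \left[ P_i(\theta_q; \hat{a}_i, \hat{b}_i)
        - P_i\!\left(\theta_q; \frac{\hat{a}_i^*}{A},\; A \hat{b}_i^* + B\right) \right]^2

where starred parameters come from the other form or group and theta_q are quadrature points; down-weighting poorly estimated items is what reduces the influence of parameter estimation error on the linking constants.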
Peer reviewed
Direct link
Kim, Stella Y.; Lee, Won-Chan – Journal of Educational Measurement, 2020
The current study aims to evaluate the performance of three non-IRT procedures (i.e., normal approximation, Livingston-Lewis, and compound multinomial) for estimating classification indices when the observed score distribution shows atypical patterns: (a) bimodality, (b) structural (i.e., systematic) bumpiness, or (c) structural zeros (i.e., no…
Descriptors: Classification, Accuracy, Scores, Cutting Scores
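Of the three procedures, Livingston-Lewis is the one driven by an "effective test length," computed from the score range, mean, variance, and reliability:

    \tilde{n} = \frac{(\mu - X_{\min})(X_{\max} - \mu) - r_{XX'}\, \sigma_X^2}{\sigma_X^2 (1 - r_{XX'})}

The observed-score distribution is then treated as arising from n-tilde binomial trials with a beta-distributed proportional true score, which is precisely where atypical shapes such as bimodality, bumpiness, or structural zeros can strain the approximation.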
Peer reviewed
Direct link
Hong, Seong Eun; Monroe, Scott; Falk, Carl F. – Journal of Educational Measurement, 2020
In educational and psychological measurement, a person-fit statistic (PFS) is designed to identify aberrant response patterns. For parametric PFSs, valid inference depends on several assumptions, one of which is that the item response theory (IRT) model is correctly specified. Previous studies have used empirical data sets to explore the effects…
Descriptors: Educational Testing, Psychological Testing, Goodness of Fit, Error of Measurement