Publication Date
  In 2025: 1
  Since 2024: 1
  Since 2021 (last 5 years): 4
  Since 2016 (last 10 years): 7
  Since 2006 (last 20 years): 10
Descriptor
  Robustness (Statistics): 13
  Models: 6
  Item Response Theory: 5
  Test Items: 5
  Simulation: 4
  Response Style (Tests): 3
  Statistical Analysis: 3
  Computation: 2
  Computer Simulation: 2
  Identification: 2
  Reaction Time: 2
Source
  Journal of Educational Measurement: 13
Author
Bakker, Marjan | 1 |
Bar-Hillel, Maya | 1 |
Belov, Dmitry I. | 1 |
Brinkhuis, Matthieu J. S. | 1 |
Budescu, David | 1 |
Carl Westine | 1 |
Chang, Hua-Hua | 1 |
Cheng, Ying | 1 |
Choe, Edison M. | 1 |
De Boeck, Paul | 1 |
Debeer, Dries | 1 |
Publication Type
  Journal Articles: 13
  Reports - Research: 6
  Reports - Evaluative: 4
  Reports - Descriptive: 3
Education Level
  Secondary Education: 1
Assessments and Surveys
  Program for International…: 1
Wu, Tong; Kim, Stella Y.; Westine, Carl; Boyer, Michelle – Journal of Educational Measurement, 2025
While significant attention has been given to test equating to ensure score comparability, limited research has explored equating methods for rater-mediated assessments, where human raters inherently introduce errors. If not properly addressed, these errors can undermine score interchangeability and test validity. This study proposes an equating…
Descriptors: Item Response Theory, Evaluators, Error of Measurement, Test Validity
Lim, Hwanggyu; Choe, Edison M.; Han, Kyung T. – Journal of Educational Measurement, 2022
Differential item functioning (DIF) of test items should be evaluated using practical methods that can produce accurate and useful results. We introduce the new "Residual DIF" (RDIF) framework, which stands out among the plethora of DIF detection techniques for its accessibility without sacrificing efficacy. This framework consists of…
Descriptors: Test Items, Item Response Theory, Identification, Robustness (Statistics)
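As a hedged illustration of the residual idea, the sketch below compares mean (observed minus expected) item residuals between the reference and focal groups under an assumed 2PL model. The function names and the z-style statistic are illustrative stand-ins, not the authors' RDIF implementation.

    import numpy as np

    def p_2pl(theta, a, b):
        # 2PL probability of a correct response
        return 1.0 / (1.0 + np.exp(-a * (theta - b)))

    def residual_dif_z(u, theta, group, a, b):
        # Raw residuals (observed - model-expected) on the studied item;
        # compare their means between reference (0) and focal (1) groups.
        resid = u - p_2pl(theta, a, b)
        r_ref, r_foc = resid[group == 0], resid[group == 1]
        diff = r_foc.mean() - r_ref.mean()
        se = np.sqrt(r_ref.var(ddof=1) / r_ref.size +
                     r_foc.var(ddof=1) / r_foc.size)
        return diff / se  # large |z| flags the item for potential DIF

    # Toy data: 2,000 examinees, one DIF-free item with a = 1.2, b = 0.0
    rng = np.random.default_rng(7)
    theta = rng.normal(size=2000)
    group = rng.integers(0, 2, size=2000)
    u = (rng.random(2000) < p_2pl(theta, 1.2, 0.0)).astype(float)
    print(residual_dif_z(u, theta, group, a=1.2, b=0.0))  # ~N(0, 1) under no DIF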
Hong, Maxwell; Rebouças, Daniella A.; Cheng, Ying – Journal of Educational Measurement, 2021
Response time has started to play an increasingly important role in educational and psychological testing, which has prompted the proposal of many response time models in recent years. However, response time modeling can be adversely impacted by aberrant response behavior. For example, test speededness can cause response times on certain items to deviate…
Descriptors: Reaction Time, Models, Computation, Robustness (Statistics)
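As a rough illustration of how aberrant timing can be screened, here is a minimal sketch based on the common lognormal response time model, where log T_ij ~ N(beta_j - tau_i, 1/alpha_j^2); responses far faster than the model expects yield large negative standardized residuals. This setup is assumed for illustration and is not the estimator proposed in the article.

    import numpy as np

    # Assumed lognormal response-time model:
    # log T_ij ~ N(beta_j - tau_i, 1 / alpha_j**2), with item time intensity
    # beta_j, item discrimination alpha_j, and person speed tau_i.
    def log_time_residuals(log_t, tau, alpha, beta):
        # Standardized residuals; large negative values mark responses
        # much faster than the model expects (a speededness signature).
        return alpha * (log_t - (beta - tau[:, None]))

    rng = np.random.default_rng(0)
    n, m = 500, 20
    tau = rng.normal(0.0, 0.3, n)
    alpha = rng.uniform(1.5, 2.5, m)
    beta = rng.normal(4.0, 0.4, m)                    # median item time ~55 s
    log_t = beta - tau[:, None] + rng.normal(size=(n, m)) / alpha

    log_t[:, -5:] -= 1.0                              # inject speededness at the end
    z = log_time_residuals(log_t, tau, alpha, beta)
    print((z < -2).mean(axis=0)[-5:])                 # flag rates jump on those items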
Maeda, Hotaka; Zhang, Bo – Journal of Educational Measurement, 2020
When a response pattern does not fit a selected measurement model, one may resort to robust ability estimation. Two popular robust methods are biweight and Huber weight. So far, research on these methods has been quite limited. This article proposes the maximum a posteriori biweight (BMAP) and Huber weight (HMAP) estimation methods. These methods…
Descriptors: Bayesian Statistics, Robustness (Statistics), Computation, Monte Carlo Methods
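The Huber-weighting idea can be sketched compactly: when solving the likelihood score equation for theta, downweight responses whose standardized residuals exceed a tuning constant k. The sketch below is a plain Huber-weighted ML iteration under an assumed 2PL model; the article's BMAP and HMAP estimators additionally incorporate a prior, which is omitted here.

    import numpy as np

    def p_2pl(theta, a, b):
        return 1.0 / (1.0 + np.exp(-a * (theta - b)))

    def huber_theta(u, a, b, k=1.0, iters=50):
        # Huber-weighted ML iteration: responses whose standardized
        # residuals exceed k get weight k/|r| instead of 1 in the score.
        theta = 0.0
        for _ in range(iters):
            p = p_2pl(theta, a, b)
            r = (u - p) / np.sqrt(p * (1.0 - p))          # standardized residuals
            w = np.where(np.abs(r) <= k, 1.0, k / np.abs(r))
            score = np.sum(w * a * (u - p))               # weighted score function
            info = np.sum(w * a**2 * p * (1.0 - p))       # weighted information
            step = score / info
            theta += step
            if abs(step) < 1e-8:
                break
        return theta

    # Toy example: a lucky guess on the hardest item
    a = np.ones(20)
    b = np.linspace(-2.0, 2.0, 20)
    u = (p_2pl(0.0, a, b) > 0.5).astype(float)  # typical pattern for theta = 0
    u[-1] = 1.0                                 # aberrant correct response
    print(huber_theta(u, a, b))                 # pulled up less than plain ML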
Ranger, Jochen; Kuhn, Jörg-Tobias; Wolgast, Anett – Journal of Educational Measurement, 2021
Van der Linden's hierarchical model for responses and response times can be used to infer test takers' ability and mental speed from their performance on an educational test. A standard approach for this is maximum likelihood estimation. In real-world applications, the data of some test takers might be partly…
Descriptors: Models, Reaction Time, Item Response Theory, Tests
Brinkhuis, Matthieu J. S.; Bakker, Marjan; Maris, Gunter – Journal of Educational Measurement, 2015
The amount of data available in the context of educational measurement has vastly increased in recent years. Such data are often incomplete, involve tests administered at different time points over the course of many years, and can therefore be quite challenging to model. In addition, intermediate results like grades or report cards being…
Descriptors: Educational Assessment, Measures (Individuals), Data, Robustness (Statistics)
Belov, Dmitry I. – Journal of Educational Measurement, 2015
The statistical analysis of answer changes (ACs) has uncovered multiple testing irregularities on large-scale assessments and is now routinely performed at testing organizations. However, AC data carry uncertainty introduced by technological or human factors. Therefore, existing statistics (e.g., the number of wrong-to-right ACs) used to detect examinees…
Descriptors: Statistical Analysis, Robustness (Statistics), Identification, Test Items
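A crude version of the wrong-to-right screen can be written in a few lines: count changes that land on the key and compare the count to a binomial reference under "direction-blind" changing. The p_wr value and the binomial reference below are simplifying assumptions for illustration, not the statistics developed in the article.

    import numpy as np
    from scipy.stats import binom

    def wrong_to_right_count(initial, final, key):
        # Changes whose final choice is the key and initial choice was not
        changed = initial != final
        return int(np.sum(changed & (initial != key) & (final == key)))

    def wr_screen(initial, final, key, p_wr=1/3):
        # Under direction-blind changing on a 4-option item, a change lands
        # on the key with probability ~1/3; test whether the observed
        # wrong-to-right count among all changes is improbably high.
        n_changes = int(np.sum(initial != final))
        wr = wrong_to_right_count(initial, final, key)
        p_value = binom.sf(wr - 1, n_changes, p_wr) if n_changes else 1.0
        return wr, n_changes, p_value

    key = np.array([0, 1, 2, 3, 0, 1, 2, 3])
    initial = np.array([1, 1, 0, 3, 2, 0, 2, 0])
    final = np.array([0, 1, 2, 3, 0, 1, 2, 3])  # every change lands on the key
    print(wr_screen(initial, final, key))        # (5, 5, ~0.004)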
Li, Zhushan – Journal of Educational Measurement, 2014
Logistic regression is a popular method for detecting uniform and nonuniform differential item functioning (DIF) effects. Theoretical formulas for the power and sample size calculations are derived for likelihood ratio tests and Wald tests based on the asymptotic distribution of the maximum likelihood estimators for the logistic regression model.…
Descriptors: Test Bias, Sample Size, Statistical Analysis, Regression (Statistics)
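The underlying test is easy to state concretely: fit nested logistic models for the studied item, with and without group terms, and refer twice the log-likelihood difference to a chi-square with 2 degrees of freedom. A minimal sketch, using a simulated stand-in for the total score as the matching variable:

    import numpy as np
    import statsmodels.api as sm
    from scipy.stats import chi2

    def lr_dif_test(u, total, group):
        # Nested logistic models for one item: matching variable only vs.
        # adding group and group-by-score terms (uniform + nonuniform DIF).
        X0 = sm.add_constant(np.column_stack([total]))
        X1 = sm.add_constant(np.column_stack([total, group, total * group]))
        ll0 = sm.Logit(u, X0).fit(disp=0).llf
        ll1 = sm.Logit(u, X1).fit(disp=0).llf
        lr = 2.0 * (ll1 - ll0)
        return lr, chi2.sf(lr, df=2)          # LR statistic and p-value

    rng = np.random.default_rng(1)
    n = 2000
    group = rng.integers(0, 2, n)
    theta = rng.normal(size=n)
    total = theta + rng.normal(scale=0.5, size=n)     # stand-in matching score
    p = 1.0 / (1.0 + np.exp(-(theta - 0.5 * group)))  # uniform DIF vs. group 1
    u = (rng.random(n) < p).astype(float)
    print(lr_dif_test(u, total, group))               # small p-value flags DIF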
Wang, Shiyu; Lin, Haiyan; Chang, Hua-Hua; Douglas, Jeff – Journal of Educational Measurement, 2016
Computerized adaptive testing (CAT) and multistage testing (MST) have become two of the most popular modes in large-scale computer-based sequential testing. Though most designs of CAT and MST have exhibited both strengths and weaknesses in recent large-scale implementations, there is no simple answer to the question of which design is better because different…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Format, Sequential Approach
Debeer, Dries; Janssen, Rianne; De Boeck, Paul – Journal of Educational Measurement, 2017
When dealing with missing responses, two types of omissions can be discerned: items can be skipped or not reached by the test taker. When the occurrence of these omissions is related to the proficiency process, the missingness is nonignorable. The purpose of this article is to present a tree-based IRT framework for modeling responses and omissions…
Descriptors: Item Response Theory, Test Items, Responses, Testing Problems
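The core data step in such tree-based modeling is a recode of each response into pseudo-items, one per tree node. A minimal sketch of that expansion follows; for brevity it collapses skipped and not-reached omissions into a single node, unlike the fuller framework in the article.

    import numpy as np

    def irtree_expand(resp):
        # resp: 1 = correct, 0 = incorrect, nan = omitted.
        # Node 1 (response propensity): 1 if answered, 0 if omitted.
        # Node 2 (accuracy): the scored response, defined only if answered.
        resp = np.asarray(resp, dtype=float)
        node1 = np.where(np.isnan(resp), 0.0, 1.0)
        node2 = np.where(np.isnan(resp), np.nan, resp)
        return np.column_stack([node1, node2])

    # One test taker, five items, two omissions
    print(irtree_expand([1, 0, np.nan, 1, np.nan]))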

Tate, Richard L. – Journal of Educational Measurement, 1995
The robustness of the school-level item response theory (IRT) model to violations of distributional assumptions was studied in a computer simulation. In situations where school-level precision might be acceptable for real school comparisons, expected a posteriori estimates of school ability were robust over a range of violations and conditions.…
Descriptors: Comparative Analysis, Computer Simulation, Estimation (Mathematics), Item Response Theory
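For context, an expected a posteriori (EAP) estimate is simply the posterior mean of ability over a prior, computed here by brute-force quadrature for a single person under an assumed 2PL model. This person-level sketch is only meant to unpack the term; the study's model places the distribution at the school level.

    import numpy as np

    def eap_theta(u, a, b, n_quad=61):
        # Posterior mean of theta over a standard-normal prior, computed
        # on a fixed grid (brute-force quadrature), assuming a 2PL model.
        grid = np.linspace(-4.0, 4.0, n_quad)
        prior = np.exp(-0.5 * grid**2)
        p = 1.0 / (1.0 + np.exp(-a[:, None] * (grid - b[:, None])))
        like = np.prod(np.where(u[:, None] == 1, p, 1.0 - p), axis=0)
        post = like * prior
        post /= post.sum()
        return float(np.sum(grid * post))

    a = np.ones(15)
    b = np.linspace(-1.5, 1.5, 15)
    u = np.array([1] * 10 + [0] * 5, dtype=float)
    print(eap_theta(u, a, b))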

Budescu, David; Bar-Hillel, Maya – Journal of Educational Measurement, 1993
Test taking and scoring are examined from the normative and descriptive perspectives of judgment and decision theory. The number-right scoring rule is endorsed because it discourages omissions and is robust against variability in respondent motivations, item vagaries, and limitations in judgments of uncertainty. (SLD)
Descriptors: Elementary Secondary Education, Guessing (Tests), Knowledge Level, Multiple Choice Tests

Ryan, Katherine E. – Journal of Educational Measurement, 1991
The reliability of Mantel-Haenszel (MH) indexes across samples of examinees and sample sizes and their robustness to item context effects were investigated with data for 670 African-American and 5,015 white students from the Second International Mathematics Study. MH procedures can be used to detect differential item functioning. (SLD)
Descriptors: Black Students, Comparative Testing, Context Effect, Evaluation Criteria
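The MH index itself is compact enough to sketch. The code below, a minimal illustration rather than the study's procedure, pools the stratum-level two-by-two (group by right/wrong) tables into a common odds ratio and maps it to the ETS D-DIF scale via -2.35 ln(alpha_MH); the coarse quantile stratifier stands in for the usual total-score matching.

    import numpy as np

    def mh_ddif(u, total, group):
        # Mantel-Haenszel common odds ratio across score strata,
        # mapped to the ETS delta scale: MH D-DIF = -2.35 * ln(alpha_MH).
        num = den = 0.0
        for s in np.unique(total):
            m = total == s
            r_ref = np.sum(u[m] * (group[m] == 0))          # reference right
            w_ref = np.sum((1.0 - u[m]) * (group[m] == 0))  # reference wrong
            r_foc = np.sum(u[m] * (group[m] == 1))          # focal right
            w_foc = np.sum((1.0 - u[m]) * (group[m] == 1))  # focal wrong
            n_s = m.sum()
            num += r_ref * w_foc / n_s
            den += r_foc * w_ref / n_s
        alpha_mh = num / den
        return alpha_mh, -2.35 * np.log(alpha_mh)

    rng = np.random.default_rng(2)
    n = 4000
    group = rng.integers(0, 2, n)
    theta = rng.normal(size=n)
    u = (rng.random(n) < 1.0 / (1.0 + np.exp(-(theta - 0.4 * group)))).astype(float)
    strata = (theta > np.quantile(theta, [0.2, 0.4, 0.6, 0.8])[:, None]).sum(axis=0)
    print(mh_ddif(u, strata, group))   # negative D-DIF: DIF against the focal group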