Publication Date
  In 2025: 1
  Since 2024: 1
  Since 2021 (last 5 years): 4
  Since 2016 (last 10 years): 7
  Since 2006 (last 20 years): 10
Descriptor
  Robustness (Statistics): 13
  Models: 6
  Item Response Theory: 5
  Test Items: 5
  Simulation: 4
  Response Style (Tests): 3
  Statistical Analysis: 3
  Computation: 2
  Computer Simulation: 2
  Identification: 2
  Reaction Time: 2
Source
  Journal of Educational Measurement: 13
Author
Bakker, Marjan | 1 |
Bar-Hillel, Maya | 1 |
Belov, Dmitry I. | 1 |
Brinkhuis, Matthieu J. S. | 1 |
Budescu, David | 1 |
Carl Westine | 1 |
Chang, Hua-Hua | 1 |
Cheng, Ying | 1 |
Choe, Edison M. | 1 |
De Boeck, Paul | 1 |
Debeer, Dries | 1 |
Publication Type
  Journal Articles: 13
  Reports - Research: 6
  Reports - Evaluative: 4
  Reports - Descriptive: 3
Education Level
  Secondary Education: 1
Assessments and Surveys
  Program for International…: 1
Wu, Tong; Kim, Stella Y.; Westine, Carl; Boyer, Michelle – Journal of Educational Measurement, 2025
While significant attention has been given to test equating to ensure score comparability, limited research has explored equating methods for rater-mediated assessments, where human raters inherently introduce errors. If not properly addressed, these errors can undermine score interchangeability and test validity. This study proposes an equating…
Descriptors: Item Response Theory, Evaluators, Error of Measurement, Test Validity
Lim, Hwanggyu; Choe, Edison M.; Han, Kyung T. – Journal of Educational Measurement, 2022
Differential item functioning (DIF) of test items should be evaluated using practical methods that can produce accurate and useful results. We introduce the new "Residual DIF" (RDIF) framework, which stands out among the plethora of DIF detection techniques for its accessibility without sacrificing efficacy. This framework consists of…
Descriptors: Test Items, Item Response Theory, Identification, Robustness (Statistics)
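As a hedged illustration of the residual idea, the sketch below compares mean (observed minus expected) item residuals between the reference and focal groups under an assumed 2PL model. The function names and the z-style statistic are illustrative stand-ins, not the authors' RDIF implementation.

    import numpy as np

    def p_2pl(theta, a, b):
        # 2PL probability of a correct response
        return 1.0 / (1.0 + np.exp(-a * (theta - b)))

    def residual_dif_z(u, theta, group, a, b):
        # Raw residuals (observed - model-expected) on the studied item;
        # compare their means between reference (0) and focal (1) groups.
        resid = u - p_2pl(theta, a, b)
        r_ref, r_foc = resid[group == 0], resid[group == 1]
        diff = r_foc.mean() - r_ref.mean()
        se = np.sqrt(r_ref.var(ddof=1) / r_ref.size +
                     r_foc.var(ddof=1) / r_foc.size)
        return diff / se  # large |z| flags the item for potential DIF

    # Toy data: 2,000 examinees, one DIF-free item with a = 1.2, b = 0.0
    rng = np.random.default_rng(7)
    theta = rng.normal(size=2000)
    group = rng.integers(0, 2, size=2000)
    u = (rng.random(2000) < p_2pl(theta, 1.2, 0.0)).astype(float)
    print(residual_dif_z(u, theta, group, a=1.2, b=0.0))  # ~N(0, 1) under no DIF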
Hong, Maxwell; Rebouças, Daniella A.; Cheng, Ying – Journal of Educational Measurement, 2021
Response time has started to play an increasingly important role in educational and psychological testing, which has prompted the proposal of many response time models in recent years. However, response time modeling can be adversely impacted by aberrant response behavior. For example, test speededness can cause response times on certain items to deviate…
Descriptors: Reaction Time, Models, Computation, Robustness (Statistics)
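As a rough illustration of how aberrant timing can be screened, here is a minimal sketch based on the common lognormal response time model, where log T_ij ~ N(beta_j - tau_i, 1/alpha_j^2); responses far faster than the model expects yield large negative standardized residuals. This setup is assumed for illustration and is not the estimator proposed in the article.

    import numpy as np

    # Assumed lognormal response-time model:
    # log T_ij ~ N(beta_j - tau_i, 1 / alpha_j**2), with item time intensity
    # beta_j, item discrimination alpha_j, and person speed tau_i.
    def log_time_residuals(log_t, tau, alpha, beta):
        # Standardized residuals; large negative values mark responses
        # much faster than the model expects (a speededness signature).
        return alpha * (log_t - (beta - tau[:, None]))

    rng = np.random.default_rng(0)
    n, m = 500, 20
    tau = rng.normal(0.0, 0.3, n)
    alpha = rng.uniform(1.5, 2.5, m)
    beta = rng.normal(4.0, 0.4, m)                    # median item time ~55 s
    log_t = beta - tau[:, None] + rng.normal(size=(n, m)) / alpha

    log_t[:, -5:] -= 1.0                              # inject speededness at the end
    z = log_time_residuals(log_t, tau, alpha, beta)
    print((z < -2).mean(axis=0)[-5:])                 # flag rates jump on those items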
Maeda, Hotaka; Zhang, Bo – Journal of Educational Measurement, 2020
When a response pattern does not fit a selected measurement model, one may resort to robust ability estimation. Two popular robust methods are biweight and Huber weight. So far, research on these methods has been quite limited. This article proposes the maximum a posteriori biweight (BMAP) and Huber weight (HMAP) estimation methods. These methods…
Descriptors: Bayesian Statistics, Robustness (Statistics), Computation, Monte Carlo Methods
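The Huber-weighting idea can be sketched compactly: when solving the likelihood score equation for theta, downweight responses whose standardized residuals exceed a tuning constant k. The sketch below is a plain Huber-weighted ML iteration under an assumed 2PL model; the article's BMAP and HMAP estimators additionally incorporate a prior, which is omitted here.

    import numpy as np

    def p_2pl(theta, a, b):
        return 1.0 / (1.0 + np.exp(-a * (theta - b)))

    def huber_theta(u, a, b, k=1.0, iters=50):
        # Huber-weighted ML iteration: responses whose standardized
        # residuals exceed k get weight k/|r| instead of 1 in the score.
        theta = 0.0
        for _ in range(iters):
            p = p_2pl(theta, a, b)
            r = (u - p) / np.sqrt(p * (1.0 - p))          # standardized residuals
            w = np.where(np.abs(r) <= k, 1.0, k / np.abs(r))
            score = np.sum(w * a * (u - p))               # weighted score function
            info = np.sum(w * a**2 * p * (1.0 - p))       # weighted information
            step = score / info
            theta += step
            if abs(step) < 1e-8:
                break
        return theta

    # Toy example: a lucky guess on the hardest item
    a = np.ones(20)
    b = np.linspace(-2.0, 2.0, 20)
    u = (p_2pl(0.0, a, b) > 0.5).astype(float)  # typical pattern for theta = 0
    u[-1] = 1.0                                 # aberrant correct response
    print(huber_theta(u, a, b))                 # pulled up less than plain ML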
Ranger, Jochen; Kuhn, Jörg-Tobias; Wolgast, Anett – Journal of Educational Measurement, 2021
Van der Linden's hierarchical model for responses and response times can be used to infer test takers' ability and mental speed from their performance on an educational test. A standard approach for this is maximum likelihood estimation. In real-world applications, the data of some test takers might be partly…
Descriptors: Models, Reaction Time, Item Response Theory, Tests
Brinkhuis, Matthieu J. S.; Bakker, Marjan; Maris, Gunter – Journal of Educational Measurement, 2015
The amount of data available in the context of educational measurement has vastly increased in recent years. Such data are often incomplete, involve tests administered at different time points over the course of many years, and can therefore be quite challenging to model. In addition, intermediate results like grades or report cards being…
Descriptors: Educational Assessment, Measures (Individuals), Data, Robustness (Statistics)
Belov, Dmitry I. – Journal of Educational Measurement, 2015
The statistical analysis of answer changes (ACs) has uncovered multiple testing irregularities on large-scale assessments and is now routinely performed at testing organizations. However, AC data carry uncertainty introduced by technological or human factors. Therefore, existing statistics (e.g., the number of wrong-to-right ACs) used to detect examinees…
Descriptors: Statistical Analysis, Robustness (Statistics), Identification, Test Items
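A crude version of the wrong-to-right screen can be written in a few lines: count changes that land on the key and compare the count to a binomial reference under "direction-blind" changing. The p_wr value and the binomial reference below are simplifying assumptions for illustration, not the statistics developed in the article.

    import numpy as np
    from scipy.stats import binom

    def wrong_to_right_count(initial, final, key):
        # Changes whose final choice is the key and initial choice was not
        changed = initial != final
        return int(np.sum(changed & (initial != key) & (final == key)))

    def wr_screen(initial, final, key, p_wr=1/3):
        # Under direction-blind changing on a 4-option item, a change lands
        # on the key with probability ~1/3; test whether the observed
        # wrong-to-right count among all changes is improbably high.
        n_changes = int(np.sum(initial != final))
        wr = wrong_to_right_count(initial, final, key)
        p_value = binom.sf(wr - 1, n_changes, p_wr) if n_changes else 1.0
        return wr, n_changes, p_value

    key = np.array([0, 1, 2, 3, 0, 1, 2, 3])
    initial = np.array([1, 1, 0, 3, 2, 0, 2, 0])
    final = np.array([0, 1, 2, 3, 0, 1, 2, 3])  # every change lands on the key
    print(wr_screen(initial, final, key))        # (5, 5, ~0.004)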
Li, Zhushan – Journal of Educational Measurement, 2014
Logistic regression is a popular method for detecting uniform and nonuniform differential item functioning (DIF) effects. Theoretical formulas for the power and sample size calculations are derived for likelihood ratio tests and Wald tests based on the asymptotic distribution of the maximum likelihood estimators for the logistic regression model.…
Descriptors: Test Bias, Sample Size, Statistical Analysis, Regression (Statistics)
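The underlying test is easy to state concretely: fit nested logistic models for the studied item, with and without group terms, and refer twice the log-likelihood difference to a chi-square with 2 degrees of freedom. A minimal sketch, using a simulated stand-in for the total score as the matching variable:

    import numpy as np
    import statsmodels.api as sm
    from scipy.stats import chi2

    def lr_dif_test(u, total, group):
        # Nested logistic models for one item: matching variable only vs.
        # adding group and group-by-score terms (uniform + nonuniform DIF).
        X0 = sm.add_constant(np.column_stack([total]))
        X1 = sm.add_constant(np.column_stack([total, group, total * group]))
        ll0 = sm.Logit(u, X0).fit(disp=0).llf
        ll1 = sm.Logit(u, X1).fit(disp=0).llf
        lr = 2.0 * (ll1 - ll0)
        return lr, chi2.sf(lr, df=2)          # LR statistic and p-value

    rng = np.random.default_rng(1)
    n = 2000
    group = rng.integers(0, 2, n)
    theta = rng.normal(size=n)
    total = theta + rng.normal(scale=0.5, size=n)     # stand-in matching score
    p = 1.0 / (1.0 + np.exp(-(theta - 0.5 * group)))  # uniform DIF vs. group 1
    u = (rng.random(n) < p).astype(float)
    print(lr_dif_test(u, total, group))               # small p-value flags DIF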
Wang, Shiyu; Lin, Haiyan; Chang, Hua-Hua; Douglas, Jeff – Journal of Educational Measurement, 2016
Computerized adaptive testing (CAT) and multistage testing (MST) have become two of the most popular modes in large-scale computer-based sequential testing. Though most designs of CAT and MST have exhibited both strengths and weaknesses in recent large-scale implementations, there is no simple answer to the question of which design is better because different…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Format, Sequential Approach
Debeer, Dries; Janssen, Rianne; De Boeck, Paul – Journal of Educational Measurement, 2017
When dealing with missing responses, two types of omissions can be discerned: items can be skipped or not reached by the test taker. When the occurrence of these omissions is related to the proficiency process, the missingness is nonignorable. The purpose of this article is to present a tree-based IRT framework for modeling responses and omissions…
Descriptors: Item Response Theory, Test Items, Responses, Testing Problems
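The core data step in such tree-based modeling is a recode of each response into pseudo-items, one per tree node. A minimal sketch of that expansion follows; for brevity it collapses skipped and not-reached omissions into a single node, unlike the fuller framework in the article.

    import numpy as np

    def irtree_expand(resp):
        # resp: 1 = correct, 0 = incorrect, nan = omitted.
        # Node 1 (response propensity): 1 if answered, 0 if omitted.
        # Node 2 (accuracy): the scored response, defined only if answered.
        resp = np.asarray(resp, dtype=float)
        node1 = np.where(np.isnan(resp), 0.0, 1.0)
        node2 = np.where(np.isnan(resp), np.nan, resp)
        return np.column_stack([node1, node2])

    # One test taker, five items, two omissions
    print(irtree_expand([1, 0, np.nan, 1, np.nan]))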

Tate, Richard L. – Journal of Educational Measurement, 1995
The robustness of the school-level item response theory (IRT) model to violations of distributional assumptions was studied in a computer simulation. In situations where school-level precision might be acceptable for real school comparisons, expected a posteriori estimates of school ability were robust over a range of violations and conditions.…
Descriptors: Comparative Analysis, Computer Simulation, Estimation (Mathematics), Item Response Theory
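For context, an expected a posteriori (EAP) estimate is simply the posterior mean of ability over a prior, computed here by brute-force quadrature for a single person under an assumed 2PL model. This person-level sketch is only meant to unpack the term; the study's model places the distribution at the school level.

    import numpy as np

    def eap_theta(u, a, b, n_quad=61):
        # Posterior mean of theta over a standard-normal prior, computed
        # on a fixed grid (brute-force quadrature), assuming a 2PL model.
        grid = np.linspace(-4.0, 4.0, n_quad)
        prior = np.exp(-0.5 * grid**2)
        p = 1.0 / (1.0 + np.exp(-a[:, None] * (grid - b[:, None])))
        like = np.prod(np.where(u[:, None] == 1, p, 1.0 - p), axis=0)
        post = like * prior
        post /= post.sum()
        return float(np.sum(grid * post))

    a = np.ones(15)
    b = np.linspace(-1.5, 1.5, 15)
    u = np.array([1] * 10 + [0] * 5, dtype=float)
    print(eap_theta(u, a, b))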

Budescu, David; Bar-Hillel, Maya – Journal of Educational Measurement, 1993
Test taking and scoring are examined from the normative and descriptive perspectives of judgment and decision theory. The number-right scoring rule is endorsed because it discourages omissions and is robust against variability in respondent motivations, item vagaries, and limitations in judgments of uncertainty. (SLD)
Descriptors: Elementary Secondary Education, Guessing (Tests), Knowledge Level, Multiple Choice Tests

Ryan, Katherine E. – Journal of Educational Measurement, 1991
The reliability of Mantel-Haenszel (MH) indexes across samples of examinees and sample sizes and their robustness to item context effects were investigated with data for 670 African-American and 5,015 white students from the Second International Mathematics Study. MH procedures can be used to detect differential item functioning. (SLD)
Descriptors: Black Students, Comparative Testing, Context Effect, Evaluation Criteria
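The MH index itself is compact enough to sketch. The code below, a minimal illustration rather than the study's procedure, pools the stratum-level two-by-two (group by right/wrong) tables into a common odds ratio and maps it to the ETS D-DIF scale via -2.35 ln(alpha_MH); the coarse quantile stratifier stands in for the usual total-score matching.

    import numpy as np

    def mh_ddif(u, total, group):
        # Mantel-Haenszel common odds ratio across score strata,
        # mapped to the ETS delta scale: MH D-DIF = -2.35 * ln(alpha_MH).
        num = den = 0.0
        for s in np.unique(total):
            m = total == s
            r_ref = np.sum(u[m] * (group[m] == 0))          # reference right
            w_ref = np.sum((1.0 - u[m]) * (group[m] == 0))  # reference wrong
            r_foc = np.sum(u[m] * (group[m] == 1))          # focal right
            w_foc = np.sum((1.0 - u[m]) * (group[m] == 1))  # focal wrong
            n_s = m.sum()
            num += r_ref * w_foc / n_s
            den += r_foc * w_ref / n_s
        alpha_mh = num / den
        return alpha_mh, -2.35 * np.log(alpha_mh)

    rng = np.random.default_rng(2)
    n = 4000
    group = rng.integers(0, 2, n)
    theta = rng.normal(size=n)
    u = (rng.random(n) < 1.0 / (1.0 + np.exp(-(theta - 0.4 * group)))).astype(float)
    strata = (theta > np.quantile(theta, [0.2, 0.4, 0.6, 0.8])[:, None]).sum(axis=0)
    print(mh_ddif(u, strata, group))   # negative D-DIF: DIF against the focal group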