Publication Date
  In 2025: 1
  Since 2024: 3
  Since 2021 (last 5 years): 6
  Since 2016 (last 10 years): 15
  Since 2006 (last 20 years): 29
Descriptor
  Models: 34
  Evaluation Methods: 21
  Item Response Theory: 17
  Monte Carlo Methods: 11
  Test Items: 10
  Simulation: 9
  Markov Processes: 7
  Bayesian Statistics: 6
  Cognitive Measurement: 6
  Test Bias: 6
  Achievement Tests: 5
Source
  Journal of Educational Measurement: 34
Author
  Jiao, Hong: 3
  Nandakumar, Ratna: 2
  Qiao, Xin: 2
  Wang, Wen-Chung: 2
  Wilson, Mark: 2
  Wind, Stefanie A.: 2
  de la Torre, Jimmy: 2
  Albano, Anthony D.: 1
  Ankenmann, Robert D.: 1
  Anselmi, Pasquale: 1
  Armstrong, Ronald D.: 1
Publication Type
  Journal Articles: 31
  Reports - Research: 22
  Reports - Evaluative: 6
  Reports - Descriptive: 3
Education Level
  Secondary Education: 4
  Higher Education: 1
  Postsecondary Education: 1
Assessments and Surveys
  Program for International…: 4
  Graduate Record Examinations: 1
Joo, Seang-Hwane; Lee, Philseok – Journal of Educational Measurement, 2022
This study proposes a new Bayesian differential item functioning (DIF) detection method using posterior predictive model checking (PPMC). Item fit measures including infit, outfit, observed score distribution (OSD), and Q1 were considered as discrepancy statistics for the PPMC DIF methods. The performance of the PPMC DIF method was…
Descriptors: Test Items, Bayesian Statistics, Monte Carlo Methods, Prediction
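The PPMC recipe itself is compact: push posterior draws through the data-generating model and compare a discrepancy computed on the observed data with the same discrepancy on replicated data. Below is a minimal sketch, not the authors' implementation: it assumes Rasch posterior draws are already in hand, and `ppmc_dif_pvalue` (a hypothetical helper) uses a between-group gap in proportion-correct as a simplified stand-in for the infit/outfit/OSD/Q1 discrepancies the study considers.

```python
import numpy as np

rng = np.random.default_rng(0)

def rasch_prob(theta, b):
    """P(correct) under the Rasch model, persons x items."""
    return 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))

def ppmc_dif_pvalue(x, group, theta_draws, b_draws, item):
    """Posterior predictive p-value for one item.

    Discrepancy: difference in proportion-correct between the two
    groups (a simplified stand-in for infit/outfit/OSD/Q1).
    x           : (N, J) observed 0/1 responses
    group       : (N,) 0/1 group membership
    theta_draws : (S, N) posterior draws of abilities
    b_draws     : (S, J) posterior draws of item difficulties
    """
    obs = x[group == 0, item].mean() - x[group == 1, item].mean()
    exceed = 0
    S = theta_draws.shape[0]
    for s in range(S):
        p = rasch_prob(theta_draws[s], b_draws[s])
        rep = rng.binomial(1, p)  # one replicated data set per draw
        d = rep[group == 0, item].mean() - rep[group == 1, item].mean()
        exceed += (d >= obs)
    return exceed / S  # p near 0 or 1 flags possible DIF
```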
Jihong Zhang; Jonathan Templin; Xinya Liang – Journal of Educational Measurement, 2024
Bayesian diagnostic classification modeling has recently become popular in health psychology, education, and sociology. Typically, information criteria are used for model selection when researchers want to choose the best model among alternative models. In Bayesian estimation, posterior predictive checking is a flexible Bayesian model…
Descriptors: Bayesian Statistics, Cognitive Measurement, Models, Classification
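For context on the information-criteria side, WAIC is one of the standard Bayesian choices and can be computed directly from posterior draws of pointwise log-likelihoods. A minimal sketch, assuming that (S draws × n observations) matrix is precomputed; the paper's actual comparison of criteria and posterior predictive checks is not reproduced here.

```python
import numpy as np

def waic(log_lik):
    """WAIC from an (S draws, n observations) log-likelihood matrix.

    lppd  : log pointwise predictive density
    p_waic: effective number of parameters
            (posterior variance of the pointwise log-likelihood)
    """
    m = log_lik.max(axis=0)  # stabilize log-mean-exp per observation
    lppd = (m + np.log(np.exp(log_lik - m).mean(axis=0))).sum()
    p_waic = log_lik.var(axis=0, ddof=1).sum()
    return -2.0 * (lppd - p_waic)  # deviance scale: lower is better
```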
Tong Wu; Stella Y. Kim; Carl Westine; Michelle Boyer – Journal of Educational Measurement, 2025
While significant attention has been given to test equating to ensure score comparability, limited research has explored equating methods for rater-mediated assessments, where human raters inherently introduce error. If not properly addressed, these errors can undermine score interchangeability and test validity. This study proposes an equating…
Descriptors: Item Response Theory, Evaluators, Error of Measurement, Test Validity
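For background, the classical baseline the equating literature starts from is observed-score equipercentile equating: a score on form X maps to the form-Y score with the same percentile rank. A minimal sketch of that classical setup; the article's rater-error-aware method is not shown, and `equipercentile_equate` is a hypothetical helper.

```python
import numpy as np

def equipercentile_equate(scores_x, scores_y, x_points):
    """Map raw scores on form X to the form-Y scale by matching
    percentile ranks (interpolating within the Y distribution)."""
    scores_x = np.sort(scores_x)
    # percentile rank of each x point within the X distribution
    pr = np.searchsorted(scores_x, x_points, side="right") / len(scores_x)
    # invert the Y distribution at those ranks
    return np.quantile(np.sort(scores_y), np.clip(pr, 0.0, 1.0))
```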
Carmen Köhler; Lale Khorramdel; Artur Pokropek; Johannes Hartig – Journal of Educational Measurement, 2024
For assessment scales applied to different groups (e.g., students from different states; patients in different countries), multigroup differential item functioning (MG-DIF) must be evaluated to ensure that respondents with the same trait level but from different groups have equal response probabilities on a particular item. The…
Descriptors: Measures (Individuals), Test Bias, Models, Item Response Theory
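The invariance definition quoted in the abstract can be made concrete: under no DIF, the groups' item response curves coincide at every matched trait level. Below is a hypothetical helper under an assumed Rasch/2PL parameterization, not the authors' MG-DIF procedure.

```python
import numpy as np

def mg_dif_gap(theta_grid, b_by_group, a=1.0):
    """At each matched trait level, compare groups' response
    probabilities for one item. Under invariance the curves coincide;
    systematic gaps indicate DIF.
    b_by_group: dict mapping group label -> that group's difficulty."""
    probs = {g: 1.0 / (1.0 + np.exp(-a * (theta_grid - b)))
             for g, b in b_by_group.items()}
    base = next(iter(probs.values()))  # first group as reference
    return {g: np.max(np.abs(p - base)) for g, p in probs.items()}
```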
Kim, Kyung Yong – Journal of Educational Measurement, 2020
New items are often evaluated prior to their operational use to obtain item response theory (IRT) item parameter estimates for quality control purposes. Fixed parameter calibration is one linking method that is widely used to estimate parameters for new items and place them on the desired scale. This article provides detailed descriptions of two…
Descriptors: Item Response Theory, Evaluation Methods, Test Items, Simulation
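In the Rasch case, the logic of fixed parameter calibration fits in a few lines: abilities estimated from the fixed operational items are treated as known, and the new item's difficulty is found by Newton-Raphson, which automatically places it on the operational scale. A minimal sketch (`calibrate_new_item` is a hypothetical helper; the article's detailed comparisons are not reproduced):

```python
import numpy as np

def rasch_p(theta, b):
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def calibrate_new_item(x_new, theta_hat, iters=25):
    """Estimate a new item's Rasch difficulty with abilities fixed.

    theta_hat: (N,) abilities from anchor items on the operational
    scale (held fixed). x_new: (N,) 0/1 responses to the new item."""
    b = 0.0
    for _ in range(iters):
        p = rasch_p(theta_hat, b)
        # Newton-Raphson: dL/db = sum(p - x), d2L/db2 = -sum(p(1-p))
        b += np.sum(p - x_new) / np.sum(p * (1.0 - p))
    return b
```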
Wesolowski, Brian C.; Wind, Stefanie A. – Journal of Educational Measurement, 2019
Rater-mediated assessments are a common methodology for measuring persons, investigating rater behavior, and/or defining latent constructs. The purpose of this article is to provide a pedagogical framework for examining rater variability in the context of rater-mediated assessments using three distinct models. The first model is the observation…
Descriptors: Interrater Reliability, Models, Observation, Measurement
Qiao, Xin; Jiao, Hong; He, Qiwei – Journal of Educational Measurement, 2023
Multiple group modeling is one method for addressing measurement noninvariance. Traditional studies on multiple group modeling have mainly focused on item responses. In computer-based assessments, joint modeling of response times and action counts with item responses helps estimate the latent speed and action levels in addition to…
Descriptors: Multivariate Analysis, Models, Item Response Theory, Statistical Distributions
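A common form for such joint models pairs an IRT response part with a lognormal response-time part. Below is a minimal single-person sketch in that spirit, with assumed 2PL parameters (a, b), item time intensities beta, time precisions alpha, and person speed tau; the action counts and multiple-group structure from the article are omitted.

```python
import numpy as np

def joint_loglik(x, logt, theta, tau, a, b, beta, alpha):
    """Joint log-likelihood of responses and log response times for
    one person: responses follow a 2PL in theta; log times are normal
    with mean beta_j - tau (item intensity minus person speed) and
    precision alpha_j."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    ll_x = np.sum(x * np.log(p) + (1 - x) * np.log1p(-p))
    mu = beta - tau
    ll_t = np.sum(np.log(alpha) - 0.5 * np.log(2 * np.pi)
                  - 0.5 * (alpha * (logt - mu)) ** 2)
    return ll_x + ll_t
```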
Qiao, Xin; Jiao, Hong – Journal of Educational Measurement, 2021
This study proposes an explanatory cognitive diagnostic model (CDM) jointly incorporating responses and response times (RTs), with item covariates related to both item responses and RTs. The joint modeling of item responses and RTs intends to provide more information for cognitive diagnosis, while item covariates can be used to predict…
Descriptors: Cognitive Measurement, Models, Reaction Time, Test Items
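The "explanatory" ingredient, item covariates predicting item parameters, can be illustrated with a simple two-stage stand-in: regress estimated parameters on covariates. The article estimates everything jointly; the hypothetical `explanatory_item_effects` helper below only shows the covariate idea.

```python
import numpy as np

def explanatory_item_effects(item_covariates, difficulties):
    """Two-stage, LLTM-flavored sketch: least-squares regression of
    estimated item parameters on item covariates, yielding covariate
    effects and model-implied difficulties.
    item_covariates: (J, p) design matrix rows, one per item."""
    X = np.column_stack([np.ones(len(difficulties)), item_covariates])
    w, *_ = np.linalg.lstsq(X, difficulties, rcond=None)
    return w, X @ w
```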
Wind, Stefanie A.; Jones, Eli – Journal of Educational Measurement, 2019
Researchers have explored a variety of topics related to identifying and distinguishing among specific types of rater effects, as well as the implications of different types of incomplete data collection designs for rater-mediated assessments. In this study, we used simulated data to examine the sensitivity of latent trait model indicators of…
Descriptors: Rating Scales, Models, Evaluators, Data Collection
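The data-generating idea behind such simulations is small: ratings depend on person trait minus rater severity, so a severe rater depresses scores at any trait level. A minimal simulation sketch with assumed dichotomous ratings; the article's incomplete designs and latent trait indicators are richer than this.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_ratings(n_persons=200, n_raters=8, severity_sd=0.5):
    """Simulate dichotomized ratings with a rater-severity effect:
    logit P(1) = theta_p - lambda_r, so severe raters (high lambda)
    give fewer positive ratings at the same trait level."""
    theta = rng.normal(0, 1, n_persons)
    severity = rng.normal(0, severity_sd, n_raters)
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - severity[None, :])))
    return rng.binomial(1, p), theta, severity
```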
Feuerstahler, Leah; Wilson, Mark – Journal of Educational Measurement, 2019
Scores estimated from multidimensional item response theory (IRT) models are not necessarily comparable across dimensions. In this article, the concept of aligned dimensions is formalized in the context of Rasch models, and two methods are described--delta dimensional alignment (DDA) and logistic regression alignment (LRA)--to transform estimated…
Descriptors: Item Response Theory, Models, Scores, Comparative Analysis
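One way to read the alignment idea: put each dimension's estimates onto a common item-difficulty metric by a shift and scale. The helper below is hypothetical, a simplified stand-in rather than the paper's DDA or LRA procedures.

```python
import numpy as np

def align_dimensions(theta, difficulties):
    """Shift-and-scale sketch of dimensional alignment: rescale each
    dimension's thetas so its item-difficulty distribution has the
    pooled mean and spread, making scores comparable across dimensions.
    theta        : (N, D) ability estimates
    difficulties : list of D arrays of item difficulties per dimension"""
    mus = np.array([d.mean() for d in difficulties])
    sds = np.array([d.std(ddof=1) for d in difficulties])
    target_mu, target_sd = mus.mean(), sds.mean()
    return (theta - mus) / sds * target_sd + target_mu
```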
Assessment of Differential Item Functioning under Cognitive Diagnosis Models: The DINA Model Example
Li, Xiaomin; Wang, Wen-Chung – Journal of Educational Measurement, 2015
The assessment of differential item functioning (DIF) is routinely conducted to ensure test fairness and validity. Although many DIF assessment methods have been developed in the context of classical test theory and item response theory, they are not applicable for cognitive diagnosis models (CDMs), as the underlying latent attributes of CDMs are…
Descriptors: Test Bias, Models, Cognitive Measurement, Evaluation Methods
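The DINA item response function such DIF analyses work with is short: a respondent succeeds with probability 1 - slip when holding every attribute the Q-matrix requires, and with the guessing probability otherwise; group-specific slip/guess parameters at the same attribute profile are what signal DIF. A minimal sketch:

```python
import numpy as np

def dina_prob(alpha, q, slip, guess):
    """DINA item response probabilities.
    alpha : (N, K) 0/1 attribute profiles
    q     : (J, K) 0/1 Q-matrix
    slip, guess : (J,) item parameters
    eta = 1 iff the respondent has every required attribute."""
    eta = np.all(alpha[:, None, :] >= q[None, :, :], axis=2)  # (N, J)
    return np.where(eta, 1.0 - slip, guess)
```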
Madison, Matthew J.; Bradshaw, Laine – Journal of Educational Measurement, 2018
The evaluation of intervention effects is an important objective of educational research. One way to evaluate the effectiveness of an intervention is to conduct an experiment that assigns individuals to control and treatment groups. In the context of pretest/posttest designs, this is referred to as a control-group pretest/posttest design…
Descriptors: Intervention, Program Evaluation, Program Effectiveness, Control Groups
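The classical observed-score analogue of this design's effect estimate is a difference in gains. The article works with model-based diagnostic classifications rather than raw scores, so the sketch below (with a hypothetical `gain_score_effect` helper) is only the familiar baseline.

```python
import numpy as np

def gain_score_effect(pre_t, post_t, pre_c, post_c):
    """Difference-in-gains estimate of an intervention effect in a
    control-group pretest/posttest design, with a standard error
    assuming independent groups."""
    gain_t = post_t - pre_t
    gain_c = post_c - pre_c
    effect = gain_t.mean() - gain_c.mean()
    se = np.sqrt(gain_t.var(ddof=1) / len(gain_t)
                 + gain_c.var(ddof=1) / len(gain_c))
    return effect, se
```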
Fox, Jean-Paul; Marianti, Sukaesi – Journal of Educational Measurement, 2017
Response accuracy and response time data can be analyzed with a joint model to measure ability and speed of working, while accounting for relationships between item and person characteristics. In this study, person-fit statistics are proposed for joint models to detect aberrant response accuracy and/or response time patterns. The person-fit tests…
Descriptors: Accuracy, Reaction Time, Statistics, Test Items
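The response-accuracy ingredient of person fit can be illustrated with the standardized log-likelihood statistic lz under a 2PL; the article's joint-model person-fit tests generalize this to response times as well. A minimal sketch:

```python
import numpy as np

def lz_statistic(x, theta, a, b):
    """Standardized log-likelihood person-fit statistic (lz) under a
    2PL. Large negative values flag aberrant response patterns.
    x: (J,) 0/1 responses; theta: scalar ability; a, b: (J,)."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    l0 = np.sum(x * np.log(p) + (1 - x) * np.log1p(-p))
    e = np.sum(p * np.log(p) + (1 - p) * np.log1p(-p))      # E[log L]
    v = np.sum(p * (1 - p) * np.log(p / (1 - p)) ** 2)      # Var[log L]
    return (l0 - e) / np.sqrt(v)
```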
Chalmers, R. Philip – Journal of Educational Measurement, 2015
A mixed-effects item response theory (IRT) model is presented as a logical extension of the generalized linear mixed-effects modeling approach to formulating explanatory IRT models. Fixed and random coefficients in the extended model are estimated using a Metropolis-Hastings Robbins-Monro (MH-RM) stochastic imputation algorithm to accommodate…
Descriptors: Item Response Theory, Models, Mathematics, Regression (Statistics)
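The imputation half of MH-RM is an ordinary Metropolis-Hastings update of the latent traits; the Robbins-Monro half then updates the fixed and random coefficients with a decaying gain sequence. A minimal sketch of one MH step for one person under a 2PL with a standard normal prior; this is an assumed setup, not the article's full algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)

def mh_step_theta(theta, x, a, b, step=0.5):
    """One random-walk Metropolis-Hastings update of a latent trait.
    x: (J,) 0/1 responses; a, b: (J,) 2PL item parameters."""
    def logpost(t):
        p = 1.0 / (1.0 + np.exp(-a * (t - b)))
        loglik = np.sum(x * np.log(p) + (1 - x) * np.log1p(-p))
        return loglik - 0.5 * t * t  # standard normal prior
    prop = theta + rng.normal(0, step)
    if np.log(rng.uniform()) < logpost(prop) - logpost(theta):
        return prop  # accept
    return theta     # reject
```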
Lee, Soo; Suh, Youngsuk – Journal of Educational Measurement, 2018
Lord's Wald test for differential item functioning (DIF) has not been studied extensively in the context of the multidimensional item response theory (MIRT) framework. In this article, Lord's Wald test was implemented using two estimation approaches, marginal maximum likelihood estimation and Bayesian Markov chain Monte Carlo estimation, to detect…
Descriptors: Item Response Theory, Sample Size, Models, Error of Measurement
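Lord's Wald statistic itself is compact once an item's group-specific parameter estimates and their covariance matrices sit on a common scale: a quadratic form in the parameter difference, referred to a chi-square distribution. A minimal sketch:

```python
import numpy as np
from scipy.stats import chi2

def lord_wald(params_ref, cov_ref, params_foc, cov_foc):
    """Lord's Wald test for DIF: compare an item's parameter vector
    across reference and focal groups (estimates must already be
    linked to a common scale). Returns chi-square and p-value."""
    d = np.asarray(params_ref) - np.asarray(params_foc)
    v = np.asarray(cov_ref) + np.asarray(cov_foc)
    stat = float(d @ np.linalg.solve(v, d))
    return stat, chi2.sf(stat, df=d.size)
```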