ERIC - Search Results

Showing 1 to 15 of 40 results

Fully Gibbs Sampling Algorithms for Bayesian Variable Selection in Latent Regression Models

Peer reviewed

Yamaguchi, Kazuhiro; Zhang, Jihong – Journal of Educational Measurement, 2023

This study proposed Gibbs sampling algorithms for variable selection in a latent regression model under a unidimensional two-parameter logistic item response theory model. Three types of shrinkage priors were employed to obtain shrinkage estimates: double-exponential (i.e., Laplace), horseshoe, and horseshoe+ priors. These shrinkage priors were…

Descriptors: Algorithms, Simulation, Mathematics Achievement, Bayesian Statistics

Using Multiple Maximum Exposure Rates in Computerized Adaptive Testing

Peer reviewed

Kylie Gorney; Mark D. Reckase – Journal of Educational Measurement, 2025

In computerized adaptive testing, item exposure control methods are often used to provide a more balanced usage of the item pool. Many of the most popular methods, including the restricted method (Revuelta and Ponsoda), use a single maximum exposure rate to limit the proportion of times that each item is administered. However, Barrada et al.…

Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Item Banks

Estimating Classification Accuracy and Consistency Indices for Multiple Measures with the Simple Structure MIRT Model

Peer reviewed

Park, Seohee; Kim, Kyung Yong; Lee, Won-Chan – Journal of Educational Measurement, 2023

Multiple measures, such as multiple content domains or multiple types of performance, are used in various testing programs to classify examinees for screening or selection. Despite the popular usages of multiple measures, there is little research on classification consistency and accuracy of multiple measures. Accordingly, this study introduces an…

Descriptors: Testing, Computation, Classification, Accuracy

IRT Observed-Score Equating for Rater-Mediated Assessments Using a Hierarchical Rater Model

Peer reviewed

Tong Wu; Stella Y. Kim; Carl Westine; Michelle Boyer – Journal of Educational Measurement, 2025

While significant attention has been given to test equating to ensure score comparability, limited research has explored equating methods for rater-mediated assessments, where human raters inherently introduce error. If not properly addressed, these errors can undermine score interchangeability and test validity. This study proposes an equating…

Descriptors: Item Response Theory, Evaluators, Error of Measurement, Test Validity

A New Bayesian Person-Fit Analysis Method Using Pivotal Discrepancy Measures

Peer reviewed

Combs, Adam – Journal of Educational Measurement, 2023

A common method of checking person-fit in Bayesian item response theory (IRT) is the posterior-predictive (PP) method. In recent years, more powerful approaches have been proposed that are based on resampling methods using the popular L*_z statistic. There has also been proposed a new Bayesian model checking method based on pivotal…

Descriptors: Bayesian Statistics, Goodness of Fit, Evaluation Methods, Monte Carlo Methods

An Exponentially Weighted Moving Average Procedure for Detecting Back Random Responding Behavior

Peer reviewed

He, Yinhong – Journal of Educational Measurement, 2023

Back random responding (BRR) behavior is one of the commonly observed careless response behaviors. Accurately detecting BRR behavior can improve test validities. Yu and Cheng (2019) showed that the change point analysis (CPA) procedure based on weighted residual (CPA-WR) performed well in detecting BRR. Compared with the CPA procedure, the…

Descriptors: Test Validity, Item Response Theory, Measurement, Monte Carlo Methods

Evaluating the Consistency and Reliability of Attribution Methods in Automated Short Answer Grading (ASAG) Systems: Toward an Explainable Scoring System

Peer reviewed

Wallace N. Pinto Jr.; Jinnie Shin – Journal of Educational Measurement, 2025

In recent years, the application of explainability techniques to automated essay scoring and automated short-answer grading (ASAG) models, particularly those based on transformer architectures, has gained significant attention. However, the reliability and consistency of these techniques remain underexplored. This study systematically investigates…

Descriptors: Automation, Grading, Computer Assisted Testing, Scoring

DIF Detection for Multiple Groups: Comparing Three-Level GLMMs and Multiple-Group IRT Models

Peer reviewed

Carmen Köhler; Lale Khorramdel; Artur Pokropek; Johannes Hartig – Journal of Educational Measurement, 2024

For assessment scales applied to different groups (e.g., students from different states; patients in different countries), multigroup differential item functioning (MG-DIF) needs to be evaluated in order to ensure that respondents with the same trait level but from different groups have equal response probabilities on a particular item. The…

Descriptors: Measures (Individuals), Test Bias, Models, Item Response Theory

Statistical Theory and Assessment Practice

Peer reviewed

Haberman, Shelby J. – Journal of Educational Measurement, 2020

Examples of the impact of statistical theory on assessment practice are provided from the perspective of a statistician trained in theoretical statistics who began to work on assessments. Goodness of fit of item-response models is examined in terms of restricted likelihood-ratio tests and generalized residuals. Minimum discriminant information…

Descriptors: Statistics, Goodness of Fit, Item Response Theory, Statistical Analysis

Two IRT Fixed Parameter Calibration Methods for the Bifactor Model

Peer reviewed

Kim, Kyung Yong – Journal of Educational Measurement, 2020

New items are often evaluated prior to their operational use to obtain item response theory (IRT) item parameter estimates for quality control purposes. Fixed parameter calibration is one linking method that is widely used to estimate parameters for new items and place them on the desired scale. This article provides detailed descriptions of two…

Descriptors: Item Response Theory, Evaluation Methods, Test Items, Simulation

Estimating the Accuracy of Relative Growth Measures Using Empirical Data

Peer reviewed

Castellano, Katherine E.; McCaffrey, Daniel F. – Journal of Educational Measurement, 2020

The residual gain score has been of historical interest, and its percentile rank has been of interest more recently given its close correspondence to the popular Student Growth Percentile. However, these estimators suffer from low accuracy and systematic bias (bias conditional on prior latent achievement). This article explores three…

Descriptors: Accuracy, Student Evaluation, Measurement Techniques, Evaluation Methods

Scale Alignment in Between-Item Multidimensional Rasch Models

Peer reviewed

Feuerstahler, Leah; Wilson, Mark – Journal of Educational Measurement, 2019

Scores estimated from multidimensional item response theory (IRT) models are not necessarily comparable across dimensions. In this article, the concept of aligned dimensions is formalized in the context of Rasch models, and two methods are described--delta dimensional alignment (DDA) and logistic regression alignment (LRA)--to transform estimated…

Descriptors: Item Response Theory, Models, Scores, Comparative Analysis

Optimal Linking Design for Response Model Parameters

Peer reviewed

Barrett, Michelle D.; van der Linden, Wim J. – Journal of Educational Measurement, 2017

Linking functions adjust for differences between identifiability restrictions used in different instances of the estimation of item response model parameters. These adjustments are necessary when results from those instances are to be compared. As linking functions are derived from estimated item response model parameters, parameter estimation…

Descriptors: Item Response Theory, Error of Measurement, Programming, Evaluation Methods

A Stepwise Test Characteristic Curve Method to Detect Item Parameter Drift

Peer reviewed

Guo, Rui; Zheng, Yi; Chang, Hua-Hua – Journal of Educational Measurement, 2015

An important assumption of item response theory is item parameter invariance. Sometimes, however, item parameters are not invariant across different test administrations due to factors other than sampling error; this phenomenon is termed item parameter drift. Several methods have been developed to detect drifted items. However, most of the…

Descriptors: Item Response Theory, Test Items, Evaluation Methods, Equated Scores

Local Observed-Score Kernel Equating

Peer reviewed

Wiberg, Marie; van der Linden, Wim J.; von Davier, Alina A. – Journal of Educational Measurement, 2014

Three local observed-score kernel equating methods that integrate methods from the local equating and kernel equating frameworks are proposed. The new methods were compared with their earlier counterparts with respect to such measures as bias--as defined by Lord's criterion of equity--and percent relative error. The local kernel item response…

Descriptors: Measurement Techniques, Evaluation Methods, Item Response Theory, Equated Scores
