ERIC - Search Results

Publication Date

In 2025	4
Since 2024	7

Source

Journal of Educational…

Publication Type

Journal Articles	7
Reports - Research	7

Education Level

Higher Education	2
Postsecondary Education	2
Secondary Education	1

Assessments and Surveys

Program for International…

Showing all 7 results

Model Selection Posterior Predictive Model Checking via Limited-Information Indices for Bayesian Diagnostic Classification Modeling

Peer reviewed

Jihong Zhang; Jonathan Templin; Xinya Liang – Journal of Educational Measurement, 2024

Recently, Bayesian diagnostic classification modeling has become popular in health psychology, education, and sociology. Typically, information criteria are used for model selection when researchers want to choose the best model among alternative models. In Bayesian estimation, posterior predictive checking is a flexible Bayesian model…

Descriptors: Bayesian Statistics, Cognitive Measurement, Models, Classification

IRT Observed-Score Equating for Rater-Mediated Assessments Using a Hierarchical Rater Model

Peer reviewed

Tong Wu; Stella Y. Kim; Carl Westine; Michelle Boyer – Journal of Educational Measurement, 2025

While significant attention has been given to test equating to ensure score comparability, limited research has explored equating methods for rater-mediated assessments, where human raters inherently introduce error. If not properly addressed, these errors can undermine score interchangeability and test validity. This study proposes an equating…

Descriptors: Item Response Theory, Evaluators, Error of Measurement, Test Validity

A Nonparametric Composite Group DIF Index for Focal Groups Stemming from Multicategorical Variables

Peer reviewed

Corinne Huggins-Manley; Anthony W. Raborn; Peggy K. Jones; Ted Myers – Journal of Educational Measurement, 2024

The purpose of this study is to develop a nonparametric DIF method that (a) compares focal groups directly to the composite group that will be used to develop the reported test score scale, and (b) allows practitioners to explore for DIF related to focal groups stemming from multicategorical variables that constitute a small proportion of the…

Descriptors: Nonparametric Statistics, Test Bias, Scores, Statistical Significance

Using Automated Procedures to Score Educational Essays Written in Three Languages

Peer reviewed

Tahereh Firoozi; Hamid Mohammadi; Mark J. Gierl – Journal of Educational Measurement, 2025

The purpose of this study is to describe and evaluate a multilingual automated essay scoring (AES) system for grading essays in three languages. Two different sentence embedding models were evaluated within the AES system, multilingual BERT (mBERT) and language-agnostic BERT sentence embedding (LaBSE). German, Italian, and Czech essays were…

Descriptors: College Students, Slavic Languages, German, Italian

A Note on the Use of Categorical Subscores

Peer reviewed

Kylie Gorney; Sandip Sinharay – Journal of Educational Measurement, 2025

Although there exists an extensive amount of research on subscores and their properties, limited research has been conducted on categorical subscores and their interpretations. In this paper, we focus on the claim of Feinberg and von Davier that categorical subscores are useful for remediation and instructional purposes. We investigate this claim…

Descriptors: Tests, Scores, Test Interpretation, Alternative Assessment

DIF Detection for Multiple Groups: Comparing Three-Level GLMMs and Multiple-Group IRT Models

Peer reviewed

Carmen Köhler; Lale Khorramdel; Artur Pokropek; Johannes Hartig – Journal of Educational Measurement, 2024

For assessment scales applied to different groups (e.g., students from different states; patients in different countries), multigroup differential item functioning (MG-DIF) needs to be evaluated in order to ensure that respondents with the same trait level but from different groups have equal response probabilities on a particular item. The…

Descriptors: Measures (Individuals), Test Bias, Models, Item Response Theory

Using Multilabel Neural Network to Score High-Dimensional Assessments for Different Use Foci: An Example with College Major Preference Assessment

Peer reviewed

Shun-Fu Hu; Amery D. Wu; Jake Stone – Journal of Educational Measurement, 2025

Scoring high-dimensional assessments (e.g., > 15 traits) can be a challenging task. This paper introduces the multilabel neural network (MNN) as a scoring method for high-dimensional assessments. Additionally, it demonstrates how MNN can score the same test responses to maximize different performance metrics, such as accuracy, recall, or…

Descriptors: Tests, Testing, Scores, Test Construction

Descriptor

Evaluation Methods	7
Test Validity	4
Models	3
Scores	3
Test Reliability	3
Accuracy	2
Hierarchical Linear Modeling	2
Item Response Theory	2
Test Bias	2
Tests	2
Achievement Tests	1
Alternative Assessment	1
Assessment Literacy	1
Bayesian Statistics	1
Bias	1
Classification	1
Cognitive Measurement	1
College Students	1
Comparative Testing	1
Computer Assisted Testing	1
Equated Scores	1
Error of Measurement	1
Evaluators	1
Foreign Countries	1
German	1

Author

Amery D. Wu	1
Anthony W. Raborn	1
Artur Pokropek	1
Carl Westine	1
Carmen Köhler	1
Corinne Huggins-Manley	1
Hamid Mohammadi	1
Jake Stone	1
Jihong Zhang	1
Johannes Hartig	1
Jonathan Templin	1
Kylie Gorney	1
Lale Khorramdel	1
Mark J. Gierl	1
Michelle Boyer	1
Peggy K. Jones	1
Sandip Sinharay	1
Shun-Fu Hu	1
Stella Y. Kim	1
Tahereh Firoozi	1
Ted Myers	1
Tong Wu	1
Xinya Liang	1