ERIC - Search Results

Showing all 9 results

Another Look at Yen's Q3: Is 0.2 an Appropriate Cut-Off?

Peer reviewed

Kelsey Nason; Christine DeMars – Journal of Educational Measurement, 2025

This study examined the widely used threshold of 0.2 for Yen's Q3, an index for violations of local independence. Specifically, a simulation was conducted to investigate whether Q3 values were related to the magnitude of bias in estimates of reliability, item parameters, and examinee ability. Results showed that Q3 values below the typical cut-off…

Descriptors: Item Response Theory, Statistical Bias, Test Reliability, Test Items

Modeling the Intraindividual Relation of Ability and Speed within a Test

Peer reviewed

Augustin Mutak; Robert Krause; Esther Ulitzsch; Sören Much; Jochen Ranger; Steffi Pohl – Journal of Educational Measurement, 2024

Understanding the intraindividual relation between an individual's speed and ability in testing scenarios is essential to assure a fair assessment. Different approaches exist for estimating this relationship that rely either on specific study designs or on specific assumptions. This paper aims to add to the toolbox of approaches for estimating…

Descriptors: Testing, Academic Ability, Time on Task, Correlation

Detecting Differential Item Functioning among Multiple Groups Using IRT Residual DIF Framework

Peer reviewed

Hwanggyu Lim; Danqi Zhu; Edison M. Choe; Kyung T. Han – Journal of Educational Measurement, 2024

This study presents a generalized version of the residual differential item functioning (RDIF) detection framework in item response theory, named GRDIF, to analyze differential item functioning (DIF) in multiple groups. The GRDIF framework retains the advantages of the original RDIF framework, such as computational efficiency and ease of…

Descriptors: Item Response Theory, Test Bias, Test Reliability, Test Construction

Modeling Directional Testlet Effects on Multiple Open-Ended Questions

Peer reviewed

Kuan-Yu Jin; Wai-Lok Siu – Journal of Educational Measurement, 2025

Educational tests often have a cluster of items linked by a common stimulus ("testlet"). In such a design, the dependencies caused between items are called "testlet effects." In particular, the directional testlet effect (DTE) refers to a recursive influence whereby responses to earlier items can positively or negatively affect…

Descriptors: Models, Test Items, Educational Assessment, Scores

Comparing and Combining IRTree Models and Anchoring Vignettes in Addressing Response Styles

Peer reviewed

Mingfeng Xue; Ping Chen – Journal of Educational Measurement, 2025

Response styles pose great threats to psychological measurements. This research compares IRTree models and anchoring vignettes in addressing response styles and estimating the target traits. It also explores the potential of combining them at the item level and total-score level (ratios of extreme and middle responses to vignettes). Four models…

Descriptors: Item Response Theory, Models, Comparative Analysis, Vignettes

Evaluating the Consistency and Reliability of Attribution Methods in Automated Short Answer Grading (ASAG) Systems: Toward an Explainable Scoring System

Peer reviewed

Wallace N. Pinto Jr.; Jinnie Shin – Journal of Educational Measurement, 2025

In recent years, the application of explainability techniques to automated essay scoring and automated short-answer grading (ASAG) models, particularly those based on transformer architectures, has gained significant attention. However, the reliability and consistency of these techniques remain underexplored. This study systematically investigates…

Descriptors: Automation, Grading, Computer Assisted Testing, Scoring

Using Automated Procedures to Score Educational Essays Written in Three Languages

Peer reviewed

Tahereh Firoozi; Hamid Mohammadi; Mark J. Gierl – Journal of Educational Measurement, 2025

The purpose of this study is to describe and evaluate a multilingual automated essay scoring (AES) system for grading essays in three languages. Two different sentence embedding models were evaluated within the AES system, multilingual BERT (mBERT) and language-agnostic BERT sentence embedding (LaBSE). German, Italian, and Czech essays were…

Descriptors: College Students, Slavic Languages, German, Italian

A Note on the Use of Categorical Subscores

Peer reviewed

Kylie Gorney; Sandip Sinharay – Journal of Educational Measurement, 2025

Although there exists an extensive amount of research on subscores and their properties, limited research has been conducted on categorical subscores and their interpretations. In this paper, we focus on the claim of Feinberg and von Davier that categorical subscores are useful for remediation and instructional purposes. We investigate this claim…

Descriptors: Tests, Scores, Test Interpretation, Alternative Assessment

Using Multilabel Neural Network to Score High-Dimensional Assessments for Different Use Foci: An Example with College Major Preference Assessment

Peer reviewed

Shun-Fu Hu; Amery D. Wu; Jake Stone – Journal of Educational Measurement, 2025

Scoring high-dimensional assessments (e.g., > 15 traits) can be a challenging task. This paper introduces the multilabel neural network (MNN) as a scoring method for high-dimensional assessments. Additionally, it demonstrates how MNN can score the same test responses to maximize different performance metrics, such as accuracy, recall, or…

Descriptors: Tests, Testing, Scores, Test Construction
