ERIC - Search Results

Publication Date

In 2025	1
Since 2024	1
Since 2021 (last 5 years)	2
Since 2016 (last 10 years)	2
Since 2006 (last 20 years)	2

Descriptor

Computer Assisted Testing	2
Evaluators	2
Scoring	2
Accuracy	1
Alternative Assessment	1
Artificial Intelligence	1
Automation	1
Classification	1
Computational Linguistics	1
Computer Games	1
Computer Software	1
Design	1
Error Patterns	1
Item Response Theory	1
Research Problems	1
Scores	1
Test Scoring Machines	1

Source

Journal of Educational…

Author

Alex J. Mechaber	1
Brian E. Clauser	1
Casabianca, Jodi M.	1
Chao, Szu-Fu	1
Choi, Ikkyu	1
Donoghue, John R.	1
Kai North	1
Le An Ha	1
Peter Baldwin	1
Shin, Hyo Jeong	1
Victoria Yaneva	1
Yiyun Zhou	1

Publication Type

Journal Articles	2
Reports - Research	2

Showing all 2 results

The Vulnerability of AI-Based Scoring Systems to Gaming Strategies: A Case Study

Peer reviewed

Peter Baldwin; Victoria Yaneva; Kai North; Le An Ha; Yiyun Zhou; Alex J. Mechaber; Brian E. Clauser – Journal of Educational Measurement, 2025

Recent developments in the use of large-language models have led to substantial improvements in the accuracy of content-based automated scoring of free-text responses. The reported accuracy levels suggest that automated systems could have widespread applicability in assessment. However, before they are used in operational testing, other aspects of…

Descriptors: Artificial Intelligence, Scoring, Computational Linguistics, Accuracy

Using Linkage Sets to Improve Connectedness in Rater Response Model Estimation

Peer reviewed

Casabianca, Jodi M.; Donoghue, John R.; Shin, Hyo Jeong; Chao, Szu-Fu; Choi, Ikkyu – Journal of Educational Measurement, 2023

Using item-response theory to model rater effects provides an alternative solution for rater monitoring and diagnosis, compared to using standard performance metrics. In order to fit such models, the ratings data must be sufficiently connected in order to estimate rater effects. Due to popular rating designs used in large-scale testing scenarios,…

Descriptors: Item Response Theory, Alternative Assessment, Evaluators, Research Problems