ERIC - Search Results

Showing all 3 results

The Vulnerability of AI-Based Scoring Systems to Gaming Strategies: A Case Study

Peer reviewed

Peter Baldwin; Victoria Yaneva; Kai North; Le An Ha; Yiyun Zhou; Alex J. Mechaber; Brian E. Clauser – Journal of Educational Measurement, 2025

Recent developments in the use of large-language models have led to substantial improvements in the accuracy of content-based automated scoring of free-text responses. The reported accuracy levels suggest that automated systems could have widespread applicability in assessment. However, before they are used in operational testing, other aspects of…

Descriptors: Artificial Intelligence, Scoring, Computational Linguistics, Accuracy

The Effects of Incomplete Rating Designs in Combination with Rater Effects

Peer reviewed

Wind, Stefanie A.; Jones, Eli – Journal of Educational Measurement, 2019

Researchers have explored a variety of topics related to identifying and distinguishing among specific types of rater effects, as well as the implications of different types of incomplete data collection designs for rater-mediated assessments. In this study, we used simulated data to examine the sensitivity of latent trait model indicators of…

Descriptors: Rating Scales, Models, Evaluators, Data Collection

Components of Rater Error in a Complex Performance Assessment

Peer reviewed

Clauser, Brian E.; Clyman, Stephen G.; Swanson, David B. – Journal of Educational Measurement, 1999

Two studies focused on aspects of the rating process in performance assessment. The first, which involved 15 raters and about 400 medical students, made the "committee" facet of raters working in groups explicit, and the second, which involved about 200 medical students and four raters, made the "rating-occasion" facet…

Descriptors: Error Patterns, Evaluation Methods, Evaluators, Higher Education