ERIC - Search Results

Publication Date

In 2025	1
Since 2024	7

Descriptor

Error Patterns	7
Scoring	7
Accuracy	3
Artificial Intelligence	2
Comparative Analysis	2
Computational Linguistics	2
Computer Assisted Testing	2
Computer Software	2
Writing Evaluation	2
Adults	1
American Sign Language	1
Asians	1
At Risk Students	1
Children	1
Classification	1
Computer Games	1
Contrastive Linguistics	1
Cross Cultural Studies	1
Cues	1
Cutting Scores	1
Deafness	1
Doctoral Programs	1
English	1
Error of Measurement	1
Essays	1

Source

ProQuest LLC	2
Advances in Health Sciences…	1
American Annals of the Deaf	1
Educational Assessment	1
Grantee Submission	1
Journal of Educational…	1

Author

Akihito Kamata	1
Alex J. Mechaber	1
Brian E. Clauser	1
Cornelis Potgieter	1
Jessica Stinson	1
Kai North	1
Kimberly Wolbers	1
Le An Ha	1
Mark White	1
Matt Homer	1
Matt Ronfeldt	1
Peter Baldwin	1
Rachel Saulsburry	1
Victoria Yaneva	1
Xin Qiao	1
Yachong Cui	1
Yi Gui	1
Yiyun Zhou	1

Publication Type

Reports - Research	5
Journal Articles	4
Dissertations/Theses -…	2

Education Level

Higher Education	2
Postsecondary Education	2
Early Childhood Education	1
Elementary Education	1
Elementary Secondary Education	1
Grade 10	1
Grade 11	1
Grade 9	1
High Schools	1
Junior High Schools	1
Kindergarten	1
Middle Schools	1
Primary Education	1
Secondary Education	1

Location

China	1
United States	1

Assessments and Surveys

Wechsler Adult Intelligence…	1
Wechsler Intelligence Scale…	1

Showing all 7 results

Towards a More Nuanced Conceptualisation of Differential Examiner Stringency in OSCEs

Peer reviewed

Matt Homer – Advances in Health Sciences Education, 2024

Quantitative measures of systematic differences in OSCE scoring across examiners (often termed examiner stringency) can threaten the validity of examination outcomes. Such effects are usually conceptualised and operationalised based solely on checklist/domain scores in a station, and global grades are not often used in this type of analysis. In…

Descriptors: Examiners, Scoring, Validity, Cutting Scores

The Vulnerability of AI-Based Scoring Systems to Gaming Strategies: A Case Study

Peer reviewed

Peter Baldwin; Victoria Yaneva; Kai North; Le An Ha; Yiyun Zhou; Alex J. Mechaber; Brian E. Clauser – Journal of Educational Measurement, 2025

Recent developments in the use of large-language models have led to substantial improvements in the accuracy of content-based automated scoring of free-text responses. The reported accuracy levels suggest that automated systems could have widespread applicability in assessment. However, before they are used in operational testing, other aspects of…

Descriptors: Artificial Intelligence, Scoring, Computational Linguistics, Accuracy

Incorporating Calibration Errors in Oral Reading Fluency Scoring

Peer reviewed

Xin Qiao; Akihito Kamata; Cornelis Potgieter – Grantee Submission, 2024

Oral reading fluency (ORF) assessments are commonly used to screen at-risk readers and evaluate interventions' effectiveness as curriculum-based measurements. Similar to the standard practice in item response theory (IRT), calibrated passage parameter estimates are currently used as if they were population values in model-based ORF scoring.…

Descriptors: Oral Reading, Reading Fluency, Error Patterns, Scoring

Monitoring Rater Quality in Observational Systems: Issues Due to Unreliable Estimates of Rater Quality

Peer reviewed

Mark White; Matt Ronfeldt – Educational Assessment, 2024

Standardized observation systems seek to reliably measure a specific conceptualization of teaching quality, managing rater error through mechanisms such as certification, calibration, validation, and double-scoring. These mechanisms both support high quality scoring and generate the empirical evidence used to support the scoring inference (i.e.,…

Descriptors: Interrater Reliability, Quality Control, Teacher Effectiveness, Error Patterns

Wechsler Trickle-Down Errors: A Comparison between Master's Students and Doctoral Students

Jessica Stinson – ProQuest LLC, 2024

Intelligence tests have been used in the United States since the early 1900s for assessing soldiers during World War I (Kaufman & Harrison, 2008; White & Hall, 1980). Presently, cognitive assessments are used in school, civil service, military, clinical, and industry settings (White & Hall, 1980). Although the results of these…

Descriptors: Graduate Students, Masters Programs, Doctoral Programs, Comparative Analysis

Developing a Generic Scorer for Practice Writing Tests of Statewide Assessment Essays with Natural Language Processing Transfer Learning Techniques

Yi Gui – ProQuest LLC, 2024

This study explores using transfer learning in machine learning for natural language processing (NLP) to create generic automated essay scoring (AES) models, providing instant online scoring for statewide writing assessments in K-12 education. The goal is to develop an instant online scorer that is generalizable to any prompt, addressing the…

Descriptors: Writing Tests, Natural Language Processing, Writing Evaluation, Scoring

Application of the Structured Analysis of Written Language Tool to the Writing of Deaf Chinese Students

Peer reviewed

Yachong Cui; Rachel Saulsburry; Kimberly Wolbers – American Annals of the Deaf, 2024

Limited access to spoken and signed language is a worldwide phenomenon affecting deaf children. Language delay caused by impeded language acquisition has negative cascading effects on deaf children's learning and development. In the event of stymied language development, deaf students exhibit highly errored writing and commit errors unseen in the…

Descriptors: Deafness, Written Language, Writing Evaluation, North Americans