Showing all 14 results
Peer reviewed
Direct link
Peter Baldwin; Victoria Yaneva; Kai North; Le An Ha; Yiyun Zhou; Alex J. Mechaber; Brian E. Clauser – Journal of Educational Measurement, 2025
Recent developments in the use of large language models have led to substantial improvements in the accuracy of content-based automated scoring of free-text responses. The reported accuracy levels suggest that automated systems could have widespread applicability in assessment. However, before they are used in operational testing, other aspects of…
Descriptors: Artificial Intelligence, Scoring, Computational Linguistics, Accuracy
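The pipeline such systems share is simple to state: train on responses scored by humans, then predict scores for new responses. A minimal sketch in Python, using TF-IDF features and logistic regression as a deliberately lightweight stand-in for the large language models the article actually concerns; all responses, scores, and names below are hypothetical.

```python
# Content-based automated scoring sketch: fit on human-scored
# free-text responses, then score unseen ones. A simple stand-in
# for the LLM-based systems discussed in the article.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical responses with human-assigned scores (1 = correct, 0 = incorrect).
responses = [
    "Insulin lowers blood glucose by promoting cellular uptake.",
    "Glucose goes up when insulin is released.",
    "Insulin signals cells to absorb glucose from the blood.",
    "Insulin is made in the stomach and digests sugar.",
]
human_scores = [1, 0, 1, 0]

scorer = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
scorer.fit(responses, human_scores)

new_response = ["Insulin helps cells take in glucose, reducing blood sugar."]
print(scorer.predict(new_response))        # predicted score
print(scorer.predict_proba(new_response))  # scoring confidence
```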
Peer reviewed
Direct link
Lim, Hwanggyu; Choe, Edison M. – Journal of Educational Measurement, 2023
The residual differential item functioning (RDIF) detection framework was recently developed in the context of linear testing. To explore the potential application of this framework to computerized adaptive testing (CAT), the present study investigated the utility of the RDIF_R statistic both as an index for detecting uniform DIF of…
Descriptors: Test Items, Computer Assisted Testing, Item Response Theory, Adaptive Testing
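A rough illustration of the residual-based idea behind this framework: compare mean raw residuals (observed response minus model-expected probability of a correct response) between reference and focal groups for a single item. This is a simplified sketch under assumed 2PL item parameters and simulated data, not the paper's exact RDIF_R statistic.

```python
# Residual-based uniform DIF sketch: a negative mean-residual gap for
# the focal group suggests the item is harder for them than the model
# predicts. Item parameters and group abilities are hypothetical.
import numpy as np

def p_2pl(theta, a, b):
    """Expected probability of a correct response under a 2PL model."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

rng = np.random.default_rng(0)
a, b = 1.2, 0.0                       # hypothetical item parameters
theta_ref = rng.normal(0, 1, 2000)    # reference-group abilities
theta_foc = rng.normal(0, 1, 2000)    # focal-group abilities

# Simulate responses; shift the focal group's difficulty to inject uniform DIF.
y_ref = rng.binomial(1, p_2pl(theta_ref, a, b))
y_foc = rng.binomial(1, p_2pl(theta_foc, a, b + 0.5))

# Residual DIF index: difference in mean raw residuals between groups.
resid_ref = y_ref - p_2pl(theta_ref, a, b)
resid_foc = y_foc - p_2pl(theta_foc, a, b)
print(resid_foc.mean() - resid_ref.mean())  # clearly negative => uniform DIF
```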
Peer reviewed
Direct link
Schneider, Johannes; Richner, Robin; Riser, Micha – International Journal of Artificial Intelligence in Education, 2023
Autograding short textual answers has become much more feasible with the rise of NLP and the increased availability of question-answer pairs brought about by the shift to online education. Autograding performance, however, is still inferior to human grading. The statistical and black-box nature of state-of-the-art machine learning models makes them…
Descriptors: Grading, Natural Language Processing, Computer Assisted Testing, Ethics
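One simple, inspectable baseline for short-answer autograding, relevant to the black-box concern raised here, is similarity to a reference answer with the overlapping terms reported as a rationale. A sketch with hypothetical data and an arbitrary credit threshold; this is an illustration of the general technique, not the authors' system.

```python
# Transparent short-answer grading sketch: TF-IDF cosine similarity
# to a reference answer, plus the shared terms as a simple rationale.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

reference = "Photosynthesis converts light energy into chemical energy in plants."
answer = "Plants turn light energy into chemical energy during photosynthesis."

vec = TfidfVectorizer(stop_words="english")
tfidf = vec.fit_transform([reference, answer])
score = cosine_similarity(tfidf[0], tfidf[1])[0, 0]

# Terms present in both texts: an inspectable, if crude, explanation.
shared = set(vec.inverse_transform(tfidf[0])[0]) & set(vec.inverse_transform(tfidf[1])[0])
print(f"similarity={score:.2f}", "credit" if score >= 0.5 else "no credit")
print("shared terms:", sorted(shared))
```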
Yi Gui – ProQuest LLC, 2024
This study explores using transfer learning in machine learning for natural language processing (NLP) to create generic automated essay scoring (AES) models, providing instant online scoring for statewide writing assessments in K-12 education. The goal is to develop an instant online scorer that is generalizable to any prompt, addressing the…
Descriptors: Writing Tests, Natural Language Processing, Writing Evaluation, Scoring
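The transfer-learning recipe behind generic AES models of this kind can be stated briefly: start from a pretrained language model and fine-tune a regression head on human-scored essays drawn from many prompts. A minimal sketch with the Hugging Face transformers API; the checkpoint, essays, and scores are placeholder assumptions, and a real system would train on large multi-prompt corpora.

```python
# Transfer-learning AES sketch: pretrained encoder + regression head,
# fine-tuned on human holistic scores. Data here are toy placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=1, problem_type="regression"
)

essays = ["A well-organized essay with clear claims and evidence...",
          "Short essay with unsupported claims."]
scores = torch.tensor([[5.0], [2.0]])  # hypothetical human holistic scores

batch = tok(essays, padding=True, truncation=True, return_tensors="pt")
out = model(**batch, labels=scores)
out.loss.backward()  # one illustrative gradient step of fine-tuning
print(out.logits)    # predicted scores (untrained at this point)
```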
Peer reviewed
Direct link
Reinertsen, Nathanael – English in Australia, 2018
The difference in how humans read and how Automated Essay Scoring (AES) systems process written language leads to a situation where a portion of student responses will be comprehensible to human markers but cannot be parsed by AES systems. This paper examines a number of pieces of student writing that were marked by trained human markers, but…
Descriptors: Qualitative Research, Writing Evaluation, Essay Tests, Computer Assisted Testing
Peer reviewed
Direct link
Dalton, Sarah Grace; Stark, Brielle C.; Fromm, Davida; Apple, Kristen; MacWhinney, Brian; Rensch, Amanda; Rowedder, Madyson – Journal of Speech, Language, and Hearing Research, 2022
Purpose: The aim of this study was to advance the use of structured, monologic discourse analysis by validating an automated scoring procedure for core lexicon (CoreLex) using transcripts. Method: Forty-nine transcripts from persons with aphasia and 48 transcripts from persons with no brain injury were retrieved from the AphasiaBank database. Five…
Descriptors: Validity, Discourse Analysis, Databases, Scoring
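At its simplest, core-lexicon scoring reduces to checking which items from a fixed word list appear in a discourse transcript. A toy sketch of that procedure; the word list below is hypothetical and far shorter than the validated CoreLex checklists used in the study.

```python
# Core-lexicon (CoreLex) scoring sketch: count distinct core items
# produced in a transcript. The core list here is a hypothetical toy.
CORE_LEXICON = {"boy", "girl", "fall", "water", "tree", "run"}

def corelex_score(transcript: str, core=CORE_LEXICON) -> int:
    """Return the number of distinct core-lexicon items produced."""
    tokens = {w.strip(".,!?").lower() for w in transcript.split()}
    return len(core & tokens)

transcript = "The boy climbed the tree and the girl saw him fall."
print(corelex_score(transcript))  # 4 of the 6 core items produced
```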
Peer reviewed
Direct link
Evans, William S.; Cavanaugh, Robert; Quique, Yina; Boss, Emily; Starns, Jeffrey J.; Hula, William D. – Journal of Speech, Language, and Hearing Research, 2021
Purpose: The purpose of this study was to develop and pilot a novel treatment framework called "BEARS" (Balancing Effort, Accuracy, and Response Speed). People with aphasia (PWA) have been shown to maladaptively balance speed and accuracy during language tasks. BEARS is designed to train PWA to balance speed-accuracy trade-offs and…
Descriptors: Accuracy, Semantics, Aphasia, Reaction Time
Peer reviewed
Direct link
Jiao, Yishan; LaCross, Amy; Berisha, Visar; Liss, Julie – Journal of Speech, Language, and Hearing Research, 2019
Purpose: Subjective speech intelligibility assessment is often preferred over more objective approaches that rely on transcript scoring. This is, in part, because of the intensive manual labor associated with extracting objective metrics from transcribed speech. In this study, we propose an automated approach for scoring transcripts that provides…
Descriptors: Suprasegmentals, Phonemes, Error Patterns, Scoring
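One way to automate transcript scoring of this kind is to align the listener's transcript against the target at the phoneme level and tally mismatches as error patterns. A sketch using Python's difflib with hypothetical ARPABET-style sequences; a real system would derive phoneme strings with a grapheme-to-phoneme tool, and this is not the authors' specific method.

```python
# Transcript scoring sketch: phoneme-level alignment yields both an
# intelligibility count and a list of error patterns.
from difflib import SequenceMatcher

target   = ["DH", "AH", "K", "AE", "T", "S", "AE", "T"]   # "the cat sat"
produced = ["DH", "AH", "K", "AE", "P", "S", "AE", "T"]   # /T/ -> /P/ error

ops = SequenceMatcher(a=target, b=produced).get_opcodes()
errors = [(tag, target[i1:i2], produced[j1:j2])
          for tag, i1, i2, j1, j2 in ops if tag != "equal"]
matched = sum(i2 - i1 for tag, i1, i2, _, _ in ops if tag == "equal")

print(f"phonemes correct: {matched}/{len(target)}")  # 7/8
print("error patterns:", errors)                     # [('replace', ['T'], ['P'])]
```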
Peer reviewed
Direct link
Mao, Liyang; Liu, Ou Lydia; Roohr, Katrina; Belur, Vinetha; Mulholland, Matthew; Lee, Hee-Sun; Pallant, Amy – Educational Assessment, 2018
Scientific argumentation is one of the core practices for teachers to implement in science classrooms. We developed a computer-based formative assessment to support students' construction and revision of scientific arguments. The assessment is built upon automated scoring of students' arguments and provides feedback to students and teachers.…
Descriptors: Computer Assisted Testing, Science Tests, Scoring, Automation
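The formative loop described here pairs an automated argument score with targeted revision feedback. A minimal sketch of the score-to-feedback mapping; the levels and messages below are hypothetical, not the assessment's actual rubric.

```python
# Score-contingent formative feedback sketch: map an automated
# argument score to a canned revision hint (hypothetical rubric).
FEEDBACK = {
    0: "State a claim about the phenomenon before adding evidence.",
    1: "You have a claim; add data that support it.",
    2: "Good claim and evidence; explain how the evidence supports the claim.",
    3: "Complete argument. Consider addressing a counterargument.",
}

def feedback_for(score: int) -> str:
    return FEEDBACK.get(score, "Score out of range.")

print(feedback_for(1))
```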
Peer reviewed
Download full text (PDF on ERIC)
Li, Zhi; Feng, Hui-Hsien; Saricaoglu, Aysel – CALICO Journal, 2017
This classroom-based study employs a mixed-methods approach to exploring both short-term and long-term effects of Criterion feedback on ESL students' development of grammatical accuracy. The results of multilevel growth modeling indicate that Criterion feedback helps students at both intermediate-high and advanced-low levels reduce errors in eight…
Descriptors: Feedback (Response), Criterion Referenced Tests, Computer Assisted Testing, Writing Tests
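A multilevel growth model of this shape can be sketched with statsmodels: random intercepts for students and a fixed slope for successive drafts, with simulated data standing in for the study's Criterion corpus; all variable names are hypothetical.

```python
# Multilevel growth model sketch: does the error count drop across
# drafts once student-level variation is accounted for?
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_students, n_drafts = 30, 4
df = pd.DataFrame({
    "student": np.repeat(np.arange(n_students), n_drafts),
    "draft": np.tile(np.arange(n_drafts), n_students),
})
student_intercepts = rng.normal(10, 2, n_students)
df["errors"] = (student_intercepts[df["student"]]
                - 1.5 * df["draft"]               # errors drop with feedback
                + rng.normal(0, 1, len(df)))

result = smf.mixedlm("errors ~ draft", df, groups=df["student"]).fit()
print(result.params["draft"])  # negative slope ~= error reduction per draft
```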
Peer reviewed
Direct link
Jackson, Margaret C.; Linden, David E. J.; Roberts, Mark V.; Kriegeskorte, Nikolaus; Haenschel, Corinna – Journal of Experimental Psychology: Learning, Memory, and Cognition, 2015
A number of studies have shown that visual working memory (WM) is poorer for complex versus simple items, traditionally accounted for by higher information load placing greater demands on encoding and storage capacity limits. Other research suggests that it may not be complexity that determines WM performance per se, but rather increased…
Descriptors: Visual Perception, Short Term Memory, Test Items, Cognitive Processes
Peer reviewed
Direct link
Hauser, Carl; Thum, Yeow Meng; He, Wei; Ma, Lingling – Educational and Psychological Measurement, 2015
When conducting item reviews, analysts evaluate an array of statistical and graphical information to assess the fit of a field test (FT) item to an item response theory model. The process can be tedious, particularly when the number of human reviews (HR) to be completed is large. Furthermore, such a process leads to decisions that are susceptible…
Descriptors: Test Items, Item Response Theory, Research Methodology, Decision Making
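One statistic an analyst might inspect in such a field-test item review is a chi-square comparing observed and model-expected proportions correct across ability groups. A sketch with simulated data and assumed 2PL parameters; this illustrates the general class of fit checks, not the authors' procedure.

```python
# Item-fit sketch: group examinees by ability deciles and compare
# observed vs. 2PL-expected proportions correct for one item.
import numpy as np

def p_2pl(theta, a=1.0, b=0.2):
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

rng = np.random.default_rng(2)
theta = rng.normal(0, 1, 5000)
y = rng.binomial(1, p_2pl(theta))   # responses consistent with the model

bins = np.quantile(theta, np.linspace(0, 1, 11))
idx = np.clip(np.digitize(theta, bins[1:-1]), 0, 9)
chi_sq = 0.0
for g in range(10):
    m = idx == g
    n, obs, exp = m.sum(), y[m].mean(), p_2pl(theta[m]).mean()
    chi_sq += n * (obs - exp) ** 2 / (exp * (1 - exp))
print(chi_sq)  # small values indicate adequate fit for this item
```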
Peer reviewed
Download full text (PDF on ERIC)
Jordan, Sally – European Journal of Science and Mathematics Education, 2014
Inspection of thousands of student responses to computer-marked assessment questions has brought insight into the errors made by adult distance learners of science. Most of the questions analysed were in summative use and required students to construct their own response. Both of these factors increased confidence in the reliability of the…
Descriptors: Foreign Countries, Undergraduate Students, College Science, Science Education
Peer reviewed
Download full text (PDF on ERIC)
Boyer, Kristy Elizabeth, Ed.; Yudelson, Michael, Ed. – International Educational Data Mining Society, 2018
The 11th International Conference on Educational Data Mining (EDM 2018) was held under the auspices of the International Educational Data Mining Society at the Templeton Landing in Buffalo, New York. This year's EDM conference was highly competitive, with 145 long and short paper submissions. Of these, 23 were accepted as full papers and 37…
Descriptors: Data Collection, Data Analysis, Computer Science Education, Program Proposals