ERIC - Search Results

Publication Date

In 2025	14
Since 2024	25
Since 2021 (last 5 years)	50
Since 2016 (last 10 years)	73

Descriptor

Computer Software	73
Evaluators	73
Comparative Analysis	37
Artificial Intelligence	31
Computational Linguistics	27
Second Language Learning	26
Writing Evaluation	24
Foreign Countries	22
English (Second Language)	21
Second Language Instruction	21
Accuracy	20
Scoring	19
Scores	18
Essays	17
Teaching Methods	16
Correlation	15
Evaluation Methods	14
Language Tests	14
Computer Assisted Testing	12
Interrater Reliability	11
Language Proficiency	11
Writing Instruction	11
Reliability	10
Undergraduate Students	10
Feedback (Response)	9
More ▼

Publication Type

Reports - Research	63
Journal Articles	60
Speeches/Meeting Papers	7
Tests/Questionnaires	7
Dissertations/Theses -…	3
Reports - Evaluative	3
Reports - Descriptive	2
Guides - Non-Classroom	1
Information Analyses	1

Education Level

Higher Education	28
Postsecondary Education	27
Secondary Education	8
Elementary Secondary Education	5
Elementary Education	4
High Schools	4
Early Childhood Education	2
Primary Education	2
Grade 1	1
Grade 10	1
Grade 2	1
Grade 4	1
Grade 8	1
Intermediate Grades	1
Junior High Schools	1
Kindergarten	1
Middle Schools	1
More ▼

Audience

Researchers

Location

China	4
Japan	2
Turkey	2
Algeria	1
Canada	1
Cuba	1
Europe	1
Hong Kong	1
India	1
Indiana	1
Israel	1
Kentucky	1
Singapore	1
South Korea	1
Sri Lanka	1
Thailand	1
Turkey (Istanbul)	1
United Kingdom (England)	1
United Kingdom (London)	1
Vietnam	1
More ▼

Laws, Policies, & Programs

Assessments and Surveys

International English…	3
Test of English as a Foreign…	2
Torrance Tests of Creative…	2
ACTFL Oral Proficiency…	1
Autism Diagnostic Observation…	1
Big Five Inventory	1
National Assessment of…	1
Test of English for…	1
Trends in International…	1

What Works Clearinghouse Rating

Showing 1 to 15 of 73 results Save | Export

The Vulnerability of AI-Based Scoring Systems to Gaming Strategies: A Case Study

Peer reviewed

Direct link

Peter Baldwin; Victoria Yaneva; Kai North; Le An Ha; Yiyun Zhou; Alex J. Mechaber; Brian E. Clauser – Journal of Educational Measurement, 2025

Recent developments in the use of large-language models have led to substantial improvements in the accuracy of content-based automated scoring of free-text responses. The reported accuracy levels suggest that automated systems could have widespread applicability in assessment. However, before they are used in operational testing, other aspects of…

Descriptors: Artificial Intelligence, Scoring, Computational Linguistics, Accuracy

Towards the Automatic Risk of Bias Assessment on Randomized Controlled Trials: A Comparison of RobotReviewer and Humans

Peer reviewed

Direct link

Yuan Tian; Xi Yang; Suhail A. Doi; Luis Furuya-Kanamori; Lifeng Lin; Joey S. W. Kwong; Chang Xu – Research Synthesis Methods, 2024

RobotReviewer is a tool for automatically assessing the risk of bias in randomized controlled trials, but there is limited evidence of its reliability. We evaluated the agreement between RobotReviewer and humans regarding the risk of bias assessment based on 1955 randomized controlled trials. The risk of bias in these trials was assessed via two…

Descriptors: Risk, Randomized Controlled Trials, Classification, Robotics

Can Large Language Models Replace Humans in Systematic Reviews? Evaluating GPT-4's Efficacy in Screening and Extracting Data from Peer-Reviewed and Grey Literature in Multiple Languages

Peer reviewed

Direct link

Qusai Khraisha; Sophie Put; Johanna Kappenberg; Azza Warraitch; Kristin Hadfield – Research Synthesis Methods, 2024

Systematic reviews are vital for guiding practice, research and policy, although they are often slow and labour-intensive. Large language models (LLMs) could speed up and automate systematic reviews, but their performance in such tasks has yet to be comprehensively evaluated against humans, and no study has tested Generative Pre-Trained…

Descriptors: Peer Evaluation, Research Reports, Artificial Intelligence, Computer Software

Grading the Graders: Comparing Generative AI and Human Assessment in Essay Evaluation

Peer reviewed

Direct link

Elizabeth L. Wetzler; Kenneth S. Cassidy; Margaret J. Jones; Chelsea R. Frazier; Nickalous A. Korbut; Chelsea M. Sims; Shari S. Bowen; Michael Wood – Teaching of Psychology, 2025

Background: Generative artificial intelligence (AI) represents a potentially powerful, time-saving tool for grading student essays. However, little is known about how AI-generated essay scores compare to human instructor scores. Objective: The purpose of this study was to compare the essay grading scores produced by AI with those of human…

Descriptors: Essays, Writing Evaluation, Scores, Evaluators

AI-Enabled Correction: A Professor's Journey

Peer reviewed

Direct link

Peter Daly; Emmanuelle Deglaire – Innovations in Education and Teaching International, 2025

AI-enabled assessment of student papers has the potential to provide both summative and formative feedback and reduce the time spent on grading. Using auto-ethnography, this study compares AI-enabled and human assessment of business student examination papers in a law module based on previously established rubrics. Examination papers were…

Descriptors: Artificial Intelligence, Computer Software, Technology Integration, College Faculty

Evaluating Quadratic Weighted Kappa as the Standard Performance Metric for Automated Essay Scoring

Peer reviewed
PDF on ERIC

Download full text

Doewes, Afrizal; Kurdhi, Nughthoh Arfawi; Saxena, Akrati – International Educational Data Mining Society, 2023

Automated Essay Scoring (AES) tools aim to improve the efficiency and consistency of essay scoring by using machine learning algorithms. In the existing research work on this topic, most researchers agree that human-automated score agreement remains the benchmark for assessing the accuracy of machine-generated scores. To measure the performance of…

Descriptors: Essays, Writing Evaluation, Evaluators, Accuracy

Automatic Wordnet Construction and Its Application in Generating Distractors for Cloze Questions

Direct link

Yicheng Sun – ProQuest LLC, 2024

We study how to automatically generate cloze questions from given texts to assess reading comprehension, where a cloze question consists of a stem with a blank space holder for the answer key, and three distractors for generating confusions. We present a generative method called CQG (Cloze Question Generator) for constructing cloze questions from…

Descriptors: Cloze Procedure, Reading Processes, Questioning Techniques, Computational Linguistics

Graders of the Future: Comparing the Consistency and Accuracy of GPT4 and Pre-Service Teachers in Physics Essay Question Assessments

Peer reviewed
PDF on ERIC

Download full text

Yubin Xu; Lin Liu; Jianwen Xiong; Guangtian Zhu – Journal of Baltic Science Education, 2025

As the development and application of large language models (LLMs) in physics education progress, the well-known AI-based chatbot ChatGPT4 has presented numerous opportunities for educational assessment. Investigating the potential of AI tools in practical educational assessment carries profound significance. This study explored the comparative…

Descriptors: Physics, Artificial Intelligence, Computer Software, Accuracy

Examining the Effect of Assessment Construct Characteristics on Machine Learning Scoring of Scientific Argumentation

Peer reviewed

Direct link

Kevin C. Haudek; Xiaoming Zhai – International Journal of Artificial Intelligence in Education, 2024

Argumentation, a key scientific practice presented in the "Framework for K-12 Science Education," requires students to construct and critique arguments, but timely evaluation of arguments in large-scale classrooms is challenging. Recent work has shown the potential of automated scoring systems for open response assessments, leveraging…

Descriptors: Accuracy, Persuasive Discourse, Artificial Intelligence, Learning Management Systems

Content and Item Response Theory Analysis of ChatGPT-4-Generated Multiple-Choice Items

Peer reviewed

Direct link

Roger Young; Emily Courtney; Alexander Kah; Mariah Wilkerson; Yi-Hsin Chen – Teaching of Psychology, 2025

Background: Multiple-choice item (MCI) assessments are burdensome for instructors to develop. Artificial intelligence (AI, e.g., ChatGPT) can streamline the process without sacrificing quality. The quality of AI-generated MCIs and human experts is comparable. However, whether the quality of AI-generated MCIs is equally good across various domain-…

Descriptors: Item Response Theory, Multiple Choice Tests, Psychology, Textbooks

Utilizing Large Language Models for EFL Essay Grading: An Examination of Reliability and Validity in Rubric-Based Assessments

Peer reviewed

Direct link

Fatih Yavuz; Özgür Çelik; Gamze Yavas Çelik – British Journal of Educational Technology, 2025

This study investigates the validity and reliability of generative large language models (LLMs), specifically ChatGPT and Google's Bard, in grading student essays in higher education based on an analytical grading rubric. A total of 15 experienced English as a foreign language (EFL) instructors and two LLMs were asked to evaluate three student…

Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Computational Linguistics

Assessing Academic Language in Tenth Grade Essays Using Natural Language Processing

Peer reviewed

Direct link

Andrew Potter; Mitchell Shortt; Maria Goldshtein; Rod D. Roscoe – Grantee Submission, 2025

Broadly defined, academic language (AL) is a set of lexical-grammatical norms and registers commonly used in educational and academic discourse. Mastery of academic language in writing is an important aspect of writing instruction and assessment. The purpose of this study was to use Natural Language Processing (NLP) tools to examine the extent to…

Descriptors: Academic Language, Natural Language Processing, Grammar, Vocabulary Skills

Automated Essay Scoring and Revising Based on Open-Source Large Language Models

Peer reviewed

Direct link

Yishen Song; Qianta Zhu; Huaibo Wang; Qinhua Zheng – IEEE Transactions on Learning Technologies, 2024

Manually scoring and revising student essays has long been a time-consuming task for educators. With the rise of natural language processing techniques, automated essay scoring (AES) and automated essay revising (AER) have emerged to alleviate this burden. However, current AES and AER models require large amounts of training data and lack…

Descriptors: Scoring, Essays, Writing Evaluation, Computer Software

Modeling and Analyzing Scorer Preferences in Short-Answer Math Questions

Peer reviewed
PDF on ERIC

Download full text

Zhang, Mengxue; Heffernan, Neil; Lan, Andrew – International Educational Data Mining Society, 2023

Automated scoring of student responses to open-ended questions, including short-answer questions, has great potential to scale to a large number of responses. Recent approaches for automated scoring rely on supervised learning, i.e., training classifiers or fine-tuning language models on a small number of responses with human-provided score…

Descriptors: Scoring, Computer Assisted Testing, Mathematics Instruction, Mathematics Tests

Automated Identification of Discourse Markers Using the NLP Approach: The Case of "Okay"

Peer reviewed
PDF on ERIC

Download full text

Sanosi, Abdulaziz; Abdalla, Mohamed – Australian Journal of Applied Linguistics, 2021

This study aimed to examine the potentials of the NLP approach in detecting discourse markers (DMs), namely okay, in transcribed spoken data. One hundred thirty-eight concordance lines were presented to human referees to judge the functions of okay in them as a DM or Non-DM. After that, the researchers used a Python script written according to the…

Descriptors: Natural Language Processing, Computational Linguistics, Programming Languages, Accuracy

Previous Page | Next Page »

Pages: 1 | 2 | 3 | 4 | 5

International Educational…	5
Grantee Submission	4
Language Testing	3
ProQuest LLC	3
Research Synthesis Methods	3
Assessment in Education:…	2
Computer Assisted Language…	2
Innovations in Education and…	2
Teaching of Psychology	2
Advances in Language and…	1
Advances in Physiology…	1
Asian Journal of Distance…	1
Australian Journal of Applied…	1
British Journal of…	1
Computers in the Schools	1
Contemporary Educational…	1
Creativity Research Journal	1
Education and Information…	1
Educational and Psychological…	1
English Language Teaching	1
Eurasian Journal of Applied…	1
IEEE Transactions on Learning…	1
Innovation in Language…	1
International Journal for…	1
International Journal for the…	1
More ▼

Allen, Laura	2
Crossley, Scott	2
McNamara, Danielle	2
Wan, Qian	2
Abdalla, Mohamed	1
Aggarwal, Varun	1
Ahmet Can Uyar	1
Ahn, Soojin	1
Akbari, Alireza	1
Al-Harthi, Aisha Salim Ali	1
Alex J. Mechaber	1
Alexander Kah	1
Amanda Huee-Ping Wong	1
Andrew Potter	1
Armijo-Olivo, Susan	1
Aysegül Liman-Kaban	1
Azza Warraitch	1
Bahi, Halima	1
Bergeron, Annie	1
Bhattacharya, Joydeep	1
Bosch, Nigel	1
Brannen, Kathleen	1
Brian E. Clauser	1
Buddhima Karunarathna, J. A.…	1
Campbell, Chris	1
More ▼