ERIC - Search Results

Publication Date

In 2025	10
Since 2024	19
Since 2021 (last 5 years)	36
Since 2016 (last 10 years)	56
Since 2006 (last 20 years)	76

Descriptor

Evaluators	82
Computer Software	79
Second Language Learning	33
Comparative Analysis	32
English (Second Language)	27
Computational Linguistics	25
Foreign Countries	25
Artificial Intelligence	22
Evaluation Methods	22
Scoring	22
Language Tests	21
Second Language Instruction	21
Scores	20
Writing Evaluation	20
Accuracy	16
Correlation	16
Essays	16
Teaching Methods	16
Computer Assisted Testing	15
Language Proficiency	15
Interrater Reliability	11
Reliability	11
Undergraduate Students	10
College Students	9
College Faculty	8

Publication Type

Journal Articles	82
Reports - Research	66
Reports - Descriptive	10
Tests/Questionnaires	8
Reports - Evaluative	5
Information Analyses	1
Numerical/Quantitative Data	1

Education Level

Higher Education	29
Postsecondary Education	27
Secondary Education	7
Elementary Secondary Education	4
High Schools	4
Elementary Education	3
Early Childhood Education	2
Junior High Schools	2
Primary Education	2
Grade 1	1
Grade 10	1
Grade 2	1
Grade 4	1
Grade 7	1
Grade 8	1
Intermediate Grades	1
Kindergarten	1
Middle Schools	1

Location

China	4
Turkey	3
Japan	2
Algeria	1
Australia	1
Denmark	1
Europe	1
Finland	1
Hong Kong	1
Illinois (Urbana)	1
Iran	1
Kentucky	1
Ohio	1
Singapore	1
South Korea	1
Sri Lanka	1
Thailand	1
United Kingdom (England)	1
United Kingdom (London)	1
United States	1

Laws, Policies, & Programs

No Child Left Behind Act 2001

Assessments and Surveys

Test of English as a Foreign…	6
International English…	4
ACTFL Oral Proficiency…	1
Autism Diagnostic Observation…	1
Big Five Inventory	1
Flesch Kincaid Grade Level…	1
Fry Readability Formula	1
National Adult Literacy…	1
National Assessment of…	1
Test of English for…	1
Trends in International…	1

Showing 1 to 15 of 82 results

The Vulnerability of AI-Based Scoring Systems to Gaming Strategies: A Case Study

Peer reviewed

Peter Baldwin; Victoria Yaneva; Kai North; Le An Ha; Yiyun Zhou; Alex J. Mechaber; Brian E. Clauser – Journal of Educational Measurement, 2025

Recent developments in the use of large-language models have led to substantial improvements in the accuracy of content-based automated scoring of free-text responses. The reported accuracy levels suggest that automated systems could have widespread applicability in assessment. However, before they are used in operational testing, other aspects of…

Descriptors: Artificial Intelligence, Scoring, Computational Linguistics, Accuracy

Towards the Automatic Risk of Bias Assessment on Randomized Controlled Trials: A Comparison of RobotReviewer and Humans

Peer reviewed

Yuan Tian; Xi Yang; Suhail A. Doi; Luis Furuya-Kanamori; Lifeng Lin; Joey S. W. Kwong; Chang Xu – Research Synthesis Methods, 2024

RobotReviewer is a tool for automatically assessing the risk of bias in randomized controlled trials, but there is limited evidence of its reliability. We evaluated the agreement between RobotReviewer and humans regarding the risk of bias assessment based on 1955 randomized controlled trials. The risk of bias in these trials was assessed via two…

Descriptors: Risk, Randomized Controlled Trials, Classification, Robotics

Can Large Language Models Replace Humans in Systematic Reviews? Evaluating GPT-4's Efficacy in Screening and Extracting Data from Peer-Reviewed and Grey Literature in Multiple Languages

Peer reviewed

Qusai Khraisha; Sophie Put; Johanna Kappenberg; Azza Warraitch; Kristin Hadfield – Research Synthesis Methods, 2024

Systematic reviews are vital for guiding practice, research and policy, although they are often slow and labour-intensive. Large language models (LLMs) could speed up and automate systematic reviews, but their performance in such tasks has yet to be comprehensively evaluated against humans, and no study has tested Generative Pre-Trained…

Descriptors: Peer Evaluation, Research Reports, Artificial Intelligence, Computer Software

Graders of the Future: Comparing the Consistency and Accuracy of GPT4 and Pre-Service Teachers in Physics Essay Question Assessments

Peer reviewed

Yubin Xu; Lin Liu; Jianwen Xiong; Guangtian Zhu – Journal of Baltic Science Education, 2025

As the development and application of large language models (LLMs) in physics education progress, the well-known AI-based chatbot ChatGPT4 has presented numerous opportunities for educational assessment. Investigating the potential of AI tools in practical educational assessment carries profound significance. This study explored the comparative…

Descriptors: Physics, Artificial Intelligence, Computer Software, Accuracy

Examining the Effect of Assessment Construct Characteristics on Machine Learning Scoring of Scientific Argumentation

Peer reviewed

Kevin C. Haudek; Xiaoming Zhai – International Journal of Artificial Intelligence in Education, 2024

Argumentation, a key scientific practice presented in the "Framework for K-12 Science Education," requires students to construct and critique arguments, but timely evaluation of arguments in large-scale classrooms is challenging. Recent work has shown the potential of automated scoring systems for open response assessments, leveraging…

Descriptors: Accuracy, Persuasive Discourse, Artificial Intelligence, Learning Management Systems

Utilizing Large Language Models for EFL Essay Grading: An Examination of Reliability and Validity in Rubric-Based Assessments

Peer reviewed

Fatih Yavuz; Özgür Çelik; Gamze Yavas Çelik – British Journal of Educational Technology, 2025

This study investigates the validity and reliability of generative large language models (LLMs), specifically ChatGPT and Google's Bard, in grading student essays in higher education based on an analytical grading rubric. A total of 15 experienced English as a foreign language (EFL) instructors and two LLMs were asked to evaluate three student…

Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Computational Linguistics

Assessing Academic Language in Tenth Grade Essays Using Natural Language Processing

Peer reviewed

Andrew Potter; Mitchell Shortt; Maria Goldshtein; Rod D. Roscoe – Grantee Submission, 2025

Broadly defined, academic language (AL) is a set of lexical-grammatical norms and registers commonly used in educational and academic discourse. Mastery of academic language in writing is an important aspect of writing instruction and assessment. The purpose of this study was to use Natural Language Processing (NLP) tools to examine the extent to…

Descriptors: Academic Language, Natural Language Processing, Grammar, Vocabulary Skills

Automated Essay Scoring and Revising Based on Open-Source Large Language Models

Peer reviewed

Yishen Song; Qianta Zhu; Huaibo Wang; Qinhua Zheng – IEEE Transactions on Learning Technologies, 2024

Manually scoring and revising student essays has long been a time-consuming task for educators. With the rise of natural language processing techniques, automated essay scoring (AES) and automated essay revising (AER) have emerged to alleviate this burden. However, current AES and AER models require large amounts of training data and lack…

Descriptors: Scoring, Essays, Writing Evaluation, Computer Software

Automated Identification of Discourse Markers Using the NLP Approach: The Case of "Okay"

Peer reviewed

Sanosi, Abdulaziz; Abdalla, Mohamed – Australian Journal of Applied Linguistics, 2021

This study aimed to examine the potentials of the NLP approach in detecting discourse markers (DMs), namely okay, in transcribed spoken data. One hundred thirty-eight concordance lines were presented to human referees to judge the functions of okay in them as a DM or Non-DM. After that, the researchers used a Python script written according to the…

Descriptors: Natural Language Processing, Computational Linguistics, Programming Languages, Accuracy

Combining Human and Automated Scoring Methods in Experimental Assessments of Writing: A Case Study Tutorial

Peer reviewed

Reagan Mozer; Luke Miratrix; Jackie Eunjung Relyea; James S. Kim – Journal of Educational and Behavioral Statistics, 2024

In a randomized trial that collects text as an outcome, traditional approaches for assessing treatment impact require that each document first be manually coded for constructs of interest by human raters. An impact analysis can then be conducted to compare treatment and control groups, using the hand-coded scores as a measured outcome. This…

Descriptors: Scoring, Evaluation Methods, Writing Evaluation, Comparative Analysis

More Efficient Processes for Creating Automated Essay Scoring Frameworks: A Demonstration of Two Algorithms

Peer reviewed

Shin, Jinnie; Gierl, Mark J. – Language Testing, 2021

Automated essay scoring (AES) has emerged as a secondary or as a sole marker for many high-stakes educational assessments, in native and non-native testing, owing to remarkable advances in feature engineering using natural language processing, machine learning, and deep-neural algorithms. The purpose of this study is to compare the effectiveness…

Descriptors: Scoring, Essays, Writing Evaluation, Computer Software

Source Inclusion in Synthesis Writing: An NLP Approach to Understanding Argumentation, Sourcing, and Essay Quality

Peer reviewed

Crossley, Scott; Wan, Qian; Allen, Laura; McNamara, Danielle – Reading and Writing: An Interdisciplinary Journal, 2023

Synthesis writing is widely taught across domains and serves as an important means of assessing writing ability, text comprehension, and content learning. Synthesis writing differs from other types of writing in terms of both cognitive and task demands because it requires writers to integrate information across source materials. However, little is…

Descriptors: Writing Skills, Cognitive Processes, Essays, Cues

How Do Judges in Comparative Judgement Exercises Make Their Judgements?

Leech, Tony; Chambers, Lucy – Research Matters, 2022

Two of the central issues in comparative judgement (CJ), which are perhaps underexplored compared to questions of the method's reliability and technical quality, are "what processes do judges use to make their decisions" and "what features do they focus on when making their decisions?" This article discusses both, in the…

Descriptors: Comparative Analysis, Decision Making, Evaluators, Reliability

Exploring Potential Biases in GPT-4o's Ratings of English Language Learners' Essays

Peer reviewed

Taichi Yamashita – Language Testing, 2025

With the rapid development of generative artificial intelligence (AI) frameworks (e.g., the generative pre-trained transformer [GPT]), a growing number of researchers have started to explore its potential as an automated essay scoring (AES) system. While previous studies have investigated the alignment between human ratings and GPT ratings, few…

Descriptors: Artificial Intelligence, English (Second Language), Second Language Learning, Second Language Instruction

The Intersection of AI and Language Assessment: A Study on the Reliability of ChatGPT in Grading IELTS Writing Task 2

Peer reviewed

Osama Koraishi – Language Teaching Research Quarterly, 2024

This study conducts a comprehensive quantitative evaluation of OpenAI's language model, ChatGPT 4, for grading Task 2 writing of the IELTS exam. The objective is to assess the alignment between ChatGPT's grading and that of official human raters. The analysis encompassed a multifaceted approach, including a comparison of means and reliability…

Descriptors: Second Language Learning, English (Second Language), Language Tests, Artificial Intelligence

Source

Language Testing	5
ETS Research Report Series	4
Research Synthesis Methods	3
Advances in Language and…	2
Assessment in Education:…	2
Computer Assisted Language…	2
English Language Teaching	2
Evaluation Review	2
Language Learning & Technology	2
TESOL Quarterly: A Journal…	2
Advances in Physiology…	1
Australian Journal of Applied…	1
Behaviour & Information…	1
British Journal of…	1
British Journal of…	1
Computers in the Schools	1
Contemporary Educational…	1
Creativity Research Journal	1
Dimension	1
Education	1
Education Next	1
Education and Information…	1
Educational Sciences: Theory…	1
Educational and Psychological…	1
Eurasian Journal of Applied…	1

Author

Bridgeman, Brent	2
Zechner, Klaus	2
Abdalla, Mohamed	1
Abedi, Jamal	1
Aghayi, Mohammad Bagher	1
Ahmet Can Uyar	1
Ahn, Soojin	1
Akbari, Alireza	1
Al-Harthi, Aisha Salim Ali	1
Alex J. Mechaber	1
Allen, Laura	1
Amanda Huee-Ping Wong	1
Andrew Potter	1
Armijo-Olivo, Susan	1
Azer, Haniyeh Sadeghi	1
Azza Warraitch	1
Bagheri, Mohammad Sadegh	1
Bahi, Halima	1
Bantum, Erin O'Carroll	1
Bejar, Isaac I.	1
Beltyukova, Svetlana A.	1
Bergeron, Annie	1
Bhattacharya, Joydeep	1
Bosch, Nigel	1
Breyer, F. Jay	1