Showing 1 to 15 of 37 results
Peer reviewed
Elizabeth L. Wetzler; Kenneth S. Cassidy; Margaret J. Jones; Chelsea R. Frazier; Nickalous A. Korbut; Chelsea M. Sims; Shari S. Bowen; Michael Wood – Teaching of Psychology, 2025
Background: Generative artificial intelligence (AI) represents a potentially powerful, time-saving tool for grading student essays. However, little is known about how AI-generated essay scores compare to human instructor scores. Objective: The purpose of this study was to compare the essay grading scores produced by AI with those of human…
Descriptors: Essays, Writing Evaluation, Scores, Evaluators
Peer reviewed
PDF on ERIC
Yubin Xu; Lin Liu; Jianwen Xiong; Guangtian Zhu – Journal of Baltic Science Education, 2025
As the development and application of large language models (LLMs) in physics education progress, the well-known AI-based chatbot ChatGPT4 has presented numerous opportunities for educational assessment. Investigating the potential of AI tools in practical educational assessment carries profound significance. This study explored the comparative…
Descriptors: Physics, Artificial Intelligence, Computer Software, Accuracy
Peer reviewed
Fatih Yavuz; Özgür Çelik; Gamze Yavas Çelik – British Journal of Educational Technology, 2025
This study investigates the validity and reliability of generative large language models (LLMs), specifically ChatGPT and Google's Bard, in grading student essays in higher education based on an analytical grading rubric. A total of 15 experienced English as a foreign language (EFL) instructors and two LLMs were asked to evaluate three student…
Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Computational Linguistics
Peer reviewed
Yishen Song; Qianta Zhu; Huaibo Wang; Qinhua Zheng – IEEE Transactions on Learning Technologies, 2024
Manually scoring and revising student essays has long been a time-consuming task for educators. With the rise of natural language processing techniques, automated essay scoring (AES) and automated essay revising (AER) have emerged to alleviate this burden. However, current AES and AER models require large amounts of training data and lack…
Descriptors: Scoring, Essays, Writing Evaluation, Computer Software
Peer reviewed
Reagan Mozer; Luke Miratrix; Jackie Eunjung Relyea; James S. Kim – Journal of Educational and Behavioral Statistics, 2024
In a randomized trial that collects text as an outcome, traditional approaches for assessing treatment impact require that each document first be manually coded for constructs of interest by human raters. An impact analysis can then be conducted to compare treatment and control groups, using the hand-coded scores as a measured outcome. This…
Descriptors: Scoring, Evaluation Methods, Writing Evaluation, Comparative Analysis
Peer reviewed
Shin, Jinnie; Gierl, Mark J. – Language Testing, 2021
Automated essay scoring (AES) has emerged as a secondary or as a sole marker for many high-stakes educational assessments, in native and non-native testing, owing to remarkable advances in feature engineering using natural language processing, machine learning, and deep-neural algorithms. The purpose of this study is to compare the effectiveness…
Descriptors: Scoring, Essays, Writing Evaluation, Computer Software
Walland, Emma – Research Matters, 2022
In this article, I report on examiners' views and experiences of using Pairwise Comparative Judgement (PCJ) and Rank Ordering (RO) as alternatives to traditional analytical marking for GCSE English Language essays. Fifteen GCSE English Language examiners took part in the study. After each had judged 100 pairs of essays using PCJ and eight packs of…
Descriptors: Essays, Grading, Writing Evaluation, Evaluators
Peer reviewed
PDF on ERIC
Osama Koraishi – Language Teaching Research Quarterly, 2024
This study conducts a comprehensive quantitative evaluation of OpenAI's language model, ChatGPT 4, for grading Task 2 writing of the IELTS exam. The objective is to assess the alignment between ChatGPT's grading and that of official human raters. The analysis encompassed a multifaceted approach, including a comparison of means and reliability…
Descriptors: Second Language Learning, English (Second Language), Language Tests, Artificial Intelligence
Jiyeo Yun – English Teaching, 2023
Studies on automatic scoring systems in writing assessments have also evaluated the relationship between human and machine scores for the reliability of automated essay scoring systems. This study investigated the magnitudes of indices for inter-rater agreement and discrepancy, especially regarding human and machine scoring, in writing assessment.…
Descriptors: Meta Analysis, Interrater Reliability, Essays, Scoring
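Several of the studies listed here report human–machine agreement using indices such as quadratic weighted kappa (QWK), a standard statistic for ordinal score agreement. As context for those reliability figures, the following is a minimal pure-Python sketch of QWK; the function name and score range are illustrative and not drawn from any of the cited papers.

```python
def quadratic_weighted_kappa(human, machine, min_score, max_score):
    """Quadratic weighted kappa between two lists of integer scores.

    Returns 1.0 for perfect agreement and ~0.0 for chance-level agreement.
    Disagreements are penalized by the squared distance between scores.
    """
    k = max_score - min_score + 1
    n = len(human)
    # Observed agreement matrix and marginal score histograms.
    observed = [[0] * k for _ in range(k)]
    hist_h = [0] * k
    hist_m = [0] * k
    for h, m in zip(human, machine):
        observed[h - min_score][m - min_score] += 1
        hist_h[h - min_score] += 1
        hist_m[m - min_score] += 1
    numerator = denominator = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2   # quadratic penalty
            expected = hist_h[i] * hist_m[j] / n   # chance agreement count
            numerator += weight * observed[i][j]
            denominator += weight * expected
    return 1.0 - numerator / denominator
```

Identical score lists yield exactly 1.0, while a single one-point disagreement on a small sample noticeably lowers the index, which is why QWK is preferred over raw percent agreement for ordinal essay scales.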
Peer reviewed
Lian Li; Jiehui Hu; Yu Dai; Ping Zhou; Wanhong Zhang – Reading & Writing Quarterly, 2024
This paper proposes to use depth perception to represent raters' decisions in the holistic evaluation of ESL essays, as an alternative medium to the conventional form of numerical scores. The researchers verified the new method's accuracy and inter/intra-rater reliability by inviting 24 ESL teachers to perform different representations when rating 60…
Descriptors: Essays, Holistic Approach, Writing Evaluation, Accuracy
Peer reviewed
Wind, Stefanie A.; Walker, A. Adrienne – Educational Measurement: Issues and Practice, 2021
Many large-scale performance assessments include score resolution procedures for resolving discrepancies in rater judgments. The goal of score resolution is conceptually similar to person fit analyses: To identify students for whom observed scores may not accurately reflect their achievement. Previously, researchers have observed that…
Descriptors: Goodness of Fit, Performance Based Assessment, Evaluators, Decision Making
Peer reviewed
PDF on ERIC
Ahmet Can Uyar; Dilek Büyükahiska – International Journal of Assessment Tools in Education, 2025
This study explores the effectiveness of using ChatGPT, an Artificial Intelligence (AI) language model, as an Automated Essay Scoring (AES) tool for grading English as a Foreign Language (EFL) learners' essays. The corpus consists of 50 essays representing various types including analysis, compare and contrast, descriptive, narrative, and opinion…
Descriptors: Artificial Intelligence, Computer Software, Technology Uses in Education, Teaching Methods
Peer reviewed
PDF on ERIC
Vasfiye Geçkin; Ebru Kiziltas; Çagatay Çinar – Journal of Educational Technology and Online Learning, 2023
The quality of writing in a second language (L2) is one of the indicators of the level of proficiency for many college students to be eligible for departmental studies. Although certain software programs, such as Intelligent Essay Assessor or IntelliMetric, have been introduced to evaluate second-language writing quality, an overall assessment of…
Descriptors: Writing Evaluation, Second Language Learning, Second Language Instruction, Language Proficiency
Peer reviewed
PDF on ERIC
Sata, Mehmet; Karakaya, Ismail – International Journal of Assessment Tools in Education, 2022
In the process of measuring and assessing high-level cognitive skills, interference of rater errors in measurements brings about a constant concern and low objectivity. The main purpose of this study was to investigate the impact of rater training on rater errors in the process of assessing individual performance. The study was conducted with a…
Descriptors: Evaluators, Training, Comparative Analysis, Academic Language
Peer reviewed
Junifer Leal Bucol; Napattanissa Sangkawong – Innovations in Education and Teaching International, 2025
This research paper employs an exploratory framework to evaluate the potential of ChatGPT as an Automated Writing Evaluation (AWE) tool in teaching English as a Foreign Language (EFL) in Thailand. The main objective is to investigate how well ChatGPT can assess students' writing using prompts and pre-defined rubrics compared to human raters.…
Descriptors: Artificial Intelligence, Computer Software, Teaching Methods, English (Second Language)