ERIC - Search Results

Showing 1 to 15 of 25 results

Grading the Graders: Comparing Generative AI and Human Assessment in Essay Evaluation

Peer reviewed

Elizabeth L. Wetzler; Kenneth S. Cassidy; Margaret J. Jones; Chelsea R. Frazier; Nickalous A. Korbut; Chelsea M. Sims; Shari S. Bowen; Michael Wood – Teaching of Psychology, 2025

Background: Generative artificial intelligence (AI) represents a potentially powerful, time-saving tool for grading student essays. However, little is known about how AI-generated essay scores compare to human instructor scores. Objective: The purpose of this study was to compare the essay grading scores produced by AI with those of human…

Descriptors: Essays, Writing Evaluation, Scores, Evaluators

Graders of the Future: Comparing the Consistency and Accuracy of GPT4 and Pre-Service Teachers in Physics Essay Question Assessments

Peer reviewed

Yubin Xu; Lin Liu; Jianwen Xiong; Guangtian Zhu – Journal of Baltic Science Education, 2025

As the development and application of large language models (LLMs) in physics education progress, the well-known AI-based chatbot ChatGPT4 has presented numerous opportunities for educational assessment. Investigating the potential of AI tools in practical educational assessment carries profound significance. This study explored the comparative…

Descriptors: Physics, Artificial Intelligence, Computer Software, Accuracy

Making Each Point Count: Revising a Local Adaptation of the Jacobs et al.'s (1981) ESL COMPOSITION PROFILE Rubric

Peer reviewed

Yu-Tzu Chang; Ann Tai Choe; Daniel Holden; Daniel R. Isbell – Language Testing, 2024

In this Brief Report, we describe an evaluation of and revisions to a rubric adapted from the Jacobs et al.'s (1981) ESL COMPOSITION PROFILE, with four rubric categories and 20-point rating scales, in the context of an intensive English program writing placement test. Analysis of 4 years of rating data (2016-2021, including 434 essays) using…

Descriptors: Language Tests, Rating Scales, Second Language Learning, English (Second Language)

More Efficient Processes for Creating Automated Essay Scoring Frameworks: A Demonstration of Two Algorithms

Peer reviewed

Shin, Jinnie; Gierl, Mark J. – Language Testing, 2021

Automated essay scoring (AES) has emerged as a secondary or as a sole marker for many high-stakes educational assessments, in native and non-native testing, owing to remarkable advances in feature engineering using natural language processing, machine learning, and deep-neural algorithms. The purpose of this study is to compare the effectiveness…

Descriptors: Scoring, Essays, Writing Evaluation, Computer Software

Source Inclusion in Synthesis Writing: An NLP Approach to Understanding Argumentation, Sourcing, and Essay Quality

Peer reviewed

Crossley, Scott; Wan, Qian; Allen, Laura; McNamara, Danielle – Reading and Writing: An Interdisciplinary Journal, 2023

Synthesis writing is widely taught across domains and serves as an important means of assessing writing ability, text comprehension, and content learning. Synthesis writing differs from other types of writing in terms of both cognitive and task demands because it requires writers to integrate information across source materials. However, little is…

Descriptors: Writing Skills, Cognitive Processes, Essays, Cues

Application of an Automated Essay Scoring Engine to English Writing Assessment Using Many-Facet Rasch Measurement

Peer reviewed

Chan, Kinnie Kin Yee; Bond, Trevor; Yan, Zi – Language Testing, 2023

We investigated the relationship between the scores assigned by an Automated Essay Scoring (AES) system, the Intelligent Essay Assessor (IEA), and grades allocated by trained, professional human raters to English essay writing by instigating two procedures novel to written-language assessment: the logistic transformation of AES raw scores into…

Descriptors: Computer Assisted Testing, Essays, Scoring, Scores

Exploring Potential Biases in GPT-4o's Ratings of English Language Learners' Essays

Peer reviewed

Taichi Yamashita – Language Testing, 2025

With the rapid development of generative artificial intelligence (AI) frameworks (e.g., the generative pre-trained transformer [GPT]), a growing number of researchers have started to explore its potential as an automated essay scoring (AES) system. While previous studies have investigated the alignment between human ratings and GPT ratings, few…

Descriptors: Artificial Intelligence, English (Second Language), Second Language Learning, Second Language Instruction

Source Inclusion in Synthesis Writing: An NLP Approach to Understanding Argumentation, Sourcing, and Essay Quality

Peer reviewed

Crossley, Scott; Wan, Qian; Allen, Laura; McNamara, Danielle – Grantee Submission, 2021

Descriptors: Writing Skills, Cognitive Processes, Essays, Cues

Investigating Human Essay Rating Quality in a Large-Scale Assessment Using Many-Facet Rasch Measurement

Peer reviewed

Zhang, Xiuyuan – AERA Online Paper Repository, 2019

The main purpose of the study is to evaluate the qualities of human essay ratings for a large-scale assessment using Rasch measurement theory. Specifically, Many-Facet Rasch Measurement (MFRM) was utilized to examine the rating scale category structure and provide important information about interpretations of ratings in the large-scale…

Descriptors: Essays, Evaluators, Writing Evaluation, Reliability

Home-Grown Automated Essay Scoring in the Literature Classroom: A Solution for Managing the Crowd?

Peer reviewed

Uzun, Kutay – Contemporary Educational Technology, 2018

Managing crowded classes in terms of classroom assessment is a difficult task due to the amount of time which needs to be devoted to providing feedback to student products. In this respect, the present study aimed to develop an automated essay scoring environment as a potential means to overcome this problem. Secondarily, the study aimed to test…

Descriptors: Computer Assisted Testing, Essays, Scoring, English Literature

Writing Scale Effects on Raters: An Exploratory Study

Peer reviewed

Jeong, Heejeong – Language Testing in Asia, 2019

In writing assessment, finding a valid, reliable, and efficient scale is critical. Appropriate scales increase rater reliability and can also save time and money. This exploratory study compared the effects of a binary scale and an analytic scale across teacher raters and expert raters. The purpose of the study is to find out how different scale…

Descriptors: Writing Evaluation, English (Second Language), Second Language Learning, Second Language Instruction

Do Experience and Text Quality Matter for Raters' Decision-Making Behaviors?

Peer reviewed

Sahan, Özgür; Razi, Salim – Language Testing, 2020

This study examines the decision-making behaviors of raters with varying levels of experience while assessing EFL essays of distinct qualities. The data were collected from 28 raters with varying levels of rating experience and working at the English language departments of different universities in Turkey. Using a 10-point analytic rubric, each…

Descriptors: Decision Making, Essays, Writing Evaluation, Evaluators

A Comparison of Newly-Trained and Experienced Raters on a Standardized Writing Assessment

Peer reviewed

Attali, Yigal – Language Testing, 2016

A short training program for evaluating responses to an essay writing task consisted of scoring 20 training essays with immediate feedback about the correct score. The same scoring session also served as a certification test for trainees. Participants with little or no previous rating experience completed this session and 14 trainees who passed an…

Descriptors: Writing Evaluation, Writing Tests, Standardized Tests, Evaluators

Markers' Criteria in Assessing English Essays: An Exploratory Study of the Higher Secondary School Certificate (HSSC) in the Punjab Province of Pakistan

Peer reviewed

Fernandez, Miguel; Siddiqui, Athar Munir – Language Testing in Asia, 2017

Background: Marking of essays is mainly carried out by human raters who bring in their own subjective and idiosyncratic evaluation criteria, which sometimes lead to discrepancy. This discrepancy may in turn raise issues like reliability and fairness. The current research attempts to explore the evaluation criteria of markers on a national level…

Descriptors: Grading, Evaluators, Evaluation Criteria, High Stakes Tests

Linguistic Features of Humor in Academic Writing

Peer reviewed

Skalicky, Stephen; Berger, Cynthia M.; Crossley, Scott A.; McNamara, Danielle S. – Advances in Language and Literary Studies, 2016

A corpus of 313 freshman college essays was analyzed in order to better understand the forms and functions of humor in academic writing. Human ratings of humor and wordplay were statistically aggregated using Factor Analysis to provide an overall "Humor" component score for each essay in the corpus. In addition, the essays were also…

Descriptors: Discourse Analysis, Academic Discourse, Humor, Writing (Composition)
