ERIC - Search Results

Publication Date

In 2026	0
Since 2025	3
Since 2022 (last 5 years)	12
Since 2017 (last 10 years)	26
Since 2007 (last 20 years)	44

Descriptor

Comparative Analysis	63
Writing Evaluation	63
Interrater Reliability	31
Foreign Countries	24
English (Second Language)	23
Reliability	23
Essays	21
Scoring	19
Second Language Learning	19
Scores	18
Evaluators	16
Writing Skills	16
Correlation	14
Evaluation Methods	14
Validity	14
Writing Instruction	13
Scoring Rubrics	12
Second Language Instruction	12
Statistical Analysis	11
Test Reliability	11
Computer Software	10
Student Evaluation	10
Test Validity	9
Writing Tests	9
Accuracy	8
More ▼

Publication Type

Reports - Research	49
Journal Articles	46
Reports - Evaluative	7
Speeches/Meeting Papers	7
Tests/Questionnaires	5
Reports - Descriptive	3
Dissertations/Theses -…	2
Information Analyses	2
Guides - Classroom - Teacher	1
Numerical/Quantitative Data	1
Opinion Papers	1
More ▼

Education Level

Higher Education	13
Postsecondary Education	10
Elementary Education	7
Secondary Education	7
Elementary Secondary Education	3
Grade 5	2
High Schools	2
Junior High Schools	2
Kindergarten	2
Middle Schools	2
Adult Education	1
Grade 1	1
Grade 11	1
Grade 2	1
Grade 3	1
Grade 4	1
Grade 6	1
Grade 7	1
Grade 8	1
Grade 9	1
Intermediate Grades	1
More ▼

Audience

Practitioners	2
Researchers	1
Teachers	1

Location

China	5
Australia	3
United Kingdom (England)	3
Canada	2
Iran	2
Turkey	2
Austria	1
California	1
Egypt	1
Germany	1
Hong Kong	1
Japan	1
Jordan	1
Philippines	1
Spain	1
More ▼

Laws, Policies, & Programs

Assessments and Surveys

International English…	2
Graduate Management Admission…	1
National Assessment Program…	1
Wechsler Individual…	1
Woodcock Johnson Tests of…	1

What Works Clearinghouse Rating

Showing 1 to 15 of 63 results Save | Export

Coherence-Based Automatic Short Answer Scoring Using Sentence Embedding

Peer reviewed

Direct link

Dadi Ramesh; Suresh Kumar Sanampudi – European Journal of Education, 2024

Automatic essay scoring (AES) is an essential educational application in natural language processing. This automated process will alleviate the burden by increasing the reliability and consistency of the assessment. With the advances in text embedding libraries and neural network models, AES systems achieved good results in terms of accuracy.…

Descriptors: Scoring, Essays, Writing Evaluation, Memory

The Classification Accuracy and Consistency of Comparative Judgement of Writing Compared to Rubric-Based Teacher Assessment

Peer reviewed

Direct link

Pinot de Moira, Anne; Wheadon, Christopher; Christodoulou, Daisy – Research in Education, 2022

Writing is generally assessed internationally using rubric-based approaches, but there is a growing body of evidence to suggest that the reliability of such approaches is poor. In contrast, comparative judgement studies suggest that it is possible to assess open ended tasks such as writing with greater reliability. Many previous studies, however,…

Descriptors: Writing Evaluation, Classification, Accuracy, Scoring Rubrics

Graders of the Future: Comparing the Consistency and Accuracy of GPT4 and Pre-Service Teachers in Physics Essay Question Assessments

Peer reviewed
PDF on ERIC

Download full text

Yubin Xu; Lin Liu; Jianwen Xiong; Guangtian Zhu – Journal of Baltic Science Education, 2025

As the development and application of large language models (LLMs) in physics education progress, the well-known AI-based chatbot ChatGPT4 has presented numerous opportunities for educational assessment. Investigating the potential of AI tools in practical educational assessment carries profound significance. This study explored the comparative…

Descriptors: Physics, Artificial Intelligence, Computer Software, Accuracy

Utilizing Large Language Models for EFL Essay Grading: An Examination of Reliability and Validity in Rubric-Based Assessments

Peer reviewed

Direct link

Fatih Yavuz; Özgür Çelik; Gamze Yavas Çelik – British Journal of Educational Technology, 2025

This study investigates the validity and reliability of generative large language models (LLMs), specifically ChatGPT and Google's Bard, in grading student essays in higher education based on an analytical grading rubric. A total of 15 experienced English as a foreign language (EFL) instructors and two LLMs were asked to evaluate three student…

Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Computational Linguistics

Comparative Judgement for Evaluating Young Learners' EFL Writing Performances: Reliability and Teacher Perceptions of Holistic and Dimension-Based Judgements

Peer reviewed

Direct link

Rebecca Sickinger; Tineke Brunfaut; John Pill – Language Testing, 2025

Comparative Judgement (CJ) is an evaluation method, typically conducted online, whereby a rank order is constructed, and scores calculated, from judges' pairwise comparisons of performances. CJ has been researched in various educational contexts, though only rarely in English as a Foreign Language (EFL) writing settings, and is generally agreed to…

Descriptors: Writing Evaluation, English (Second Language), Second Language Learning, Second Language Instruction

More Efficient Processes for Creating Automated Essay Scoring Frameworks: A Demonstration of Two Algorithms

Peer reviewed

Direct link

Shin, Jinnie; Gierl, Mark J. – Language Testing, 2021

Automated essay scoring (AES) has emerged as a secondary or as a sole marker for many high-stakes educational assessments, in native and non-native testing, owing to remarkable advances in feature engineering using natural language processing, machine learning, and deep-neural algorithms. The purpose of this study is to compare the effectiveness…

Descriptors: Scoring, Essays, Writing Evaluation, Computer Software

Judges' Views on Pairwise Comparative Judgement and Rank Ordering as Alternatives to Analytical Essay Marking

Download full text

Walland, Emma – Research Matters, 2022

In this article, I report on examiners' views and experiences of using Pairwise Comparative Judgement (PCJ) and Rank Ordering (RO) as alternatives to traditional analytical marking for GCSE English Language essays. Fifteen GCSE English Language examiners took part in the study. After each had judged 100 pairs of essays using PCJ and eight packs of…

Descriptors: Essays, Grading, Writing Evaluation, Evaluators

The Intersection of AI and Language Assessment: A Study on the Reliability of ChatGPT in Grading IELTS Writing Task 2

Peer reviewed
PDF on ERIC

Download full text

Osama Koraishi – Language Teaching Research Quarterly, 2024

This study conducts a comprehensive quantitative evaluation of OpenAI's language model, ChatGPT 4, for grading Task 2 writing of the IELTS exam. The objective is to assess the alignment between ChatGPT's grading and that of official human raters. The analysis encompassed a multifaceted approach, including a comparison of means and reliability…

Descriptors: Second Language Learning, English (Second Language), Language Tests, Artificial Intelligence

Meta-Analysis of Inter-Rater Agreement and Discrepancy Between Human and Automated English Essay Scoring

Peer reviewed
PDF on ERIC

Download full text

Direct link

Jiyeo Yun – English Teaching, 2023

Studies on automatic scoring systems in writing assessments have also evaluated the relationship between human and machine scores for the reliability of automated essay scoring systems. This study investigated the magnitudes of indices for inter-rater agreement and discrepancy, especially regarding human and machine scoring, in writing assessment.…

Descriptors: Meta Analysis, Interrater Reliability, Essays, Scoring

Depth-Perception-Based Representation in Holistic Rating on ESL Essay Writing

Peer reviewed

Direct link

Lian Li; Jiehui Hu; Yu Dai; Ping Zhou; Wanhong Zhang – Reading & Writing Quarterly, 2024

This paper proposes to use depth perception to represent raters' decision in holistic evaluation of ESL essays, as an alternative medium to conventional form of numerical scores. The researchers verified the new method's accuracy and inter/intra-rater reliability by inviting 24 ESL teachers to perform different representations when rating 60…

Descriptors: Essays, Holistic Approach, Writing Evaluation, Accuracy

A Model-Data-Fit-Informed Approach to Score Resolution in Performance Assessments

Peer reviewed

Direct link

Wind, Stefanie A.; Walker, A. Adrienne – Educational Measurement: Issues and Practice, 2021

Many large-scale performance assessments include score resolution procedures for resolving discrepancies in rater judgments. The goal of score resolution is conceptually similar to person fit analyses: To identify students for whom observed scores may not accurately reflect their achievement. Previously, researchers have observed that…

Descriptors: Goodness of Fit, Performance Based Assessment, Evaluators, Decision Making

Rubric Rating with MFRM versus Randomly Distributed Comparative Judgment: A Comparison of Two Approaches to Second-Language Writing Assessment

Peer reviewed

Direct link

Sims, Maureen E.; Cox, Troy L.; Eckstein, Grant T.; Hartshorn, K. James; Wilcox, Matthew P.; Hart, Judson M. – Educational Measurement: Issues and Practice, 2020

The purpose of this study is to explore the reliability of a potentially more practical approach to direct writing assessment in the context of ESL writing. Traditional rubric rating (RR) is a common yet resource-intensive evaluation practice when performed reliably. This study compared the traditional rubric model of ESL writing assessment and…

Descriptors: Scoring Rubrics, Item Response Theory, Second Language Learning, English (Second Language)

Re-Imagining Narrative Writing and Assessment: A Post-NAPLAN Craft-Based Rubric for Creative Writing

Peer reviewed

Direct link

Michael D. Carey; Shelley Davidow; Paul Williams – Australian Journal of Language and Literacy, 2022

According to creative writing pedagogies academic Susanne Gannon ("English in Australia, 54"(2), 43-56, 2019), and the Federal government-commissioned NAPLAN review (McGaw et al., 2020), NAPLAN has restricted how writing is taught in secondary schools. A NAPLAN-influenced structural approach to teaching writing has subsumed the…

Descriptors: Scoring Rubrics, Creative Writing, Writing Evaluation, National Competency Tests

Impacts of ChatGPT-Assisted Writing for EFL English Majors: Feasibility and Challenges

Peer reviewed

Direct link

Chung-You Tsai; Yi-Ti Lin; Iain Kelsall Brown – Education and Information Technologies, 2024

To determine the impacts of using ChatGPT to assist English as a foreign language (EFL) English college majors in revising essays and the possibility of leading to higher scores and potentially causing unfairness. A prospective, double-blinded, paired-comparison study was conducted in Feb. 2023. A total of 44 students provided 44 original essays…

Descriptors: Artificial Intelligence, Computer Software, Technology Uses in Education, English (Second Language)

Investigating the Impact of Rater Training on Rater Errors in the Process of Assessing Writing Skill

Peer reviewed
PDF on ERIC

Download full text

Sata, Mehmet; Karakaya, Ismail – International Journal of Assessment Tools in Education, 2022

In the process of measuring and assessing high-level cognitive skills, interference of rater errors in measurements brings about a constant concern and low objectivity. The main purpose of this study was to investigate the impact of rater training on rater errors in the process of assessing individual performance. The study was conducted with a…

Descriptors: Evaluators, Training, Comparative Analysis, Academic Language

Previous Page | Next Page »

Pages: 1 | 2 | 3 | 4 | 5

Language Testing	3
Applied Measurement in…	2
Assessing Writing	2
Assessment in Education:…	2
Educational Measurement:…	2
Online Submission	2
ProQuest LLC	2
Research in the Teaching of…	2
Assessment for Effective…	1
Australian Educational…	1
Australian Journal of…	1
British Journal of…	1
CALICO Journal	1
Canadian Journal of Learning…	1
Education and Information…	1
Educational Research	1
Educational Research and…	1
Educational Researcher	1
English Teaching	1
European Journal of Education	1
International Journal of…	1
International Journal of…	1
International Journal of…	1
Iranian Journal of Language…	1
Journal of Applied Testing…	1
More ▼

Humphry, Stephen M.	2
Stiggins, Richard J.	2
Abdel-Haq, Eman Muhammad	1
Abrami, Philip C.	1
Al-Sayed, Rania Kamal Muhammad	1
Al-Shbeil, Abeer	1
Ali, Mahsoub Abdel-Sadeq	1
Allen, Abigail	1
Attali, Yigal	1
Baier, Herbert	1
Baker, Beverly A.	1
Balzotti, Jon	1
Barclay, Alexandra	1
Barkaoui, Khaled	1
Bauer, Barbara Ann	1
Behizadeh, Nadia	1
Bhola, Dennison S.	1
Bridgeman, Brent	1
Buckendahl, Chad W.	1
Bures, Eva Mary	1
Christodoulou, Daisy	1
Chung-You Tsai	1
Clark, Irene Lurkis	1
Collier, Lizabeth C.	1
More ▼