ERIC - Search Results

Publication Date

In 2025	1
Since 2024	3
Since 2021 (last 5 years)	6
Since 2016 (last 10 years)	14
Since 2006 (last 20 years)	22

Descriptor

Comparative Analysis	31
Interrater Reliability	31
Writing Evaluation	31
Essays	14
English (Second Language)	13
Scores	13
Foreign Countries	12
Second Language Learning	11
Correlation	9
Evaluators	9
Scoring	9
Evaluation Methods	8
Second Language Instruction	8
Statistical Analysis	8
Writing Skills	8
Accuracy	6
Student Evaluation	6
Writing (Composition)	6
Writing Instruction	6
Computer Software	5
Essay Tests	5
Writing Tests	5
College Students	4
Computer Assisted Testing	4
Decision Making	4

Publication Type

Reports - Research	23
Journal Articles	22
Reports - Evaluative	5
Speeches/Meeting Papers	4
Tests/Questionnaires	4
Dissertations/Theses -…	2
Information Analyses	1
Numerical/Quantitative Data	1
Reports - Descriptive	1

Education Level

Higher Education	11
Postsecondary Education	8
Elementary Education	3
Secondary Education	3
Adult Education	1
Grade 1	1
Grade 11	1
Grade 2	1
Grade 4	1
Grade 5	1
Grade 6	1
High Schools	1
Intermediate Grades	1
Kindergarten	1

Audience

Practitioners	1
Teachers	1

Location

China	3
Canada	2
Iran	2
Australia	1
California	1
Germany	1
Hong Kong	1
Japan	1
Philippines	1

Assessments and Surveys

Graduate Management Admission…	1
Wechsler Individual…	1
Woodcock Johnson Tests of…	1

Showing 1 to 15 of 31 results

Graders of the Future: Comparing the Consistency and Accuracy of GPT4 and Pre-Service Teachers in Physics Essay Question Assessments

Peer reviewed

Yubin Xu; Lin Liu; Jianwen Xiong; Guangtian Zhu – Journal of Baltic Science Education, 2025

As the development and application of large language models (LLMs) in physics education progress, the well-known AI-based chatbot ChatGPT4 has presented numerous opportunities for educational assessment. Investigating the potential of AI tools in practical educational assessment carries profound significance. This study explored the comparative…

Descriptors: Physics, Artificial Intelligence, Computer Software, Accuracy

Meta-Analysis of Inter-Rater Agreement and Discrepancy Between Human and Automated English Essay Scoring

Peer reviewed

Jiyeo Yun – English Teaching, 2023

Studies on automatic scoring systems in writing assessments have also evaluated the relationship between human and machine scores as evidence of the reliability of automated essay scoring systems. This study investigated the magnitudes of indices for inter-rater agreement and discrepancy, especially regarding human and machine scoring, in writing assessment.…

Descriptors: Meta Analysis, Interrater Reliability, Essays, Scoring
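
The agreement and discrepancy indices this abstract refers to can be illustrated with a small sketch. Exact agreement, adjacent agreement, quadratic weighted kappa (QWK), and the standardized mean difference (SMD) are indices commonly reported when human and machine essay scores are compared. Whether these are the indices pooled in this meta-analysis is an assumption, since the abstract is truncated. The function names and toy scores are hypothetical, and scores are assumed to be integers on a known scale with at least two points.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def exact_agreement(a, b):
    """Proportion of essays on which both raters gave the identical score."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def adjacent_agreement(a, b, tolerance=1):
    """Proportion of essays whose two scores differ by at most `tolerance` points."""
    return sum(abs(x - y) <= tolerance for x, y in zip(a, b)) / len(a)


def quadratic_weighted_kappa(a, b, min_score, max_score):
    """Cohen's kappa with quadratic disagreement weights on an integer score scale."""
    k = max_score - min_score + 1
    n = len(a)
    # Observed joint counts of (rater A score, rater B score), shifted to start at 0.
    observed = Counter((x - min_score, y - min_score) for x, y in zip(a, b))
    # Marginal score distribution of each rater.
    marg_a = Counter(x - min_score for x in a)
    marg_b = Counter(y - min_score for y in b)
    weighted_obs = weighted_exp = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2  # 0 on the diagonal, 1 at the extremes
            weighted_obs += w * observed[(i, j)]
            weighted_exp += w * marg_a[i] * marg_b[j] / n  # chance-expected count
    if weighted_exp == 0:
        raise ValueError("kappa is undefined when chance disagreement is zero")
    return 1.0 - weighted_obs / weighted_exp


def standardized_mean_difference(human, machine):
    """Machine-minus-human mean difference divided by the pooled standard deviation."""
    pooled_sd = math.sqrt((pstdev(human) ** 2 + pstdev(machine) ** 2) / 2)
    return (mean(machine) - mean(human)) / pooled_sd
```

For example, with hypothetical human scores `[3, 4, 2, 5, 3, 4]` and machine scores `[3, 4, 3, 4, 3, 5]` on a 1 to 5 scale, exact agreement is 0.5, adjacent agreement is 1.0, and QWK is 2/3. This shows why studies usually report several indices at once: each one captures a different aspect of agreement.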

Depth-Perception-Based Representation in Holistic Rating on ESL Essay Writing

Peer reviewed

Lian Li; Jiehui Hu; Yu Dai; Ping Zhou; Wanhong Zhang – Reading & Writing Quarterly, 2024

This paper proposes using depth perception to represent raters' decisions in the holistic evaluation of ESL essays, as an alternative to the conventional form of numerical scores. The researchers verified the new method's accuracy and inter/intra-rater reliability by inviting 24 ESL teachers to use different representations when rating 60…

Descriptors: Essays, Holistic Approach, Writing Evaluation, Accuracy

A Model-Data-Fit-Informed Approach to Score Resolution in Performance Assessments

Peer reviewed

Wind, Stefanie A.; Walker, A. Adrienne – Educational Measurement: Issues and Practice, 2021

Many large-scale performance assessments include score resolution procedures for resolving discrepancies in rater judgments. The goal of score resolution is conceptually similar to person fit analyses: To identify students for whom observed scores may not accurately reflect their achievement. Previously, researchers have observed that…

Descriptors: Goodness of Fit, Performance Based Assessment, Evaluators, Decision Making
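
The sketch below gives background on the score resolution procedures this abstract mentions. It shows one conventional rater-discrepancy rule, not the model-data-fit-informed approach the authors propose. Two ratings within a tolerance are averaged. A larger discrepancy triggers a third rating, which is averaged with whichever original rating lies closer to it. This variant is sometimes called tertium quid. The function name, the one-point tolerance, and the use of the mean are assumptions, and operational programs differ.

```python
def resolve_score(r1, r2, r3=None, max_gap=1):
    """Return an operational score for one response rated by two raters.

    Hypothetical rule: average the two ratings when they differ by at most
    `max_gap` points; otherwise require a third rating `r3` and average it
    with whichever of the two original ratings lies closer to it.
    """
    if abs(r1 - r2) <= max_gap:
        return (r1 + r2) / 2
    if r3 is None:
        raise ValueError("ratings are discrepant; a third rating is required")
    # On a tie, keep the first rater's score.
    closer = r1 if abs(r1 - r3) <= abs(r2 - r3) else r2
    return (closer + r3) / 2
```

For instance, ratings of 3 and 4 resolve to 3.5 directly. Ratings of 2 and 5 need a third rating: a third rating of 4 is paired with the 5 and resolves to 4.5. A rule like this flags responses only by rater disagreement. Person-fit approaches can also flag responses where the raters agree but the scores fit the measurement model poorly.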

Impacts of ChatGPT-Assisted Writing for EFL English Majors: Feasibility and Challenges

Peer reviewed

Chung-You Tsai; Yi-Ti Lin; Iain Kelsall Brown – Education and Information Technologies, 2024

This study aimed to determine the impacts of using ChatGPT to assist English as a foreign language (EFL) English college majors in revising essays, including whether its use could lead to higher scores and potentially cause unfairness. A prospective, double-blinded, paired-comparison study was conducted in February 2023. A total of 44 students provided 44 original essays…

Descriptors: Artificial Intelligence, Computer Software, Technology Uses in Education, English (Second Language)

Comparative Judgement: Assess Student Production without Absolute Judgements

Peer reviewed

Sumner, Josh – Research-publishing.net, 2021

Comparative Judgement (CJ) has emerged as a technique that typically makes use of holistic judgement to assess difficult-to-specify constructs such as production (speaking and writing) in Modern Foreign Languages (MFL). In traditional approaches, markers assess candidates' work one-by-one in an absolute manner, assigning scores to different…

Descriptors: Holistic Approach, Student Evaluation, Comparative Analysis, Decision Making
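
Comparative judgement replaces absolute marks with many "which script is better?" decisions, and those decisions then have to be turned into a scale. The sketch below assumes a Bradley-Terry model fitted with the classic Zermelo (minorize-maximize) iteration. This is one common choice, but the abstract does not say which model the author uses. The function name, script labels, and judgements are hypothetical.

```python
import math
from collections import Counter


def bradley_terry(judgements, iterations=200):
    """Fit Bradley-Terry strengths to (winner, loser) pairs with Zermelo/MM updates.

    Returns a dict mapping each script to a log-strength centred on 0
    (higher = judged better). Assumes that for every split of the scripts
    into two groups, some script in each group has beaten some script in the
    other; otherwise the maximum-likelihood estimate does not exist.
    """
    scripts = sorted({s for pair in judgements for s in pair})
    wins = Counter(winner for winner, _ in judgements)
    # Number of times each unordered pair of scripts was compared.
    n_pair = Counter(frozenset(pair) for pair in judgements)
    if any(wins[s] == 0 for s in scripts):
        raise ValueError("every script needs at least one win for a finite estimate")
    strength = {s: 1.0 for s in scripts}
    for _ in range(iterations):
        updated = {}
        for i in scripts:
            # Sum of n_ij / (p_i + p_j) over every opponent j that i met.
            denom = sum(
                n / (strength[i] + strength[j])
                for pair, n in n_pair.items() if i in pair
                for j in pair if j != i
            )
            updated[i] = wins[i] / denom
        # Rescale so the geometric mean strength is 1 (log-strengths sum to 0).
        log_mean = sum(math.log(v) for v in updated.values()) / len(updated)
        strength = {s: v / math.exp(log_mean) for s, v in updated.items()}
    return {s: math.log(v) for s, v in strength.items()}
```

Suppose script A beats B three times out of four, B beats C three times out of four, and A beats C three times out of four. Sorting the returned scale from highest to lowest then ranks the scripts A, B, C. No rater ever assigned an absolute score, which is the property this abstract highlights.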

Writing Scale Effects on Raters: An Exploratory Study

Peer reviewed

Jeong, Heejeong – Language Testing in Asia, 2019

In writing assessment, finding a valid, reliable, and efficient scale is critical. Appropriate scales increase rater reliability and can also save time and money. This exploratory study compared the effects of a binary scale and an analytic scale across teacher raters and expert raters. The purpose of the study is to find out how different scale…

Descriptors: Writing Evaluation, English (Second Language), Second Language Learning, Second Language Instruction

Effect of Quality Characteristics of Peer Raters on Rating Errors in Peer Assessment

Peer reviewed

Guo, Xiuyan; Lei, Pui-Wa – International Journal of Testing, 2020

Little research has been done on the effects of peer raters' quality characteristics on peer rating qualities. This study aims to address this gap and investigate the effects of key variables related to peer raters' qualities, including content knowledge, previous rating experience, training on rating tasks, and rating motivation. In an experiment…

Descriptors: Peer Evaluation, Error Patterns, Correlation, Knowledge Level

Exploration of New Complexity Metrics for Curriculum-Based Measures of Writing

Peer reviewed

Wagner, Kyle; Smith, Alex; Allen, Abigail; McMaster, Kristen; Poch, Apryl; Lembke, Erica – Assessment for Effective Intervention, 2019

Researchers and practitioners have questioned whether scoring procedures used with curriculum-based measures of writing (CBM-W) capture growth in complexity of writing. We analyzed data from six independent samples to examine two potential scoring metrics for picture word CBM-W (PW), a sentence-level CBM task. Correct word sequences per response…

Descriptors: Curriculum Based Assessment, Writing Evaluation, Comparative Analysis, Scoring

The Impact of Rater Variability on Relationships among Different Effect-Size Indices for Inter-Rater Agreement between Human and Automated Essay Scoring

Yun, Jiyeo – ProQuest LLC, 2017

Since researchers began investigating automatic scoring systems in writing assessments, they have examined relationships between human and machine scoring and have suggested evaluation criteria for inter-rater agreement. The main purpose of my study is to investigate the magnitudes of and relationships among indices for inter-rater agreement used…

Descriptors: Interrater Reliability, Essays, Scoring, Evaluators

A Comparison of Newly-Trained and Experienced Raters on a Standardized Writing Assessment

Peer reviewed

Attali, Yigal – Language Testing, 2016

A short training program for evaluating responses to an essay writing task consisted of scoring 20 training essays with immediate feedback about the correct score. The same scoring session also served as a certification test for trainees. Participants with little or no previous rating experience completed this session and 14 trainees who passed an…

Descriptors: Writing Evaluation, Writing Tests, Standardized Tests, Evaluators

The Relationship between the Amount of Extensive Reading and the Writing Performance

Peer reviewed

Sakurai, Nobuko – Reading Matrix: An International Online Journal, 2017

This paper explored the effects of the amount of extensive reading (ER) on writing ability. Participants were 157 first and second-year non-English majors at a private university in Japan who took a writing test in class. Some of them were reading extensively, while others had no experience in ER. The outcomes of Pearson's correlation indicated…

Descriptors: Correlation, Reading Writing Relationship, Scores, Vocabulary Development

Differences in Less Proficient and More Proficient ESL College Writing in the Philippine Setting

Gustilo, Leah E. – Online Submission, 2016

The present study aimed at characterizing what skilled or more proficient ESL college writing is in the Philippine setting through a contrastive analysis of three groups of variables identified from previous studies: resources, processes, and performance of ESL writers. Based on Chenoweth and Hayes' (2001; 2003) framework, the resource level…

Descriptors: Language Proficiency, English (Second Language), Second Language Learning, Foreign Countries

Investigating the Effects of Planning Time on the Complexity of L2 Argumentative Writing

Peer reviewed

Tabari, Mahmoud Abdi – TESL-EJ, 2017

Much research has investigated the role of planning time in second language writing; however, findings about the effects of planning time conditions on the complexity of EFL learners' textual output are inconsistent. The current study attempted to consider the differential effects of planning time conditions in…

Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Persuasive Discourse

Using Calibrated Exemplars in the Teacher-Assessment of Writing: An Empirical Study

Peer reviewed

Heldsinger, Sandra A.; Humphry, Stephen M. – Educational Research, 2013

Background: Many in education argue for the importance of incorporating teacher judgements in the assessment and reporting of student performance. Advocates of such an approach are cognisant, though, that obtaining a satisfactory level of consistency in teacher judgements poses a challenge. Purpose: This study investigates the extent to which the…

Descriptors: Evaluation Methods, Student Evaluation, Teacher Attitudes, Comparative Analysis

Source

Assessing Writing	2
ProQuest LLC	2
Applied Measurement in…	1
Assessment for Effective…	1
Assessment in Education:…	1
Canadian Journal of Learning…	1
Education and Information…	1
Educational Measurement:…	1
Educational Research	1
Educational Research and…	1
English Teaching	1
International Journal of…	1
Iranian Journal of Language…	1
Journal of Applied Testing…	1
Journal of Baltic Science…	1
Journal of Educational…	1
Language Testing	1
Language Testing in Asia	1
Online Submission	1
Reading & Writing Quarterly	1
Reading Matrix: An…	1
Research-publishing.net	1
TESL-EJ	1

Author

Abrami, Philip C.	1
Allen, Abigail	1
Attali, Yigal	1
Baier, Herbert	1
Baker, Beverly A.	1
Barclay, Alexandra	1
Barkaoui, Khaled	1
Bhola, Dennison S.	1
Bridgeman, Brent	1
Buckendahl, Chad W.	1
Bures, Eva Mary	1
Chung-You Tsai	1
Collier, Lizabeth C.	1
Coniam, David	1
Cooper, Peter	1
De Ayala, R. J.	1
Gearhart, Maryl	1
Guangtian Zhu	1
Guo, Xiuyan	1
Gustilo, Leah E.	1
Heldsinger, Sandra A.	1
Humphry, Stephen M.	1
Iain Kelsall Brown	1
Jeong, Heejeong	1