Publication Date
In 2025 (2)
Since 2024 (2)
Since 2021, last 5 years (6)
Since 2016, last 10 years (8)
Since 2006, last 20 years (12)
Descriptor
Scoring (13)
Writing Evaluation (13)
Evaluators (8)
Second Language Learning (8)
English (Second Language) (7)
Essays (6)
Scores (5)
Language Tests (4)
Writing Skills (4)
Comparative Analysis (3)
Computer Assisted Testing (3)
Source
Language Testing (13)
Author
Bachman, Lyle F. (1)
Bae, Jungok (1)
Bond, Trevor (1)
Chan, Kinnie Kin Yee (1)
Chan, Sathena (1)
Enright, Mary K. (1)
Gierl, Mark (1)
Gierl, Mark J. (1)
In'nami, Yo (1)
John Pill (1)
Koizumi, Rie (1)
Publication Type
Journal Articles (13)
Reports - Research (10)
Reports - Descriptive (2)
Information Analyses (1)
Reports - Evaluative (1)
Education Level
Secondary Education (4)
Elementary Education (2)
High Schools (1)
Higher Education (1)
Junior High Schools (1)
Middle Schools (1)
Postsecondary Education (1)
Location
Austria (1)
Netherlands (1)
Turkey (1)
Assessments and Surveys
Test of English as a Foreign… (1)
Sickinger, Rebecca; Brunfaut, Tineke; Pill, John – Language Testing, 2025
Comparative Judgement (CJ) is an evaluation method, typically conducted online, whereby a rank order is constructed, and scores calculated, from judges' pairwise comparisons of performances. CJ has been researched in various educational contexts, though only rarely in English as a Foreign Language (EFL) writing settings, and is generally agreed to…
Descriptors: Writing Evaluation, English (Second Language), Second Language Learning, Second Language Instruction
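
The abstract above describes CJ only in outline. Most CJ implementations derive scores with the Bradley-Terry model, which estimates a latent quality for each script from the pairwise outcomes; the sketch below is a generic illustration of that idea, not the procedure used in the article. The script labels, the sample judgements, and the simple iterative fitting routine are all illustrative assumptions.

    import math
    from collections import defaultdict

    def bradley_terry(judgements, iterations=200):
        """Estimate a quality score per script from (winner, loser) pairs."""
        scripts = {s for pair in judgements for s in pair}
        strength = {s: 1.0 for s in scripts}          # initial ability estimates
        wins = defaultdict(int)
        for winner, _ in judgements:
            wins[winner] += 1                          # assumes every script wins at least once
        for _ in range(iterations):                    # simple MM-style updates
            new = {}
            for s in scripts:
                denom = sum(1.0 / (strength[a] + strength[b])
                            for a, b in judgements if s in (a, b))
                new[s] = wins[s] / denom if denom else strength[s]
            mean = sum(new.values()) / len(new)
            strength = {s: v / mean for s, v in new.items()}   # keep the scale fixed
        return {s: math.log(v) for s, v in strength.items()}   # report on a logit-like scale

    # Hypothetical judgements: each tuple means the first script was preferred.
    judgements = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B"), ("C", "B")]
    print(bradley_terry(judgements))
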
Shin, Jinnie; Gierl, Mark J. – Language Testing, 2021
Automated essay scoring (AES) has emerged as a secondary or as a sole marker for many high-stakes educational assessments, in native and non-native testing, owing to remarkable advances in feature engineering using natural language processing, machine learning, and deep-neural algorithms. The purpose of this study is to compare the effectiveness…
Descriptors: Scoring, Essays, Writing Evaluation, Computer Software
Chan, Kinnie Kin Yee; Bond, Trevor; Yan, Zi – Language Testing, 2023
We investigated the relationship between the scores assigned by an Automated Essay Scoring (AES) system, the Intelligent Essay Assessor (IEA), and grades allocated by trained, professional human raters to English essay writing by instigating two procedures novel to written-language assessment: the logistic transformation of AES raw scores into…
Descriptors: Computer Assisted Testing, Essays, Scoring, Scores
Yamashita, Taichi – Language Testing, 2025
With the rapid development of generative artificial intelligence (AI) frameworks (e.g., the generative pre-trained transformer [GPT]), a growing number of researchers have started to explore its potential as an automated essay scoring (AES) system. While previous studies have investigated the alignment between human ratings and GPT ratings, few…
Descriptors: Artificial Intelligence, English (Second Language), Second Language Learning, Second Language Instruction
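
As context for the alignment question raised above: AES studies commonly report agreement between human and machine ratings as quadratic weighted kappa alongside an exact-agreement rate. The snippet below is a generic illustration with invented scores, not data or code from the study.

    from sklearn.metrics import cohen_kappa_score

    human = [3, 4, 2, 5, 3, 4, 1, 3]   # invented human ratings
    gpt   = [3, 4, 3, 5, 2, 4, 1, 4]   # invented machine ratings for the same essays

    qwk = cohen_kappa_score(human, gpt, weights="quadratic")
    exact = sum(h == g for h, g in zip(human, gpt)) / len(human)
    print(f"Quadratic weighted kappa: {qwk:.3f}, exact agreement: {exact:.2f}")
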
Latifi, Syed; Gierl, Mark – Language Testing, 2021
An automated essay scoring (AES) program is a software system that uses techniques from corpus and computational linguistics and machine learning to grade essays. In this study, we aimed to describe and evaluate particular language features of Coh-Metrix for a novel AES program that would score junior and senior high school students' essays from…
Descriptors: Writing Evaluation, Computer Assisted Testing, Scoring, Essays
Chan, Sathena; May, Lyn – Language Testing, 2023
Despite the increased use of integrated tasks in high-stakes academic writing assessment, research on rating criteria which reflect the unique construct of integrated summary writing skills is comparatively rare. Using a mixed-method approach of expert judgement, text analysis, and statistical analysis, this study examines writing features that…
Descriptors: Scoring, Writing Evaluation, Reading Tests, Listening Skills
Sahan, Özgür; Razi, Salim – Language Testing, 2020
This study examines the decision-making behaviors of raters with varying levels of experience while assessing EFL essays of distinct qualities. The data were collected from 28 raters with varying levels of rating experience and working at the English language departments of different universities in Turkey. Using a 10-point analytic rubric, each…
Descriptors: Decision Making, Essays, Writing Evaluation, Evaluators
In'nami, Yo; Koizumi, Rie – Language Testing, 2016
We addressed Deville and Chalhoub-Deville's (2006), Schoonen's (2012), and Xi and Mollaun's (2006) call for research into the contextual features that are considered related to person-by-task interactions in the framework of generalizability theory in two ways. First, we quantitatively synthesized the generalizability studies to determine the…
Descriptors: Evaluators, Second Language Learning, Writing Skills, Oral Language
Kuiken, Folkert; Vedder, Ineke – Language Testing, 2014
This study investigates the relationship in L2 writing between raters' judgments of communicative adequacy and linguistic complexity by means of six-point Likert scales, and general measures of linguistic performance. The participants were 39 learners of Italian and 32 of Dutch, who wrote two short argumentative essays. The same writing tasks…
Descriptors: Writing Evaluation, Second Language Learning, Evaluators, Native Language
Enright, Mary K.; Quinlan, Thomas – Language Testing, 2010
E-rater® is an automated essay scoring system that uses natural language processing techniques to extract features from essays and to model statistically human holistic ratings. Educational Testing Service has investigated the use of e-rater, in conjunction with human ratings, to score one of the two writing tasks on the TOEFL iBT® writing…
Descriptors: Second Language Learning, Scoring, Essays, Language Processing
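
To make the idea of feature-based scoring concrete (the abstract does not list e-rater's actual features or model), the sketch below fits an ordinary least-squares model to a few crude, hypothetical text features against human holistic ratings. It is a toy illustration of the general approach, not e-rater's feature set or statistical model.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    def extract_features(essay):
        """Crude, hypothetical features standing in for real AES feature engineering."""
        words = essay.split()
        sentences = [s for s in essay.replace("!", ".").replace("?", ".").split(".") if s.strip()]
        return [
            len(words),                                            # essay length
            len({w.lower() for w in words}) / max(len(words), 1),  # lexical diversity
            len(words) / max(len(sentences), 1),                   # mean sentence length
        ]

    # Invented training essays and holistic ratings from trained human raters.
    essays = [
        "Short essay about the topic.",
        "A longer essay that develops its argument across several sentences. It adds supporting detail and a conclusion.",
    ]
    ratings = [2.0, 4.0]

    model = LinearRegression().fit(np.array([extract_features(e) for e in essays]), ratings)
    new_essay = "Another essay to be scored automatically by the fitted model."
    print(model.predict(np.array([extract_features(new_essay)])))
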
Bae, Jungok; Bachman, Lyle F. – Language Testing, 2010
This study investigated the validity of four theoretically motivated traits of writing ability across English and Korean, based on elementary school students' responses to letter- and story-writing tasks. Their responses were scored analytically and analyzed using confirmatory factor analysis. The findings include the following. A model of writing…
Descriptors: Elementary School Students, Validity, Korean, English (Second Language)
Yu, Guoxing – Language Testing, 2007
Two kinds of scoring templates were empirically derived from summaries written by experts and students to evaluate the quality of summaries written by the students. This paper reports students' attitudes towards the use of the two templates and its differential statistical effects on the judgment of students' summarization performance. It was…
Descriptors: Student Evaluation, Student Attitudes, Democracy, Educational Assessment

Schoonen, Rob; And Others – Language Testing, 1997
Reports on three studies conducted in the Netherlands about the rating reliability of lay and expert readers in rating content and language usage of students' writing performances in three kinds of writing assignments. Findings reveal that expert readers are more reliable in rating usage, whereas both lay and expert readers are reliable raters of…
Descriptors: Foreign Countries, Interrater Reliability, Language Usage, Models