ERIC - Search Results

Publication Date

In 2025	1
Since 2024	3
Since 2021 (last 5 years)	6
Since 2016 (last 10 years)	9
Since 2006 (last 20 years)	13

Source

Language Testing	13

Publication Type

Journal Articles	13
Reports - Research	11
Reports - Descriptive	1
Tests/Questionnaires	1

Education Level

Higher Education	3
Secondary Education	3
Elementary Education	1
Postsecondary Education	1

Location

Hawaii	1
Netherlands	1
Ohio	1
Turkey	1

Showing all 13 results

Application of an Automated Essay Scoring Engine to English Writing Assessment Using Many-Facet Rasch Measurement

Peer reviewed

Chan, Kinnie Kin Yee; Bond, Trevor; Yan, Zi – Language Testing, 2023

We investigated the relationship between the scores assigned by an Automated Essay Scoring (AES) system, the Intelligent Essay Assessor (IEA), and grades allocated by trained, professional human raters to English essay writing by instigating two procedures novel to written-language assessment: the logistic transformation of AES raw scores into…

Descriptors: Computer Assisted Testing, Essays, Scoring, Scores

Assessing the Content Quality of Essays in Content and Language Integrated Learning: Exploring the Construct from Subject Specialists' Perspectives

Peer reviewed

Sato, Takanori – Language Testing, 2024

Assessing the content of learners' compositions is a common practice in second language (L2) writing assessment. However, the construct definition of content in L2 writing assessment potentially underrepresents the target competence in content and language integrated learning (CLIL), which aims to foster not only L2 proficiency but also critical…

Descriptors: Language Tests, Content and Language Integrated Learning, Writing Evaluation, Writing Tests

Making Each Point Count: Revising a Local Adaptation of the Jacobs et al.'s (1981) ESL COMPOSITION PROFILE Rubric

Peer reviewed

Chang, Yu-Tzu; Choe, Ann Tai; Holden, Daniel; Isbell, Daniel R. – Language Testing, 2024

In this Brief Report, we describe an evaluation of and revisions to a rubric adapted from the Jacobs et al.'s (1981) ESL COMPOSITION PROFILE, with four rubric categories and 20-point rating scales, in the context of an intensive English program writing placement test. Analysis of 4 years of rating data (2016-2021, including 434 essays) using…

Descriptors: Language Tests, Rating Scales, Second Language Learning, English (Second Language)

More Efficient Processes for Creating Automated Essay Scoring Frameworks: A Demonstration of Two Algorithms

Peer reviewed

Shin, Jinnie; Gierl, Mark J. – Language Testing, 2021

Automated essay scoring (AES) has emerged as a secondary or as a sole marker for many high-stakes educational assessments, in native and non-native testing, owing to remarkable advances in feature engineering using natural language processing, machine learning, and deep-neural algorithms. The purpose of this study is to compare the effectiveness…

Descriptors: Scoring, Essays, Writing Evaluation, Computer Software

"How Do Raters Learn to Rate?" Many-Facet Rasch Modeling of Rater Performance over the Course of a Rater Certification Program

Peer reviewed

Yan, Xun; Chuang, Ping-Lin – Language Testing, 2023

This study employed a mixed-methods approach to examine how rater performance develops during a semester-long rater certification program for an English as a Second Language (ESL) writing placement test at a large US university. From 2016 to 2018, we tracked three groups of novice raters (n = 30) across four rounds in the certification program.…

Descriptors: Evaluators, Interrater Reliability, Item Response Theory, Certification

Exploring Potential Biases in GPT-4o's Ratings of English Language Learners' Essays

Peer reviewed

Yamashita, Taichi – Language Testing, 2025

With the rapid development of generative artificial intelligence (AI) frameworks (e.g., the generative pre-trained transformer [GPT]), a growing number of researchers have started to explore its potential as an automated essay scoring (AES) system. While previous studies have investigated the alignment between human ratings and GPT ratings, few…

Descriptors: Artificial Intelligence, English (Second Language), Second Language Learning, Second Language Instruction

Do Experience and Text Quality Matter for Raters' Decision-Making Behaviors?

Peer reviewed

Sahan, Özgür; Razi, Salim – Language Testing, 2020

This study examines the decision-making behaviors of raters with varying levels of experience while assessing EFL essays of distinct qualities. The data were collected from 28 raters with varying levels of rating experience and working at the English language departments of different universities in Turkey. Using a 10-point analytic rubric, each…

Descriptors: Decision Making, Essays, Writing Evaluation, Evaluators

A Comparison of Newly-Trained and Experienced Raters on a Standardized Writing Assessment

Peer reviewed

Attali, Yigal – Language Testing, 2016

A short training program for evaluating responses to an essay writing task consisted of scoring 20 training essays with immediate feedback about the correct score. The same scoring session also served as a certification test for trainees. Participants with little or no previous rating experience completed this session and 14 trainees who passed an…

Descriptors: Writing Evaluation, Writing Tests, Standardized Tests, Evaluators

Grounding Lexical Diversity in Human Judgments

Peer reviewed

Jarvis, Scott – Language Testing, 2017

The present study discusses the relevance of measures of lexical diversity (LD) to the assessment of learner corpora. It also argues that existing measures of LD, many of which have become specialized for use with language corpora, are fundamentally measures of lexical repetition, are based on an etic perspective of language, and lack construct…

Descriptors: Computational Linguistics, English (Second Language), Second Language Learning, Native Speakers

Quantifying the Quality Difference between L1 and L2 Essays: A Rating Procedure with Bilingual Raters and L1 and L2 Benchmark Essays

Peer reviewed

Tillema, Marion; van den Bergh, Huub; Rijlaarsdam, Gert; Sanders, Ted – Language Testing, 2013

It is the consensus that, as a result of the extra constraints placed on working memory, texts written in a second language (L2) are usually of lower quality than texts written in the first language (L1) by the same writer. However, no method is currently available for quantifying the quality difference between L1 and L2 texts. In the present…

Descriptors: Academic Achievement, Bilingualism, Effect Size, Essays

Explaining ESL Essay Holistic Scores: A Multilevel Modeling Approach

Peer reviewed

Barkaoui, Khaled – Language Testing, 2010

This study adopted a multilevel modeling (MLM) approach to examine the contribution of rater and essay factors to variability in ESL essay holistic scores. Previous research aiming to explain variability in essay holistic scores has focused on either rater or essay factors. The few studies that have examined the contribution of more than one…

Descriptors: Performance Based Assessment, English (Second Language), Second Language Learning, Holistic Approach

Think-Aloud Protocols in Research on Essay Rating: An Empirical Study of Their Veridicality and Reactivity

Peer reviewed

Barkaoui, Khaled – Language Testing, 2011

Think-aloud protocols (TAPs) are frequently used in research on essay rating processes. However, there are very few empirical studies of the completeness of TAP data and the effects of this technique on rater performance (i.e., rating processes and outcomes). This study aims to start to address this research gap. As part of a larger study on rater…

Descriptors: Protocol Analysis, Rating Scales, Essays, English (Second Language)

Complementing Human Judgment of Essays Written by English Language Learners with E-Rater® Scoring

Peer reviewed

Enright, Mary K.; Quinlan, Thomas – Language Testing, 2010

E-rater® is an automated essay scoring system that uses natural language processing techniques to extract features from essays and to model statistically human holistic ratings. Educational Testing Service has investigated the use of e-rater, in conjunction with human ratings, to score one of the two writing tasks on the TOEFL-iBT® writing…

Descriptors: Second Language Learning, Scoring, Essays, Language Processing

Descriptor

Evaluators	13
Essays	12
Writing Evaluation	11
English (Second Language)	10
Second Language Learning	10
Scores	8
Scoring	6
Language Tests	5
Comparative Analysis	4
Writing Tests	4
Computer Software	3
Decision Making	3
Interrater Reliability	3
Item Response Theory	3
Language Proficiency	3
Protocol Analysis	3
Rating Scales	3
Second Language Instruction	3
Secondary School Students	3
Training	3
Writing (Composition)	3
Accuracy	2
Artificial Intelligence	2
Certification	2
Computational Linguistics	2

Author

Barkaoui, Khaled	2
Attali, Yigal	1
Bond, Trevor	1
Chan, Kinnie Kin Yee	1
Chang, Yu-Tzu	1
Choe, Ann Tai	1
Chuang, Ping-Lin	1
Enright, Mary K.	1
Gierl, Mark J.	1
Holden, Daniel	1
Isbell, Daniel R.	1
Jarvis, Scott	1
Quinlan, Thomas	1
Razi, Salim	1
Rijlaarsdam, Gert	1
Sahan, Özgür	1
Sanders, Ted	1
Sato, Takanori	1
Shin, Jinnie	1
Tillema, Marion	1
van den Bergh, Huub	1
Yamashita, Taichi	1
Yan, Xun	1
Yan, Zi	1