ERIC - Search Results

Showing 1 to 15 of 57 results

Triangulating Natural Language Processing (NLP)-Based Analysis of Rater Comments and Many-Facet Rasch Measurement (MFRM): An Innovative Approach to Investigating Raters' Application of Rating Scales in Writing Assessment

Peer reviewed

Huiying Cai; Xun Yan – Language Testing, 2024

Rater comments tend to be qualitatively analyzed to indicate raters' application of rating scales. This study applied natural language processing (NLP) techniques to quantify meaningful, behavioral information from a corpus of rater comments and triangulated that information with a many-facet Rasch measurement (MFRM) analysis of rater scores. The…

Descriptors: Natural Language Processing, Item Response Theory, Rating Scales, Writing Evaluation

Diagnosing Chinese EFL Learners' Writing Ability Using Polytomous Cognitive Diagnostic Models

Peer reviewed

Xiaoting Shi; Xiaomei Ma; Wenbo Du; Xuliang Gao – Language Testing, 2024

Cognitive diagnostic assessment (CDA) intends to identify learners' strengths and weaknesses in latent cognitive attributes to provide personalized remedial instructions. Previous CDA studies on English as a Foreign Language (EFL)/English as a Second Language (ESL) writing have adopted dichotomous cognitive diagnostic models (CDMs) to analyze data…

Descriptors: Writing Evaluation, Writing Tests, Diagnostic Tests, English (Second Language)

Assessing the Content Quality of Essays in Content and Language Integrated Learning: Exploring the Construct from Subject Specialists' Perspectives

Peer reviewed

Takanori Sato – Language Testing, 2024

Assessing the content of learners' compositions is a common practice in second language (L2) writing assessment. However, the construct definition of content in L2 writing assessment potentially underrepresents the target competence in content and language integrated learning (CLIL), which aims to foster not only L2 proficiency but also critical…

Descriptors: Language Tests, Content and Language Integrated Learning, Writing Evaluation, Writing Tests

Comparative Judgement for Evaluating Young Learners' EFL Writing Performances: Reliability and Teacher Perceptions of Holistic and Dimension-Based Judgements

Peer reviewed

Rebecca Sickinger; Tineke Brunfaut; John Pill – Language Testing, 2025

Comparative Judgement (CJ) is an evaluation method, typically conducted online, whereby a rank order is constructed, and scores calculated, from judges' pairwise comparisons of performances. CJ has been researched in various educational contexts, though only rarely in English as a Foreign Language (EFL) writing settings, and is generally agreed to…

Descriptors: Writing Evaluation, English (Second Language), Second Language Learning, Second Language Instruction

Making Each Point Count: Revising a Local Adaptation of the Jacobs et al.'s (1981) ESL COMPOSITION PROFILE Rubric

Peer reviewed

Yu-Tzu Chang; Ann Tai Choe; Daniel Holden; Daniel R. Isbell – Language Testing, 2024

In this Brief Report, we describe an evaluation of and revisions to a rubric adapted from the Jacobs et al.'s (1981) ESL COMPOSITION PROFILE, with four rubric categories and 20-point rating scales, in the context of an intensive English program writing placement test. Analysis of 4 years of rating data (2016-2021, including 434 essays) using…

Descriptors: Language Tests, Rating Scales, Second Language Learning, English (Second Language)

The Relationship between Written Discourse Features and Integrated Listening-to-Write Scores for Adolescent English Language Learners

Peer reviewed

Ray J. T. Liao; Renka Ohta; Kwangmin Lee – Language Testing, 2024

As integrated writing tasks in large-scale and classroom-based writing assessments have risen in popularity, research studies have increasingly concentrated on providing validity evidence. Given the fact that most of these studies focus on adult second language learners rather than younger ones, this study examined the relationship between written…

Descriptors: Writing (Composition), Writing Evaluation, English Language Learners, Discourse Analysis

More Efficient Processes for Creating Automated Essay Scoring Frameworks: A Demonstration of Two Algorithms

Peer reviewed

Shin, Jinnie; Gierl, Mark J. – Language Testing, 2021

Automated essay scoring (AES) has emerged as a secondary or as a sole marker for many high-stakes educational assessments, in native and non-native testing, owing to remarkable advances in feature engineering using natural language processing, machine learning, and deep-neural algorithms. The purpose of this study is to compare the effectiveness…

Descriptors: Scoring, Essays, Writing Evaluation, Computer Software

"How Do Raters Learn to Rate?" Many-Facet Rasch Modeling of Rater Performance over the Course of a Rater Certification Program

Peer reviewed

Yan, Xun; Chuang, Ping-Lin – Language Testing, 2023

This study employed a mixed-methods approach to examine how rater performance develops during a semester-long rater certification program for an English as a Second Language (ESL) writing placement test at a large US university. From 2016 to 2018, we tracked three groups of novice raters (n = 30) across four rounds in the certification program.…

Descriptors: Evaluators, Interrater Reliability, Item Response Theory, Certification

Application of an Automated Essay Scoring Engine to English Writing Assessment Using Many-Facet Rasch Measurement

Peer reviewed

Chan, Kinnie Kin Yee; Bond, Trevor; Yan, Zi – Language Testing, 2023

We investigated the relationship between the scores assigned by an Automated Essay Scoring (AES) system, the Intelligent Essay Assessor (IEA), and grades allocated by trained, professional human raters to English essay writing by instigating two procedures novel to written-language assessment: the logistic transformation of AES raw scores into…

Descriptors: Computer Assisted Testing, Essays, Scoring, Scores

Exploring Potential Biases in GPT-4o's Ratings of English Language Learners' Essays

Peer reviewed

Taichi Yamashita – Language Testing, 2025

With the rapid development of generative artificial intelligence (AI) frameworks (e.g., the generative pre-trained transformer [GPT]), a growing number of researchers have started to explore its potential as an automated essay scoring (AES) system. While previous studies have investigated the alignment between human ratings and GPT ratings, few…

Descriptors: Artificial Intelligence, English (Second Language), Second Language Learning, Second Language Instruction

Automated Scoring of Junior and Senior High Essays Using Coh-Metrix Features: Implications for Large-Scale Language Testing

Peer reviewed

Latifi, Syed; Gierl, Mark – Language Testing, 2021

An automated essay scoring (AES) program is a software system that uses techniques from corpus and computational linguistics and machine learning to grade essays. In this study, we aimed to describe and evaluate particular language features of Coh-Metrix for a novel AES program that would score junior and senior high school students' essays from…

Descriptors: Writing Evaluation, Computer Assisted Testing, Scoring, Essays

Towards More Valid Scoring Criteria for Integrated Reading-Writing and Listening-Writing Summary Tasks

Peer reviewed

Chan, Sathena; May, Lyn – Language Testing, 2023

Despite the increased use of integrated tasks in high-stakes academic writing assessment, research on rating criteria which reflect the unique construct of integrated summary writing skills is comparatively rare. Using a mixed-method approach of expert judgement, text analysis, and statistical analysis, this study examines writing features that…

Descriptors: Scoring, Writing Evaluation, Reading Tests, Listening Skills

A Sequential Approach to Detecting Differential Rater Functioning in Sparse Rater-Mediated Assessment Networks

Peer reviewed

Wind, Stefanie A. – Language Testing, 2023

Researchers frequently evaluate rater judgments in performance assessments for evidence of differential rater functioning (DRF), which occurs when rater severity is systematically related to construct-irrelevant student characteristics after controlling for student achievement levels. However, researchers have observed that methods for detecting…

Descriptors: Evaluators, Decision Making, Student Characteristics, Performance Based Assessment

The Longitudinal Stability of Rating Characteristics in an EFL Examination: Methodological and Substantive Considerations

Peer reviewed

Lamprianou, Iasonas; Tsagari, Dina; Kyriakou, Nansia – Language Testing, 2021

This longitudinal study (2002-2014) investigates the stability of rating characteristics of a large group of raters over time in the context of the writing paper of a national high-stakes examination. The study uses one measure of rater severity and two measures of rater consistency. The results suggest that the rating characteristics of…

Descriptors: Longitudinal Studies, Evaluators, High Stakes Tests, Writing Evaluation

Developing a Level-Specific Checklist for Assessing EFL Writing

Peer reviewed

Lukácsi, Zoltán – Language Testing, 2021

In second language writing assessment, rating scales and scores from human-mediated assessment have been criticized for a number of shortcomings including problems with adequacy, relevance, and reliability (Hamp-Lyons, 1990; McNamara, 1996; Weigle, 2002). In its testing practice, Euroexam International also detected that the rating scales for…

Descriptors: Test Construction, Test Validity, Test Items, Check Lists
