Erik Voss – Language Testing, 2025
An increasing number of language testing companies are developing and deploying deep learning-based automated essay scoring (AES) systems to replace traditional approaches that rely on handcrafted feature extraction. However, there is hesitation to accept neural network approaches to automated essay scoring because the features are automatically…
Descriptors: Artificial Intelligence, Automation, Scoring, English (Second Language)
Yasuyo Sawaki; Yutaka Ishii; Hiroaki Yamada; Takenobu Tokunaga – Language Testing, 2025
This study examined the consistency between instructor ratings of learner-generated summaries and those estimated by a large language model (LLM) on summary content checklist items designed for undergraduate second language (L2) writing instruction in Japan. The effects of the LLM prompt design on the consistency between the two were also explored…
Descriptors: Interrater Reliability, Writing Teachers, College Faculty, Artificial Intelligence
Ping-Lin Chuang – Language Testing, 2025
This experimental study explores how source use features impact raters' judgment of argumentation in a second language (L2) integrated writing test. One hundred four experienced and novice raters were recruited to complete a rating task that simulated the scoring assignment of a local English Placement Test (EPT). Sixty written responses were…
Descriptors: Interrater Reliability, Evaluators, Information Sources, Primary Sources
Ying Xu; Xiaodong Li; Jin Chen – Language Testing, 2025
This article provides a detailed review of the Computer-based English Listening Speaking Test (CELST) used in Guangdong, China, as part of the National Matriculation English Test (NMET) to assess students' English proficiency. The CELST measures listening and speaking skills as outlined in the "English Curriculum for Senior Middle…
Descriptors: Computer Assisted Testing, English (Second Language), Language Tests, Listening Comprehension Tests
Rebecca Sickinger; Tineke Brunfaut; John Pill – Language Testing, 2025
Comparative Judgement (CJ) is an evaluation method, typically conducted online, whereby a rank order is constructed, and scores calculated, from judges' pairwise comparisons of performances. CJ has been researched in various educational contexts, though only rarely in English as a Foreign Language (EFL) writing settings, and is generally agreed to…
Descriptors: Writing Evaluation, English (Second Language), Second Language Learning, Second Language Instruction
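The abstract above describes Comparative Judgement as building a rank order, with scores, from judges' pairwise comparisons. The article does not specify its scoring model, but pairwise-comparison data of this kind is commonly fitted with a Bradley-Terry model; the following is a minimal illustrative sketch (the function name and data shape are assumptions, not from the article):

```python
from collections import defaultdict

def bradley_terry(comparisons, iters=100):
    """Estimate item strengths from judges' pairwise comparisons.

    comparisons: list of (winner, loser) pairs, e.g. essay IDs.
    Returns a dict mapping each item to a strength score
    (higher = judged better), via the classic MM update
    p_i = W_i / sum_j n_ij / (p_i + p_j).
    """
    wins = defaultdict(int)     # total wins per item
    pairs = defaultdict(int)    # comparison counts per unordered pair
    items = set()
    for w, l in comparisons:
        wins[w] += 1
        pairs[frozenset((w, l))] += 1
        items.update((w, l))
    p = {i: 1.0 for i in items}
    for _ in range(iters):
        new = {}
        for i in items:
            denom = 0.0
            for j in items:
                if i == j:
                    continue
                n = pairs[frozenset((i, j))]
                if n:
                    denom += n / (p[i] + p[j])
            new[i] = wins[i] / denom if denom else p[i]
        # Rescale so strengths sum to the number of items
        # (the model is only identified up to a scale factor).
        s = sum(new.values())
        p = {i: v * len(items) / s for i, v in new.items()}
    return p
```

Sorting items by the returned strengths yields the rank order the abstract refers to; real CJ tooling typically also reports reliability statistics, which this sketch omits.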
Farshad Effatpanah; Purya Baghaei; Mona Tabatabaee-Yazdi; Esmat Babaii – Language Testing, 2025
This study aimed to propose a new method for scoring C-Tests as measures of general language proficiency. In this approach, the unit of analysis is sentences rather than gaps or passages. That is, the gaps correctly reformulated in each sentence were aggregated into a sentence score, and then each sentence was entered into the analysis as a polytomous…
Descriptors: Item Response Theory, Language Tests, Test Items, Test Construction
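The sentence-level aggregation step described in the abstract above, summing correctly reformulated gaps within each sentence to form a polytomous sentence score, can be sketched as follows (the function name and input format are illustrative assumptions; the article's subsequent IRT analysis is not shown):

```python
def sentence_scores(gap_results, gap_to_sentence):
    """Aggregate per-gap C-Test results into polytomous sentence scores.

    gap_results: dict gap_id -> 1 if the gap was correctly
        reformulated, else 0.
    gap_to_sentence: dict gap_id -> index of the sentence
        containing that gap.
    Returns a dict sentence index -> summed score (0 .. number of
    gaps in that sentence), i.e. each sentence becomes one
    polytomous item rather than scoring gaps or whole passages.
    """
    scores = {}
    for gap, correct in gap_results.items():
        sent = gap_to_sentence[gap]
        scores[sent] = scores.get(sent, 0) + correct
    return scores
```

Each resulting sentence score would then be entered into the IRT analysis as a polytomous item, as the abstract describes.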