Erik Voss – Language Testing, 2025
An increasing number of language testing companies are developing and deploying deep learning-based automated essay scoring (AES) systems to replace traditional approaches that rely on handcrafted feature extraction. However, there is hesitation to accept neural network approaches to automated essay scoring because the features are automatically…
Descriptors: Artificial Intelligence, Automation, Scoring, English (Second Language)
Yasuyo Sawaki; Yutaka Ishii; Hiroaki Yamada; Takenobu Tokunaga – Language Testing, 2025
This study examined the consistency between instructor ratings of learner-generated summaries and those estimated by a large language model (LLM) on summary content checklist items designed for undergraduate second language (L2) writing instruction in Japan. The effects of the LLM prompt design on the consistency between the two were also explored…
Descriptors: Interrater Reliability, Writing Teachers, College Faculty, Artificial Intelligence
Ping-Lin Chuang – Language Testing, 2025
This experimental study explores how source use features impact raters' judgment of argumentation in a second language (L2) integrated writing test. One hundred four experienced and novice raters were recruited to complete a rating task that simulated the scoring assignment of a local English Placement Test (EPT). Sixty written responses were…
Descriptors: Interrater Reliability, Evaluators, Information Sources, Primary Sources
Ying Xu; Xiaodong Li; Jin Chen – Language Testing, 2025
This article provides a detailed review of the Computer-based English Listening Speaking Test (CELST) used in Guangdong, China, as part of the National Matriculation English Test (NMET) to assess students' English proficiency. The CELST measures listening and speaking skills as outlined in the "English Curriculum for Senior Middle…
Descriptors: Computer Assisted Testing, English (Second Language), Language Tests, Listening Comprehension Tests
Rebecca Sickinger; Tineke Brunfaut; John Pill – Language Testing, 2025
Comparative Judgement (CJ) is an evaluation method, typically conducted online, whereby a rank order is constructed, and scores calculated, from judges' pairwise comparisons of performances. CJ has been researched in various educational contexts, though only rarely in English as a Foreign Language (EFL) writing settings, and is generally agreed to…
Descriptors: Writing Evaluation, English (Second Language), Second Language Learning, Second Language Instruction
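The abstract above describes Comparative Judgement as building a rank order, with scores, from judges' pairwise comparisons. The article does not specify its scoring model, but pairwise-comparison data of this kind is commonly fitted with a Bradley-Terry model; the following is a minimal illustrative sketch (the function name and data shape are assumptions, not from the article):

```python
from collections import defaultdict

def bradley_terry(comparisons, iters=100):
    """Estimate item strengths from judges' pairwise comparisons.

    comparisons: list of (winner, loser) pairs, e.g. essay IDs.
    Returns a dict mapping each item to a strength score
    (higher = judged better), via the classic MM update
    p_i = W_i / sum_j n_ij / (p_i + p_j).
    """
    wins = defaultdict(int)     # total wins per item
    pairs = defaultdict(int)    # comparison counts per unordered pair
    items = set()
    for w, l in comparisons:
        wins[w] += 1
        pairs[frozenset((w, l))] += 1
        items.update((w, l))
    p = {i: 1.0 for i in items}
    for _ in range(iters):
        new = {}
        for i in items:
            denom = 0.0
            for j in items:
                if i == j:
                    continue
                n = pairs[frozenset((i, j))]
                if n:
                    denom += n / (p[i] + p[j])
            new[i] = wins[i] / denom if denom else p[i]
        # Rescale so strengths sum to the number of items
        # (the model is only identified up to a scale factor).
        s = sum(new.values())
        p = {i: v * len(items) / s for i, v in new.items()}
    return p
```

Sorting items by the returned strengths yields the rank order the abstract refers to; real CJ tooling typically also reports reliability statistics, which this sketch omits.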
Farshad Effatpanah; Purya Baghaei; Mona Tabatabaee-Yazdi; Esmat Babaii – Language Testing, 2025
This study aimed to propose a new method for scoring C-Tests as measures of general language proficiency. In this approach, the unit of analysis is sentences rather than gaps or passages. That is, the gaps correctly reformulated in each sentence were aggregated into a sentence score, and then each sentence was entered into the analysis as a polytomous…
Descriptors: Item Response Theory, Language Tests, Test Items, Test Construction
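The sentence-level aggregation step described in the abstract above, summing correctly reformulated gaps within each sentence to form a polytomous sentence score, can be sketched as follows (the function name and input format are illustrative assumptions; the article's subsequent IRT analysis is not shown):

```python
def sentence_scores(gap_results, gap_to_sentence):
    """Aggregate per-gap C-Test results into polytomous sentence scores.

    gap_results: dict gap_id -> 1 if the gap was correctly
        reformulated, else 0.
    gap_to_sentence: dict gap_id -> index of the sentence
        containing that gap.
    Returns a dict sentence index -> summed score (0 .. number of
    gaps in that sentence), i.e. each sentence becomes one
    polytomous item rather than scoring gaps or whole passages.
    """
    scores = {}
    for gap, correct in gap_results.items():
        sent = gap_to_sentence[gap]
        scores[sent] = scores.get(sent, 0) + correct
    return scores
```

Each resulting sentence score would then be entered into the IRT analysis as a polytomous item, as the abstract describes.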