ERIC - Search Results

Publication Date

In 2025	3
Since 2024	8

Source

Language Testing

Author

Duyen Thi Bich Nguyen	1
Heeyeon Yoon	1
Hitoshi Nishizawa	1
Huiying Cai	1
Hung Tan Ha	1
Iasonas Lamprianou	1
John Pill	1
Ping-Lin Chuang	1
Rebecca Sickinger	1
Reeta Neittaanmäki	1
Tia M. Fechter	1
Tim Stoeckel	1
Tineke Brunfaut	1
Vahid Aryadoust	1
Xun Yan	1
Yufan Zhao	1

Publication Type

Journal Articles	8
Reports - Research	8

Education Level

Higher Education	3
Postsecondary Education	3
Secondary Education	1

Location

Austria	1
Finland	1
Illinois (Urbana)	1
Vietnam	1

Assessments and Surveys

International English…	2
Test of English as a Foreign…	2

Showing all 8 results

Evaluating Methodological Enhancements to the Yes/No Angoff Standard-Setting Method in Language Proficiency Assessment

Peer reviewed

Tia M. Fechter; Heeyeon Yoon – Language Testing, 2024

This study evaluated the efficacy of two proposed methods in an operational standard-setting study conducted for a high-stakes language proficiency test of the U.S. government. The goal was to seek low-cost modifications to the existing Yes/No Angoff method to increase the validity and reliability of the recommended cut scores using a convergent…

Descriptors: Standard Setting, Language Proficiency, Language Tests, Evaluation Methods

All Types of Experience Are Equal, but Some Are More Equal: The Effect of Different Types of Experience on Rater Severity and Rater Consistency

Peer reviewed

Reeta Neittaanmäki; Iasonas Lamprianou – Language Testing, 2024

This article focuses on rater severity and consistency and their relation to different types of rater experience over a long period of time. The article is based on longitudinal data collected from 2009 to 2019 from the second language Finnish speaking subtest in the National Certificates of Language Proficiency in Finland. The study investigated…

Descriptors: Foreign Countries, Interrater Reliability, Error of Measurement, Experience

Do Source Use Features Impact Raters' Judgment of Argumentation? An Experimental Study

Peer reviewed

Ping-Lin Chuang – Language Testing, 2025

This experimental study explores how source use features impact raters' judgment of argumentation in a second language (L2) integrated writing test. One hundred four experienced and novice raters were recruited to complete a rating task that simulated the scoring assignment of a local English Placement Test (EPT). Sixty written responses were…

Descriptors: Interrater Reliability, Evaluators, Information Sources, Primary Sources

Triangulating Natural Language Processing (NLP)-Based Analysis of Rater Comments and Many-Facet Rasch Measurement (MFRM): An Innovative Approach to Investigating Raters' Application of Rating Scales in Writing Assessment

Peer reviewed

Huiying Cai; Xun Yan – Language Testing, 2024

Rater comments tend to be qualitatively analyzed to indicate raters' application of rating scales. This study applied natural language processing (NLP) techniques to quantify meaningful, behavioral information from a corpus of rater comments and triangulated that information with a many-facet Rasch measurement (MFRM) analysis of rater scores. The…

Descriptors: Natural Language Processing, Item Response Theory, Rating Scales, Writing Evaluation

An Automatized Semantic Analysis of Two Large-Scale Listening Tests: A Corpus-Based Study

Peer reviewed

Yufan Zhao; Vahid Aryadoust – Language Testing, 2025

This study examined the semantic features of the simulated mini-lectures in the listening sections of the International English Language Testing System (IELTS) and the Test of English as a Foreign Language (TOEFL) based on automatized semantic analysis to explore the content validity of the two tests. Two study corpora were utilized, the IELTS…

Descriptors: Semantics, Computational Linguistics, Academic Language, Second Language Learning

Comparative Judgement for Evaluating Young Learners' EFL Writing Performances: Reliability and Teacher Perceptions of Holistic and Dimension-Based Judgements

Peer reviewed

Rebecca Sickinger; Tineke Brunfaut; John Pill – Language Testing, 2025

Comparative Judgement (CJ) is an evaluation method, typically conducted online, whereby a rank order is constructed, and scores calculated, from judges' pairwise comparisons of performances. CJ has been researched in various educational contexts, though only rarely in English as a Foreign Language (EFL) writing settings, and is generally agreed to…

Descriptors: Writing Evaluation, English (Second Language), Second Language Learning, Second Language Instruction

Authenticity of Academic Lecture Passages in High-Stakes Tests: A Temporal Fluency Perspective

Peer reviewed

Hitoshi Nishizawa – Language Testing, 2024

Corpus-based studies have offered the domain definition inference for test developers. Yet, corpus-based studies on temporal fluency measures (e.g., speech rate) have been limited, especially in the context of academic lecture settings. This made it difficult for test developers to sample representative fluency features to create authentic…

Descriptors: High Stakes Tests, Language Tests, Second Language Learning, Computer Assisted Testing

What Is the Best Predictor of Word Difficulty? A Case of Data Mining Using Random Forest

Peer reviewed

Hung Tan Ha; Duyen Thi Bich Nguyen; Tim Stoeckel – Language Testing, 2024

Word frequency has a long history of being considered the most important predictor of word difficulty and has served as a guideline for several aspects of second language vocabulary teaching, learning, and assessment. However, recent empirical research has challenged the supremacy of frequency as a predictor of word difficulty. Accordingly,…

Descriptors: Word Frequency, Vocabulary Skills, Second Language Learning, Second Language Instruction

Descriptors

Language Tests	5
Second Language Learning	5
English (Second Language)	4
Evaluation Methods	4
Academic Language	3
Comparative Analysis	3
Computational Linguistics	3
Foreign Countries	3
Difficulty Level	2
Evaluation Criteria	2
Evaluators	2
Interrater Reliability	2
Language Teachers	2
Lecture Method	2
Listening Comprehension Tests	2
Reliability	2
Second Language Instruction	2
Writing Evaluation	2
Accuracy	1
Achievement Rating	1
Applied Linguistics	1
Causal Models	1
College Faculty	1
College Students	1
Computer Assisted Testing	1