ERIC - Search Results

Publication Date

In 2025	3
Since 2024	8
Since 2021 (last 5 years)	18
Since 2016 (last 10 years)	29
Since 2006 (last 20 years)	62

Source

Language Testing

Publication Type

Journal Articles	62
Reports - Research	45
Reports - Evaluative	12
Tests/Questionnaires	5
Reports - Descriptive	4
Information Analyses	2
Opinion Papers	2

Education Level

Higher Education	24
Postsecondary Education	14
Elementary Education	3
Secondary Education	2
Adult Education	1
Elementary Secondary Education	1
Grade 10	1
Grade 5	1
Grade 6	1
Grade 9	1
Intermediate Grades	1
Middle Schools	1

Location

Japan	4
Canada	3
China	3
Australia	2
Michigan	2
Austria	1
Canada (Montreal)	1
Chile	1
Colombia	1
Croatia	1
Finland	1
Hong Kong	1
Illinois (Urbana)	1
Iran	1
Kenya	1
Malaysia	1
Sweden	1
Taiwan	1
Turkey	1
United States	1
Vietnam	1

Assessments and Surveys

Test of English as a Foreign…	8
International English…	3

Showing 1 to 15 of 62 results

Evaluating Methodological Enhancements to the Yes/No Angoff Standard-Setting Method in Language Proficiency Assessment

Peer reviewed

Tia M. Fechter; Heeyeon Yoon – Language Testing, 2024

This study evaluated the efficacy of two proposed methods in an operational standard-setting study conducted for a high-stakes language proficiency test of the U.S. government. The goal was to seek low-cost modifications to the existing Yes/No Angoff method to increase the validity and reliability of the recommended cut scores using a convergent…

Descriptors: Standard Setting, Language Proficiency, Language Tests, Evaluation Methods

All Types of Experience Are Equal, but Some Are More Equal: The Effect of Different Types of Experience on Rater Severity and Rater Consistency

Peer reviewed

Reeta Neittaanmäki; Iasonas Lamprianou – Language Testing, 2024

This article focuses on rater severity and consistency and their relation to different types of rater experience over a long period of time. The article is based on longitudinal data collected from 2009 to 2019 from the second language Finnish speaking subtest in the National Certificates of Language Proficiency in Finland. The study investigated…

Descriptors: Foreign Countries, Interrater Reliability, Error of Measurement, Experience

Do Source Use Features Impact Raters' Judgment of Argumentation? An Experimental Study

Peer reviewed

Ping-Lin Chuang – Language Testing, 2025

This experimental study explores how source use features impact raters' judgment of argumentation in a second language (L2) integrated writing test. One hundred four experienced and novice raters were recruited to complete a rating task that simulated the scoring assignment of a local English Placement Test (EPT). Sixty written responses were…

Descriptors: Interrater Reliability, Evaluators, Information Sources, Primary Sources

Interpreting Testing and Assessment: A State-of-the-Art Review

Peer reviewed

Han, Chao – Language Testing, 2022

Over the past decade, testing and assessing spoken-language interpreting has garnered an increasing amount of attention from stakeholders in interpreter education, professional certification, and interpreting research. This is because in these fields assessment results provide a critical evidential basis for high-stakes decisions, such as the…

Descriptors: Translation, Language Tests, Testing, Evaluation Methods

Triangulating Natural Language Processing (NLP)-Based Analysis of Rater Comments and Many-Facet Rasch Measurement (MFRM): An Innovative Approach to Investigating Raters' Application of Rating Scales in Writing Assessment

Peer reviewed

Huiying Cai; Xun Yan – Language Testing, 2024

Rater comments tend to be qualitatively analyzed to indicate raters' application of rating scales. This study applied natural language processing (NLP) techniques to quantify meaningful, behavioral information from a corpus of rater comments and triangulated that information with a many-facet Rasch measurement (MFRM) analysis of rater scores. The…

Descriptors: Natural Language Processing, Item Response Theory, Rating Scales, Writing Evaluation

An Automatized Semantic Analysis of Two Large-Scale Listening Tests: A Corpus-Based Study

Peer reviewed

Yufan Zhao; Vahid Aryadoust – Language Testing, 2025

This study examined the semantic features of the simulated mini-lectures in the listening sections of the International English Language Testing System (IELTS) and the Test of English as a Foreign Language (TOEFL) based on automatized semantic analysis to explore the content validity of the two tests. Two study corpora were utilized, the IELTS…

Descriptors: Semantics, Computational Linguistics, Academic Language, Second Language Learning

Comparative Judgement for Evaluating Young Learners' EFL Writing Performances: Reliability and Teacher Perceptions of Holistic and Dimension-Based Judgements

Peer reviewed

Rebecca Sickinger; Tineke Brunfaut; John Pill – Language Testing, 2025

Comparative Judgement (CJ) is an evaluation method, typically conducted online, whereby a rank order is constructed, and scores calculated, from judges' pairwise comparisons of performances. CJ has been researched in various educational contexts, though only rarely in English as a Foreign Language (EFL) writing settings, and is generally agreed to…

Descriptors: Writing Evaluation, English (Second Language), Second Language Learning, Second Language Instruction

Reflections on the Past and Future of Language Testing and Assessment: An Emerging Scholar's Perspective

Peer reviewed

Burton, J. Dylan – Language Testing, 2023

In its 40th year, "Language Testing" journal has served as the flagship journal for scholars, researchers, and practitioners in the field of language testing and assessment. This viewpoint piece, written from the perspective of an emerging scholar, discusses two possible future trends based on evidence going back to the very first issue…

Descriptors: Language Tests, Testing, Futures (of Society), Periodicals

The Typology of Second Language Listening Constructs: A Systematic Review

Peer reviewed

Aryadoust, Vahid; Luo, Lan – Language Testing, 2023

This study reviewed conceptualizations and operationalizations of second language (L2) listening constructs. A total of 157 peer-reviewed papers published in 19 journals in applied linguistics were coded for (1) publication year, author, source title, location, language, and reliability and (2) listening subskills, cognitive processes, attributes,…

Descriptors: Test Format, Listening Comprehension Tests, Second Language Learning, Second Language Instruction

Authenticity of Academic Lecture Passages in High-Stakes Tests: A Temporal Fluency Perspective

Peer reviewed

Hitoshi Nishizawa – Language Testing, 2024

Corpus-based studies have offered the domain definition inference for test developers. Yet, corpus-based studies on temporal fluency measures (e.g., speech rate) have been limited, especially in the context of academic lecture settings. This made it difficult for test developers to sample representative fluency features to create authentic…

Descriptors: High Stakes Tests, Language Tests, Second Language Learning, Computer Assisted Testing

But Who Trains the Language Teacher Educator Who Trains the Language Teacher? An Empirical Investigation of Chilean EFL Teacher Educators' Language Assessment Literacy

Peer reviewed

Villa Larenas, Salomé; Brunfaut, Tineke – Language Testing, 2023

Research has shown that language teachers typically feel underprepared for assessment aspects of their job. One reason may relate to how teacher education programmes prepare future teachers in this area. Research insights into how and to what extent teacher educators train future language teachers in language assessment matters are scarce,…

Descriptors: Foreign Countries, Second Language Instruction, Language Teachers, Teacher Educators

Test Design and Validity Evidence of Interactive Speaking Assessment in the Era of Emerging Technologies

Peer reviewed

Jung Youn, Soo – Language Testing, 2023

As access to smartphones and emerging technologies has become ubiquitous in our daily lives and in language learning, technology-mediated social interaction has become common in teaching and assessing L2 speaking. The changing ecology of L2 spoken interaction provides language educators and testers with opportunities for renewed test design and…

Descriptors: Test Construction, Test Validity, Second Language Learning, Telecommunications

Towards More Valid Scoring Criteria for Integrated Reading-Writing and Listening-Writing Summary Tasks

Peer reviewed

Chan, Sathena; May, Lyn – Language Testing, 2023

Despite the increased use of integrated tasks in high-stakes academic writing assessment, research on rating criteria which reflect the unique construct of integrated summary writing skills is comparatively rare. Using a mixed-method approach of expert judgement, text analysis, and statistical analysis, this study examines writing features that…

Descriptors: Scoring, Writing Evaluation, Reading Tests, Listening Skills

Working with Sparse Data in Rated Language Tests: Generalizability Theory Applications

Peer reviewed

Lin, Chih-Kai – Language Testing, 2017

Sparse-rated data are common in operational performance-based language tests, as an inevitable result of assigning examinee responses to a fraction of available raters. The current study investigates the precision of two generalizability-theory methods (i.e., the rating method and the subdividing method) specifically designed to accommodate the…

Descriptors: Data Analysis, Language Tests, Generalizability Theory, Accuracy

A Systematic Review of Methods for Evaluating Rating Quality in Language Assessment

Peer reviewed

Wind, Stefanie A.; Peterson, Meghan E. – Language Testing, 2018

The use of assessments that require rater judgment (i.e., rater-mediated assessments) has become increasingly popular in high-stakes language assessments worldwide. Using a systematic literature review, the purpose of this study is to identify and explore the dominant methods for evaluating rating quality within the context of research on…

Descriptors: Language Tests, Evaluators, Evaluation Methods, Interrater Reliability

Author

Han, Chao	2
Jang, Eunice Eunhee	2
Kim, Youn-Hee	2
McNamara, Tim	2
Alvarez, Marta E.	1
Aryadoust, Vahid	1
Bax, Stephen	1
Brunfaut, Tineke	1
Burton, J. Dylan	1
Cai, Hongwen	1
Can Daskin, Nilüfer	1
Chan, Sathena	1
Cheng, Liying	1
Cho, Yeonsuk	1
Chodorow, Martin	1
Choi, Hyeran	1
Cumming, Alister	1
Dastjerdi, Hossein Vahid	1
Dunlop, Maggie	1
Duyen Thi Bich Nguyen	1
Ferne, Tracy	1
Gamon, Michael	1
Gan, Zhengdong	1
Hatipoglu, Çiler	1
Heeyeon Yoon	1

Descriptor

Second Language Learning	43
Language Tests	41
English (Second Language)	39
Evaluation Methods	31
Foreign Countries	27
Second Language Instruction	19
Teaching Methods	19
Evaluators	15
Comparative Analysis	12
Language Proficiency	12
Oral Language	12
Language Teachers	9
Mixed Methods Research	9
Scoring	9
Student Evaluation	9
Validity	9
Testing	8
Writing Evaluation	8
College Students	7
Interrater Reliability	7
Scores	7
Statistical Analysis	7
Test Items	7
Computer Assisted Testing	6
Reading Tests	6