Tia M. Fechter; Heeyeon Yoon – Language Testing, 2024
This study evaluated the efficacy of two proposed methods in an operational standard-setting study conducted for a high-stakes language proficiency test of the U.S. government. The goal was to seek low-cost modifications to the existing Yes/No Angoff method to increase the validity and reliability of the recommended cut scores using a convergent…
Descriptors: Standard Setting, Language Proficiency, Language Tests, Evaluation Methods
Ping-Lin Chuang – Language Testing, 2025
This experimental study explores how source use features impact raters' judgment of argumentation in a second language (L2) integrated writing test. One hundred four experienced and novice raters were recruited to complete a rating task that simulated the scoring assignment of a local English Placement Test (EPT). Sixty written responses were…
Descriptors: Interrater Reliability, Evaluators, Information Sources, Primary Sources
Han, Chao – Language Testing, 2022
Over the past decade, testing and assessing spoken-language interpreting has garnered an increasing amount of attention from stakeholders in interpreter education, professional certification, and interpreting research. This is because in these fields assessment results provide a critical evidential basis for high-stakes decisions, such as the…
Descriptors: Translation, Language Tests, Testing, Evaluation Methods
Huiying Cai; Xun Yan – Language Testing, 2024
Rater comments tend to be qualitatively analyzed to indicate raters' application of rating scales. This study applied natural language processing (NLP) techniques to quantify meaningful, behavioral information from a corpus of rater comments and triangulated that information with a many-facet Rasch measurement (MFRM) analysis of rater scores. The…
Descriptors: Natural Language Processing, Item Response Theory, Rating Scales, Writing Evaluation
Rebecca Sickinger; Tineke Brunfaut; John Pill – Language Testing, 2025
Comparative Judgement (CJ) is an evaluation method, typically conducted online, whereby a rank order is constructed, and scores calculated, from judges' pairwise comparisons of performances. CJ has been researched in various educational contexts, though only rarely in English as a Foreign Language (EFL) writing settings, and is generally agreed to…
Descriptors: Writing Evaluation, English (Second Language), Second Language Learning, Second Language Instruction
Burton, J. Dylan – Language Testing, 2023
In its 40th year, "Language Testing" journal has served as the flagship journal for scholars, researchers, and practitioners in the field of language testing and assessment. This viewpoint piece, written from the perspective of an emerging scholar, discusses two possible future trends based on evidence going back to the very first issue…
Descriptors: Language Tests, Testing, Futures (of Society), Periodicals
Villa Larenas, Salomé; Brunfaut, Tineke – Language Testing, 2023
Research has shown that language teachers typically feel underprepared for assessment aspects of their job. One reason may relate to how teacher education programmes prepare future teachers in this area. Research insights into how and to what extent teacher educators train future language teachers in language assessment matters are scarce,…
Descriptors: Foreign Countries, Second Language Instruction, Language Teachers, Teacher Educators
Jung Youn, Soo – Language Testing, 2023
As access to smartphones and emerging technologies has become ubiquitous in our daily lives and in language learning, technology-mediated social interaction has become common in teaching and assessing L2 speaking. The changing ecology of L2 spoken interaction provides language educators and testers with opportunities for renewed test design and…
Descriptors: Test Construction, Test Validity, Second Language Learning, Telecommunications
Chan, Sathena; May, Lyn – Language Testing, 2023
Despite the increased use of integrated tasks in high-stakes academic writing assessment, research on rating criteria which reflect the unique construct of integrated summary writing skills is comparatively rare. Using a mixed-method approach of expert judgement, text analysis, and statistical analysis, this study examines writing features that…
Descriptors: Scoring, Writing Evaluation, Reading Tests, Listening Skills
Wind, Stefanie A.; Peterson, Meghan E. – Language Testing, 2018
The use of assessments that require rater judgment (i.e., rater-mediated assessments) has become increasingly popular in high-stakes language assessments worldwide. Using a systematic literature review, the purpose of this study is to identify and explore the dominant methods for evaluating rating quality within the context of research on…
Descriptors: Language Tests, Evaluators, Evaluation Methods, Interrater Reliability
Han, Chao; Xiao, Xiaoyan – Language Testing, 2022
The quality of sign language interpreting (SLI) is a gripping construct among practitioners, educators and researchers, calling for reliable and valid assessment. There has been a diverse array of methods in the extant literature to measure SLI quality, ranging from traditional error analysis to recent rubric scoring. In this study, we want to…
Descriptors: Comparative Analysis, Sign Language, Deaf Interpreting, Evaluators
Can Daskin, Nilüfer; Hatipoglu, Çiler – Language Testing, 2019
In this study we are concerned with the informal dimension of formative assessment (FA) in an L2 classroom. We examine those instances that are embedded into everyday learning activities and that emerge in and through classroom interaction contingently, continuously and flexibly. Drawing on the methodological underpinnings of Conversation Analysis…
Descriptors: Formative Evaluation, Classroom Communication, Second Language Learning, Evaluation Methods
Xi, Xiaoming – Language Testing, 2017
In recent years, continuing advances in technology have increased the capacity to automate the extraction of a range of linguistic features of texts and thus have provided the impetus for the substantial growth of corpus linguistics. While corpus linguistic tools and methods have been used extensively in second language learning research, they…
Descriptors: Computational Linguistics, Second Language Learning, Language Tests, Evaluation Methods
Tajeddin, Zia; Khatib, Mohammad; Mahdavi, Mohsen – Language Testing, 2022
Critical language assessment (CLA) has been addressed in numerous studies. However, the majority of the studies have overlooked the need for a practical framework to measure the CLA dimension of teachers' language assessment literacy (LAL). This gap prompted us to develop and validate a critical language assessment literacy (CLAL) scale to further…
Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Language Tests
Lin, Chih-Kai – Language Testing, 2017
Sparse-rated data are common in operational performance-based language tests, as an inevitable result of assigning examinee responses to a fraction of available raters. The current study investigates the precision of two generalizability-theory methods (i.e., the rating method and the subdividing method) specifically designed to accommodate the…
Descriptors: Data Analysis, Language Tests, Generalizability Theory, Accuracy