ERIC - Search Results

Showing 1 to 15 of 87 results

Comparison of Traditional Machine Learning and Neural Network Approaches for Automated Scoring of Second Language English Essays

Peer reviewed

Erik Voss – Language Testing, 2025

An increasing number of language testing companies are developing and deploying deep learning-based automated essay scoring systems (AES) to replace traditional approaches that rely on handcrafted feature extraction. However, there is hesitation to accept neural network approaches to automated essay scoring because the features are automatically…

Descriptors: Artificial Intelligence, Automation, Scoring, English (Second Language)

Communal Factors in Rater Severity and Consistency over Time in High-Stakes Oral Assessment

Peer reviewed

Reeta Neittaanmäki; Iasonas Lamprianou – Language Testing, 2024

This article focuses on rater severity and consistency and their relation to major changes in the rating system in a high-stakes testing context. The study is based on longitudinal data collected from 2009 to 2019 from the second language (L2) Finnish speaking subtest in the National Certificates of Language Proficiency in Finland. We investigated…

Descriptors: Foreign Countries, Interrater Reliability, Evaluators, Item Response Theory

A Shortened Test Is Feasible: Evaluating a Large-Scale Multistage Adaptive English Language Assessment

Peer reviewed

Shangchao Min; Kyoungwon Bishop – Language Testing, 2024

This paper evaluates the multistage adaptive test (MST) design of a large-scale academic language assessment (ACCESS) for Grades 1-12, with an aim to simplify the current MST design, using both operational and simulated test data. Study 1 explored the operational population data (1,456,287 test-takers) of the listening and reading tests of MST…

Descriptors: Adaptive Testing, Test Construction, Language Tests, English Language Learners

A Meta-Analysis of Self-Assessment and Language Performance in Language Testing and Assessment

Peer reviewed

Li, Minzi; Zhang, Xian – Language Testing, 2021

This meta-analysis explores the correlation between self-assessment (SA) and language performance. Sixty-seven studies with 97 independent samples involving more than 68,500 participants were included in our analysis. It was found that the overall correlation between SA and language performance was 0.466 (p < 0.01). Moderator analysis was…

Descriptors: Meta Analysis, Self Evaluation (Individuals), Likert Scales, Research Reports

Triangulating Natural Language Processing (NLP)-Based Analysis of Rater Comments and Many-Facet Rasch Measurement (MFRM): An Innovative Approach to Investigating Raters' Application of Rating Scales in Writing Assessment

Peer reviewed

Huiying Cai; Xun Yan – Language Testing, 2024

Rater comments tend to be qualitatively analyzed to indicate raters' application of rating scales. This study applied natural language processing (NLP) techniques to quantify meaningful, behavioral information from a corpus of rater comments and triangulated that information with a many-facet Rasch measurement (MFRM) analysis of rater scores. The…

Descriptors: Natural Language Processing, Item Response Theory, Rating Scales, Writing Evaluation

Test Review: Computer-Based English Listening and Speaking Test (CELST) of National Matriculation English Test (NMET) Guangdong Version in China

Peer reviewed

Ying Xu; Xiaodong Li; Jin Chen – Language Testing, 2025

This article provides a detailed review of the Computer-based English Listening and Speaking Test (CELST) used in Guangdong, China, as part of the National Matriculation English Test (NMET) to assess students' English proficiency. The CELST measures listening and speaking skills as outlined in the "English Curriculum for Senior Middle…

Descriptors: Computer Assisted Testing, English (Second Language), Language Tests, Listening Comprehension Tests

Developing Internet-Based "Tests of Aptitude for Language Learning (TALL)": An Open Research Endeavour

Peer reviewed

Junlan Pan; Emma Marsden – Language Testing, 2024

"Tests of Aptitude for Language Learning" (TALL) is an openly accessible internet-based battery to measure the multifaceted construct of foreign language aptitude, using language domain-specific instruments and L1-sensitive instructions and stimuli. This brief report introduces the components of this theory-informed battery and…

Descriptors: Language Tests, Aptitude Tests, Second Language Learning, Test Construction

Revisiting Rating Scale Development for Rater-Mediated Language Performance Assessments: Modelling Construct and Contextual Choices Made by Scale Developers

Peer reviewed

Knoch, Ute; Deygers, Bart; Khamboonruang, Apichat – Language Testing, 2021

Rating scale development in the field of language assessment is often considered in dichotomous ways: It is assumed to be guided either by expert intuition or by drawing on performance data. Even though quite a few authors have argued that rating scale development is rarely so easily classifiable, this dyadic view has dominated language testing…

Descriptors: Rating Scales, Test Construction, Language Tests, Test Use

Making Each Point Count: Revising a Local Adaptation of the Jacobs et al.'s (1981) ESL COMPOSITION PROFILE Rubric

Peer reviewed

Yu-Tzu Chang; Ann Tai Choe; Daniel Holden; Daniel R. Isbell – Language Testing, 2024

In this Brief Report, we describe an evaluation of and revisions to a rubric adapted from the Jacobs et al.'s (1981) ESL COMPOSITION PROFILE, with four rubric categories and 20-point rating scales, in the context of an intensive English program writing placement test. Analysis of 4 years of rating data (2016-2021, including 434 essays) using…

Descriptors: Language Tests, Rating Scales, Second Language Learning, English (Second Language)

A New Scoring Method for Item Response Theory Analysis of C-Tests

Peer reviewed

Farshad Effatpanah; Purya Baghaei; Mona Tabatabaee-Yazdi; Esmat Babaii – Language Testing, 2025

This study aimed to propose a new method for scoring C-Tests as measures of general language proficiency. In this approach, the unit of analysis is sentences rather than gaps or passages. That is, the gaps correctly reformulated in each sentence were aggregated as sentence score, and then each sentence was entered into the analysis as a polytomous…

Descriptors: Item Response Theory, Language Tests, Test Items, Test Construction

The Typology of Second Language Listening Constructs: A Systematic Review

Peer reviewed

Aryadoust, Vahid; Luo, Lan – Language Testing, 2023

This study reviewed conceptualizations and operationalizations of second language (L2) listening constructs. A total of 157 peer-reviewed papers published in 19 journals in applied linguistics were coded for (1) publication year, author, source title, location, language, and reliability and (2) listening subskills, cognitive processes, attributes,…

Descriptors: Test Format, Listening Comprehension Tests, Second Language Learning, Second Language Instruction

Operationalizing the Reading-into-Writing Construct in Analytic Rating Scales: Effects of Different Approaches on Rating

Peer reviewed

Lestari, Santi B.; Brunfaut, Tineke – Language Testing, 2023

Assessing integrated reading-into-writing task performances is known to be challenging, and analytic rating scales have been found to better facilitate the scoring of these performances than other common types of rating scales. However, little is known about how specific operationalizations of the reading-into-writing construct in analytic rating…

Descriptors: Reading Writing Relationship, Writing Tests, Rating Scales, Writing Processes

Korean Syntactic Complexity Analyzer (KOSCA): An NLP Application for the Analysis of Syntactic Complexity in Second Language Production

Peer reviewed

Haerim Hwang; Hyunwoo Kim – Language Testing, 2024

Given the lack of computational tools available for assessing second language (L2) production in Korean, this study introduces a novel automated tool called the Korean Syntactic Complexity Analyzer (KOSCA) for measuring syntactic complexity in L2 Korean production. As an open-source graphic user interface (GUI) developed in Python, KOSCA provides…

Descriptors: Korean, Natural Language Processing, Syntax, Computer Graphics

Setting Standards for a Diagnostic Test of Aviation English for Student Pilots

Peer reviewed

Maria Treadaway; John Read – Language Testing, 2024

Standard-setting is an essential component of test development, supporting the meaningfulness and appropriate interpretation of test scores. However, in the high-stakes testing environment of aviation, standard-setting studies are underexplored. To address this gap, we document two stages in the standard-setting procedures for the Overseas Flight…

Descriptors: Standard Setting, Diagnostic Tests, High Stakes Tests, English for Special Purposes

The Use of Generalizability Theory in Investigating the Score Dependability of Classroom-Based L2 Reading Assessment

Peer reviewed

Liao, Ray J. T. – Language Testing, 2023

Among the variety of selected response formats used in L2 reading assessment, multiple-choice (MC) is the most commonly adopted, primarily due to its efficiency and objectiveness. Given the impact of assessment results on teaching and learning, it is necessary to investigate the degree to which the MC format reliably measures learners' L2 reading…

Descriptors: Reading Tests, Language Tests, Second Language Learning, Second Language Instruction
