ERIC - Search Results

Showing 1 to 15 of 82 results

Communal Factors in Rater Severity and Consistency over Time in High-Stakes Oral Assessment

Peer reviewed

Reeta Neittaanmäki; Iasonas Lamprianou – Language Testing, 2024

This article focuses on rater severity and consistency and their relation to major changes in the rating system in a high-stakes testing context. The study is based on longitudinal data collected from 2009 to 2019 from the second language (L2) Finnish speaking subtest in the National Certificates of Language Proficiency in Finland. We investigated…

Descriptors: Foreign Countries, Interrater Reliability, Evaluators, Item Response Theory

A Shortened Test Is Feasible: Evaluating a Large-Scale Multistage Adaptive English Language Assessment

Peer reviewed

Shangchao Min; Kyoungwon Bishop – Language Testing, 2024

This paper evaluates the multistage adaptive test (MST) design of a large-scale academic language assessment (ACCESS) for Grades 1-12, with an aim to simplify the current MST design, using both operational and simulated test data. Study 1 explored the operational population data (1,456,287 test-takers) of the listening and reading tests of MST…

Descriptors: Adaptive Testing, Test Construction, Language Tests, English Language Learners

A Meta-Analysis of Self-Assessment and Language Performance in Language Testing and Assessment

Peer reviewed

Li, Minzi; Zhang, Xian – Language Testing, 2021

This meta-analysis explores the correlation between self-assessment (SA) and language performance. Sixty-seven studies with 97 independent samples involving more than 68,500 participants were included in our analysis. It was found that the overall correlation between SA and language performance was 0.466 (p < 0.01). Moderator analysis was…

Descriptors: Meta Analysis, Self Evaluation (Individuals), Likert Scales, Research Reports

All Types of Experience Are Equal, but Some Are More Equal: The Effect of Different Types of Experience on Rater Severity and Rater Consistency

Peer reviewed

Reeta Neittaanmäki; Iasonas Lamprianou – Language Testing, 2024

This article focuses on rater severity and consistency and their relation to different types of rater experience over a long period of time. The article is based on longitudinal data collected from 2009 to 2019 from the second language Finnish speaking subtest in the National Certificates of Language Proficiency in Finland. The study investigated…

Descriptors: Foreign Countries, Interrater Reliability, Error of Measurement, Experience

Do Source Use Features Impact Raters' Judgment of Argumentation? An Experimental Study

Peer reviewed

Ping-Lin Chuang – Language Testing, 2025

This experimental study explores how source use features impact raters' judgment of argumentation in a second language (L2) integrated writing test. One hundred four experienced and novice raters were recruited to complete a rating task that simulated the scoring assignment of a local English Placement Test (EPT). Sixty written responses were…

Descriptors: Interrater Reliability, Evaluators, Information Sources, Primary Sources

Triangulating Natural Language Processing (NLP)-Based Analysis of Rater Comments and Many-Facet Rasch Measurement (MFRM): An Innovative Approach to Investigating Raters' Application of Rating Scales in Writing Assessment

Peer reviewed

Huiying Cai; Xun Yan – Language Testing, 2024

Rater comments tend to be qualitatively analyzed to indicate raters' application of rating scales. This study applied natural language processing (NLP) techniques to quantify meaningful, behavioral information from a corpus of rater comments and triangulated that information with a many-facet Rasch measurement (MFRM) analysis of rater scores. The…

Descriptors: Natural Language Processing, Item Response Theory, Rating Scales, Writing Evaluation

Developing Internet-Based "Tests of Aptitude for Language Learning (TALL)": An Open Research Endeavour

Peer reviewed

Junlan Pan; Emma Marsden – Language Testing, 2024

"Tests of Aptitude for Language Learning" (TALL) is an openly accessible internet-based battery to measure the multifaceted construct of foreign language aptitude, using language domain-specific instruments and L1-sensitive instructions and stimuli. This brief report introduces the components of this theory-informed battery and…

Descriptors: Language Tests, Aptitude Tests, Second Language Learning, Test Construction

Comparative Judgement for Evaluating Young Learners' EFL Writing Performances: Reliability and Teacher Perceptions of Holistic and Dimension-Based Judgements

Peer reviewed

Rebecca Sickinger; Tineke Brunfaut; John Pill – Language Testing, 2025

Comparative Judgement (CJ) is an evaluation method, typically conducted online, whereby a rank order is constructed, and scores calculated, from judges' pairwise comparisons of performances. CJ has been researched in various educational contexts, though only rarely in English as a Foreign Language (EFL) writing settings, and is generally agreed to…

Descriptors: Writing Evaluation, English (Second Language), Second Language Learning, Second Language Instruction

Making Each Point Count: Revising a Local Adaptation of the Jacobs et al.'s (1981) ESL COMPOSITION PROFILE Rubric

Peer reviewed

Yu-Tzu Chang; Ann Tai Choe; Daniel Holden; Daniel R. Isbell – Language Testing, 2024

In this Brief Report, we describe an evaluation of and revisions to a rubric adapted from the Jacobs et al.'s (1981) ESL COMPOSITION PROFILE, with four rubric categories and 20-point rating scales, in the context of an intensive English program writing placement test. Analysis of 4 years of rating data (2016-2021, including 434 essays) using…

Descriptors: Language Tests, Rating Scales, Second Language Learning, English (Second Language)

A New Scoring Method for Item Response Theory Analysis of C-Tests

Peer reviewed

Farshad Effatpanah; Purya Baghaei; Mona Tabatabaee-Yazdi; Esmat Babaii – Language Testing, 2025

This study aimed to propose a new method for scoring C-Tests as measures of general language proficiency. In this approach, the unit of analysis is sentences rather than gaps or passages. That is, the gaps correctly reformulated in each sentence were aggregated as sentence score, and then each sentence was entered into the analysis as a polytomous…

Descriptors: Item Response Theory, Language Tests, Test Items, Test Construction

Operationalizing the Reading-into-Writing Construct in Analytic Rating Scales: Effects of Different Approaches on Rating

Peer reviewed

Lestari, Santi B.; Brunfaut, Tineke – Language Testing, 2023

Assessing integrated reading-into-writing task performances is known to be challenging, and analytic rating scales have been found to better facilitate the scoring of these performances than other common types of rating scales. However, little is known about how specific operationalizations of the reading-into-writing construct in analytic rating…

Descriptors: Reading Writing Relationship, Writing Tests, Rating Scales, Writing Processes

More Efficient Processes for Creating Automated Essay Scoring Frameworks: A Demonstration of Two Algorithms

Peer reviewed

Shin, Jinnie; Gierl, Mark J. – Language Testing, 2021

Automated essay scoring (AES) has emerged as a secondary or as a sole marker for many high-stakes educational assessments, in native and non-native testing, owing to remarkable advances in feature engineering using natural language processing, machine learning, and deep-neural algorithms. The purpose of this study is to compare the effectiveness…

Descriptors: Scoring, Essays, Writing Evaluation, Computer Software

"How Do Raters Learn to Rate?" Many-Facet Rasch Modeling of Rater Performance over the Course of a Rater Certification Program

Peer reviewed

Yan, Xun; Chuang, Ping-Lin – Language Testing, 2023

This study employed a mixed-methods approach to examine how rater performance develops during a semester-long rater certification program for an English as a Second Language (ESL) writing placement test at a large US university. From 2016 to 2018, we tracked three groups of novice raters (n = 30) across four rounds in the certification program.…

Descriptors: Evaluators, Interrater Reliability, Item Response Theory, Certification

Developing a Local Academic English Listening Test Using Authentic Unscripted Audio-Visual Texts

Peer reviewed

Park, Yena; Lee, Senyung; Shin, Sun-Young – Language Testing, 2022

Despite consistent calls for authentic stimuli in listening tests for better construct representation, unscripted texts have been rarely adopted in high-stakes listening tests due to perceived inefficiency. This study details how a local academic listening test was developed using authentic unscripted audio-visual texts from the local target…

Descriptors: Listening Comprehension Tests, English for Academic Purposes, Test Construction, Foreign Students

Korean Syntactic Complexity Analyzer (KOSCA): An NLP Application for the Analysis of Syntactic Complexity in Second Language Production

Peer reviewed

Haerim Hwang; Hyunwoo Kim – Language Testing, 2024

Given the lack of computational tools available for assessing second language (L2) production in Korean, this study introduces a novel automated tool called the Korean Syntactic Complexity Analyzer (KOSCA) for measuring syntactic complexity in L2 Korean production. As an open-source graphic user interface (GUI) developed in Python, KOSCA provides…

Descriptors: Korean, Natural Language Processing, Syntax, Computer Graphics
