ERIC - Search Results

Showing 16 to 30 of 120 results

Operationalizing the Reading-into-Writing Construct in Analytic Rating Scales: Effects of Different Approaches on Rating

Peer reviewed

Lestari, Santi B.; Brunfaut, Tineke – Language Testing, 2023

Assessing integrated reading-into-writing task performances is known to be challenging, and analytic rating scales have been found to better facilitate the scoring of these performances than other common types of rating scales. However, little is known about how specific operationalizations of the reading-into-writing construct in analytic rating…

Descriptors: Reading Writing Relationship, Writing Tests, Rating Scales, Writing Processes

More Efficient Processes for Creating Automated Essay Scoring Frameworks: A Demonstration of Two Algorithms

Peer reviewed

Shin, Jinnie; Gierl, Mark J. – Language Testing, 2021

Automated essay scoring (AES) has emerged as a secondary or as a sole marker for many high-stakes educational assessments, in native and non-native testing, owing to remarkable advances in feature engineering using natural language processing, machine learning, and deep-neural algorithms. The purpose of this study is to compare the effectiveness…

Descriptors: Scoring, Essays, Writing Evaluation, Computer Software

"How Do Raters Learn to Rate?" Many-Facet Rasch Modeling of Rater Performance over the Course of a Rater Certification Program

Peer reviewed

Yan, Xun; Chuang, Ping-Lin – Language Testing, 2023

This study employed a mixed-methods approach to examine how rater performance develops during a semester-long rater certification program for an English as a Second Language (ESL) writing placement test at a large US university. From 2016 to 2018, we tracked three groups of novice raters (n = 30) across four rounds in the certification program.…

Descriptors: Evaluators, Interrater Reliability, Item Response Theory, Certification

Developing a Local Academic English Listening Test Using Authentic Unscripted Audio-Visual Texts

Peer reviewed

Park, Yena; Lee, Senyung; Shin, Sun-Young – Language Testing, 2022

Despite consistent calls for authentic stimuli in listening tests for better construct representation, unscripted texts have been rarely adopted in high-stakes listening tests due to perceived inefficiency. This study details how a local academic listening test was developed using authentic unscripted audio-visual texts from the local target…

Descriptors: Listening Comprehension Tests, English for Academic Purposes, Test Construction, Foreign Students

Korean Syntactic Complexity Analyzer (KOSCA): An NLP Application for the Analysis of Syntactic Complexity in Second Language Production

Peer reviewed

Hwang, Haerim; Kim, Hyunwoo – Language Testing, 2024

Given the lack of computational tools available for assessing second language (L2) production in Korean, this study introduces a novel automated tool called the Korean Syntactic Complexity Analyzer (KOSCA) for measuring syntactic complexity in L2 Korean production. As an open-source graphical user interface (GUI) developed in Python, KOSCA provides…

Descriptors: Korean, Natural Language Processing, Syntax, Computer Graphics

A Nonparametric Procedure for Exploring Differences in Rating Quality across Test-Taker Subgroups in Rater-Mediated Writing Assessments

Peer reviewed

Wind, Stefanie A. – Language Testing, 2019

Differences in rater judgments that are systematically related to construct-irrelevant characteristics threaten the fairness of rater-mediated writing assessments. Accordingly, it is essential that researchers and practitioners examine the degree to which the psychometric quality of rater judgments is comparable across test-taker subgroups.…

Descriptors: Nonparametric Statistics, Interrater Reliability, Differences, Writing Tests

Automated Scoring of Junior and Senior High Essays Using Coh-Metrix Features: Implications for Large-Scale Language Testing

Peer reviewed

Latifi, Syed; Gierl, Mark – Language Testing, 2021

An automated essay scoring (AES) program is a software system that uses techniques from corpus and computational linguistics and machine learning to grade essays. In this study, we aimed to describe and evaluate particular language features of Coh-Metrix for a novel AES program that would score junior and senior high school students' essays from…

Descriptors: Writing Evaluation, Computer Assisted Testing, Scoring, Essays

Setting Standards for a Diagnostic Test of Aviation English for Student Pilots

Peer reviewed

Treadaway, Maria; Read, John – Language Testing, 2024

Standard-setting is an essential component of test development, supporting the meaningfulness and appropriate interpretation of test scores. However, in the high-stakes testing environment of aviation, standard-setting studies are underexplored. To address this gap, we document two stages in the standard-setting procedures for the Overseas Flight…

Descriptors: Standard Setting, Diagnostic Tests, High Stakes Tests, English for Special Purposes

The Use of Generalizability Theory in Investigating the Score Dependability of Classroom-Based L2 Reading Assessment

Peer reviewed

Liao, Ray J. T. – Language Testing, 2023

Among the variety of selected response formats used in L2 reading assessment, multiple-choice (MC) is the most commonly adopted, primarily due to its efficiency and objectiveness. Given the impact of assessment results on teaching and learning, it is necessary to investigate the degree to which the MC format reliably measures learners' L2 reading…

Descriptors: Reading Tests, Language Tests, Second Language Learning, Second Language Instruction

A Comprehensive Review of Rasch Measurement in Language Assessment: Recommendations and Guidelines for Research

Peer reviewed

Aryadoust, Vahid; Ng, Li Ying; Sayama, Hiroki – Language Testing, 2021

Over the past decades, the application of Rasch measurement in language assessment has gradually increased. In the present study, we coded 215 papers using Rasch measurement published in 21 applied linguistics journals for multiple features. We found that seven Rasch models and 23 software packages were adopted in these papers, with many-facet…

Descriptors: Language Tests, Testing, Test Items, Network Analysis

Validation of Rating Processes within an Argument-Based Framework

Peer reviewed

Knoch, Ute; Chapelle, Carol A. – Language Testing, 2018

Argument-based validation requires test developers and researchers to specify what is entailed in test interpretation and use. Doing so has been shown to yield advantages (Chapelle, Enright, & Jamieson, 2010), but it also requires an analysis of how the concerns of language testers can be conceptualized in the terms used to construct a…

Descriptors: Test Validity, Language Tests, Evaluation Research, Rating Scales

Measuring Bilingual Language Dominance: An Examination of the Reliability of the Bilingual Language Profile

Peer reviewed

Olson, Daniel J. – Language Testing, 2023

Measuring language dominance, broadly defined as the relative strength of each of a bilingual's two languages, remains a crucial methodological issue in bilingualism research. While various methods have been proposed, the Bilingual Language Profile (BLP) has been one of the most widely used tools for measuring language dominance. While previous…

Descriptors: Bilingualism, Language Dominance, Native Language, Second Language Learning

Adaptation of the British Sign Language Receptive Skills Test into Polish Sign Language

Peer reviewed

Kotowicz, Justyna; Woll, Bencie; Herman, Rosalind – Language Testing, 2021

The evaluation of sign language proficiency needs to be based on measures with well-established psychometric properties. To date, no valid and reliable test is available to assess Polish Sign Language ("Polski Język Migowy," PJM) skills in deaf children. Hence, our aim with this study was to adapt the British Sign Language Receptive…

Descriptors: Language Tests, Receptive Language, Sign Language, Language Proficiency

Working with Sparse Data in Rated Language Tests: Generalizability Theory Applications

Peer reviewed

Lin, Chih-Kai – Language Testing, 2017

Sparse-rated data are common in operational performance-based language tests, as an inevitable result of assigning examinee responses to a fraction of available raters. The current study investigates the precision of two generalizability-theory methods (i.e., the rating method and the subdividing method) specifically designed to accommodate the…

Descriptors: Data Analysis, Language Tests, Generalizability Theory, Accuracy

Measuring L2 Speakers' Interactional Ability Using Interactive Speech Tasks

Peer reviewed

van Batenburg, Eline S. L.; Oostdam, Ron J.; van Gelderen, Amos J. S.; de Jong, Nivja H. – Language Testing, 2018

This article explores ways to assess interactional performance, and reports on the use of a test format that standardizes the interlocutor's linguistic and interactional contributions to the exchange. It describes the construction and administration of six scripted speech tasks (instruction, advice, and sales tasks) with pre-vocational learners (n…

Descriptors: Second Language Learning, Speech Tests, Interaction, Test Reliability
