ERIC - Search Results

Publication Date

In 2025	0
Since 2024	1
Since 2021 (last 5 years)	1
Since 2016 (last 10 years)	4
Since 2006 (last 20 years)	8

Source

Language Testing

Author

Papageorgiou, Spiros	2
Bachman, Lyle F.	1
Boldt, Robert F.	1
Brown, James Dean	1
Carey, Michael D.	1
Carr, Nathan T.	1
Choi, Ikkyu	1
Choi, Inn-Chull	1
Crossley, Scott	1
Knoch, Ute	1
Kyle, Kristopher	1
Lee, Yong-Won	1
McNamara, Tim	1
Powers, Donald	1
Schedl, Mary	1
Szocs, Stefan	1
Zhang, Su	1

Publication Type

Journal Articles	11
Reports - Research	8
Reports - Evaluative	2
Reports - Descriptive	1
Tests/Questionnaires	1

Education Level

Elementary Education

Location

Australia	1
Europe	1
Kenya	1
Netherlands	1
United Kingdom	1
United States	1

Assessments and Surveys

Test of English as a Foreign…	11
International English…	2
Test of English for…	2

Showing all 11 results

Revisiting Raters' Accent Familiarity in Speaking Tests: Evidence That Presentation Mode Interacts with Accent Familiarity to Variably Affect Comprehensibility Ratings

Peer reviewed

Carey, Michael D.; Szocs, Stefan – Language Testing, 2024

This controlled experimental study investigated the interaction of variables associated with rating the pronunciation component of high-stakes English-language-speaking tests such as IELTS and TOEFL iBT. One hundred experienced raters who were all either familiar or unfamiliar with Brazilian-accented English or Papua New Guinean Tok Pisin-accented…

Descriptors: Dialects, Pronunciation, Suprasegmentals, Familiarity

Facilitating the Interpretation of English Language Proficiency Scores: Combining Scale Anchoring and Test Score Mapping Methodologies

Peer reviewed

Powers, Donald; Schedl, Mary; Papageorgiou, Spiros – Language Testing, 2017

The aim of this study was to develop, for the benefit of both test takers and test score users, enhanced TOEFL ITP® test score reports that go beyond the simple numerical scores that are currently reported. To do so, we applied traditional scale anchoring (proficiency scaling) to item difficulty data in order to develop performance…

Descriptors: English (Second Language), Second Language Learning, Language Proficiency, Scores

Evaluating Subscore Uses across Multiple Levels: A Case of Reading and Listening Subscores for Young EFL Learners

Peer reviewed

Choi, Ikkyu; Papageorgiou, Spiros – Language Testing, 2020

Stakeholders of language tests are often interested in subscores. However, reporting a subscore is not always justified; a subscore should provide reliable and distinct information to be worth reporting. When a subscore is used for decisions across multiple levels (e.g., individual test takers and schools), it needs to be justified for its…

Descriptors: English (Second Language), Language Tests, Second Language Learning, Scores

Assessing Syntactic Sophistication in L2 Writing: A Usage-Based Approach

Peer reviewed

Kyle, Kristopher; Crossley, Scott – Language Testing, 2017

Over the past 45 years, the construct of syntactic sophistication has been assessed in L2 writing using what Bulté and Housen (2012) refer to as absolute complexity (Lu, 2011; Ortega, 2003; Wolfe-Quintero, Inagaki, & Kim, 1998). However, it has been argued that making inferences about learners based on absolute complexity indices (e.g., mean…

Descriptors: Syntax, Verbs, Second Language Learning, Word Frequency

The Rasch Wars: The Emergence of Rasch Measurement in Language Testing

Peer reviewed

McNamara, Tim; Knoch, Ute – Language Testing, 2012

This paper examines the uptake of Rasch measurement in language testing through a consideration of research published in language testing research journals in the period 1984 to 2009. Following the publication of the first papers on this topic, exploring the potential of the simple Rasch model for the analysis of dichotomous language test data, a…

Descriptors: Language Tests, Testing, English (Second Language), Item Response Theory

The Relative Importance of Persons, Items, Subtests, and Languages to TOEFL Test Variance.

Peer reviewed

Brown, James Dean – Language Testing, 1999

Explored the relative contributions to Test of English as a Foreign Language (TOEFL) score dependability of various numbers of persons, items, subtests, languages, and their various interactions. Sampled 15,000 test takers, 1,000 from each of 15 different language backgrounds. (Author/VWL)

Descriptors: English (Second Language), Language Tests, Second Language Learning, Student Characteristics

Dependability of Scores for a New ESL Speaking Assessment Consisting of Integrated and Independent Tasks

Peer reviewed

Lee, Yong-Won – Language Testing, 2006

A multitask speaking measure consisting of both integrated and independent tasks is expected to be an important component of a new version of the TOEFL test. This study considered two critical issues concerning score dependability of the new speaking measure: How much would the score dependability be impacted by (1) combining scores on different…

Descriptors: Language Tests, Second Language Learning, English (Second Language), Generalizability Theory

Investigating the Relative Effects of Persons, Items, Sections, and Languages on TOEIC Score Dependability

Peer reviewed

Zhang, Su – Language Testing, 2006

This study applied generalizability theory to investigate the contributions of persons, items, sections, and language backgrounds to the score dependability of the Test of English for International Communication (TOEIC). I replicated and extended Brown's (1999) study of the Test of English as a Foreign Language (TOEFL), using data from two…

Descriptors: Communication (Thought Transfer), Generalizability Theory, English (Second Language), Scores

An Investigation into the Adequacy of Three IRT Models for Data from Two EFL Reading Tests.

Peer reviewed

Choi, Inn-Chull; Bachman, Lyle F. – Language Testing, 1992

This study is part of a larger one examining the comparability of the First Certificate in English and the Test of English as a Foreign Language. The general assumptions of unidimensionality and goodness of fit were tested. Findings raise questions about the consequences of rejecting or retaining misfitting items. (60 references) (LB)

Descriptors: Comparative Analysis, English (Second Language), Goodness of Fit, Item Response Theory

Crossvalidation of Item Response Curve Models Using TOEFL Data.

Peer reviewed

Boldt, Robert F. – Language Testing, 1992

The proportional item response curve (PIRC) assumption was tested by using PIRC to predict the item scores of selected examinees on selected items. Findings show approximate accuracies of prediction for PIRC, the three-parameter logistic model, and a modified Rasch model. (12 references) (Author/LB)

Descriptors: Comparative Analysis, English (Second Language), Factor Analysis, Item Response Theory

The Factor Structure of Test Task Characteristics and Examinee Performance

Peer reviewed

Carr, Nathan T. – Language Testing, 2006

The present study focuses on the task characteristics of reading passages and key sentences in a test of second language reading. Using a new methodological approach to describe variation in test task characteristics and explore how differences in these characteristics might relate to examinee performance, it posed the following two research…

Descriptors: English for Academic Purposes, Sentences, Reading Comprehension, Factor Analysis

Descriptor

English (Second Language)	11
Language Tests	10
Second Language Learning	8
Item Response Theory	6
Scores	5
Test Items	5
Comparative Analysis	3
Factor Analysis	3
Foreign Countries	3
Correlation	2
Generalizability Theory	2
Item Analysis	2
Language Proficiency	2
Models	2
Reading Tests	2
Test Construction	2
Test Reliability	2
Test Theory	2
Test Validity	2
Testing	2
Audiovisual Aids	1
Audiovisual Communications	1
Bias	1
Communication (Thought…	1
Comprehension	1