Publication Date

| Publication Date | Count |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 4 |
| Since 2022 (last 5 years) | 40 |
| Since 2017 (last 10 years) | 81 |
| Since 2007 (last 20 years) | 146 |
Source

| Source | Count |
| --- | --- |
| Language Testing | 233 |
Author

| Author | Count |
| --- | --- |
| Bachman, Lyle F. | 6 |
| Chapelle, Carol A. | 5 |
| Fulcher, Glenn | 5 |
| Henning, Grant | 5 |
| Yan, Xun | 5 |
| Davies, Alan | 4 |
| McNamara, Tim | 4 |
| Alderson, J. Charles | 3 |
| Aryadoust, Vahid | 3 |
| Cho, Yeonsuk | 3 |
| Davidson, Fred | 3 |
Publication Type

| Publication Type | Count |
| --- | --- |
| Journal Articles | 233 |
| Reports - Research | 140 |
| Reports - Evaluative | 51 |
| Reports - Descriptive | 21 |
| Opinion Papers | 20 |
| Information Analyses | 10 |
| Tests/Questionnaires | 6 |
| Speeches/Meeting Papers | 2 |
Audience

| Audience | Count |
| --- | --- |
| Researchers | 1 |
| Teachers | 1 |
Location

| Location | Count |
| --- | --- |
| China | 9 |
| Japan | 9 |
| United Kingdom | 7 |
| Australia | 6 |
| Netherlands | 5 |
| Brazil | 3 |
| California | 3 |
| South Korea | 3 |
| United Kingdom (England) | 3 |
| United States | 3 |
| Canada | 2 |
Laws, Policies, & Programs

| Law, Policy, or Program | Count |
| --- | --- |
| Race to the Top | 1 |
Chapelle, Carol A.; Cotos, Elena; Lee, Jooyoung – Language Testing, 2015
Two examples demonstrate an argument-based approach to validation of diagnostic assessment using automated writing evaluation (AWE). "Criterion"® was developed by Educational Testing Service to analyze students' papers grammatically, providing sentence-level error feedback. An interpretive argument was developed for its use as part of…
Descriptors: Diagnostic Tests, Writing Evaluation, Automation, Test Validity

Bouwer, Renske; Béguin, Anton; Sanders, Ted; van den Bergh, Huub – Language Testing, 2015
In the present study, aspects of the measurement of writing are disentangled in order to investigate the validity of inferences made on the basis of writing performance and to describe implications for the assessment of writing. To include genre as a facet in the measurement, we obtained writing scores of 12 texts in four different genres for each…
Descriptors: Writing Tests, Generalization, Scores, Writing Instruction

Trace, Jonathan; Brown, James Dean; Janssen, Gerriet; Kozhevnikova, Liudmila – Language Testing, 2017
Cloze tests have been the subject of numerous studies regarding their function and use in both first language and second language contexts (e.g., Jonz & Oller, 1994; Watanabe & Koyama, 2008). From a validity standpoint, one area of investigation has been the extent to which cloze tests measure reading ability beyond the sentence level.…
Descriptors: Cloze Procedure, Language Tests, Test Items, Item Analysis

Kyle, Kristopher; Crossley, Scott A.; McNamara, Danielle S. – Language Testing, 2016
This study explores the construct validity of speaking tasks included in the TOEFL iBT (e.g., integrated and independent speaking tasks). Specifically, advanced natural language processing (NLP) tools, MANOVA difference statistics, and discriminant function analyses (DFA) are used to assess the degree to which and in what ways responses to these…
Descriptors: Construct Validity, Natural Language Processing, Speech Skills, Speech Acts

Ilc, Gašper; Stopar, Andrej – Language Testing, 2015
The paper examines the results of the CEFR alignment project for the Slovenian national examinations in English. The authors aim to validate externally the standard-setting procedures by adopting a socio-cognitive model of validation (Khalifa & Weir, 2009; Weir, 2005) to analyse the scoring, context and cognitive validity of three reading…
Descriptors: Foreign Countries, English (Second Language), Second Language Instruction, Second Language Learning

Sasaki, Miyuki – Language Testing, 2012
The Modern Language Aptitude Test (Paper-and-Pencil Version, henceforth the MLAT) measures "an individual's ability to learn a foreign language." It targets English-speaking adults (over Grade 9) who are literate. The test has only one form, which has not changed since it was first published by the Psychological Corporation in 1959. The test can…
Descriptors: Aptitude Tests, Test Reviews, Rewards, Acoustics

Deygers, Bart; Van Gorp, Koen – Language Testing, 2015
Considering scoring validity as encompassing both reliable rating scale use and valid descriptor interpretation, this study reports on the validation of a CEFR-based scale that was co-constructed and used by novice raters. The research questions this paper wishes to answer are (a) whether it is possible to construct a CEFR-based rating scale with…
Descriptors: Rating Scales, Scoring, Validity, Interrater Reliability

Ling, Guangming; Mollaun, Pamela; Xi, Xiaoming – Language Testing, 2014
The scoring of constructed responses may introduce construct-irrelevant factors to a test score and affect its validity and fairness. Fatigue is one of the factors that could negatively affect human performance in general, yet little is known about its effects on a human rater's scoring quality on constructed responses. In this study, we compared…
Descriptors: Evaluators, Fatigue (Biology), Scoring, Performance

Jarvis, Scott – Language Testing, 2017
The present study discusses the relevance of measures of lexical diversity (LD) to the assessment of learner corpora. It also argues that existing measures of LD, many of which have become specialized for use with language corpora, are fundamentally measures of lexical repetition, are based on an etic perspective of language, and lack construct…
Descriptors: Computational Linguistics, English (Second Language), Second Language Learning, Native Speakers

Davies, Alan – Language Testing, 2010
This article presents the author's response to Xiaoming Xi's paper titled "How do we go about investigating test fairness?" In the paper, Xi offers "a means to fully integrate fairness investigations and practice". Given the current importance accorded to fairness in the language testing community, Xi makes a case for viewing fairness as an aspect…
Descriptors: Investigations, Testing, Language Tests, Validity

Davies, Alan – Language Testing, 2012
In this article, the author begins by discussing four challenges to the concept of validity. These challenges are: (1) the appeal to logic and syllogistic reasoning; (2) the claim of reliability; (3) the local and the universal; and (4) the unitary and the divisible. In language testing, validity cannot be achieved directly but only through a…
Descriptors: Language Tests, Test Validity, Test Reliability, Testing

Hsu, Tammy Huei-Lien – Language Testing, 2016
This study explores the attitudes of raters of English speaking tests towards the global spread of English and the challenges in rating speakers of Indian English in descriptive speaking tasks. The claims put forward by language attitude studies indicate a validity issue in English speaking tests: listeners tend to hold negative attitudes towards…
Descriptors: Evaluators, Language Tests, English (Second Language), Second Language Learning

Youn, Soo Jung – Language Testing, 2015
This study investigates the validity of assessing L2 pragmatics in interaction using mixed methods, focusing on the evaluation inference. Open role-plays that are meaningful and relevant to the stakeholders in an English for Academic Purposes context were developed for classroom assessment. For meaningful score interpretations and accurate…
Descriptors: Second Language Learning, Pragmatics, Validity, Mixed Methods Research

Coombe, Christine; Davidson, Peter – Language Testing, 2014
The Common Educational Proficiency Assessment (CEPA) is a large-scale, high-stakes English language proficiency/placement test administered in the United Arab Emirates to Emirati nationals in their final year of secondary education, or Grade 12. The purpose of the CEPA is to place students into English classes at the appropriate government…
Descriptors: Language Tests, High Stakes Tests, English (Second Language), Second Language Learning

Jin, Tan; Mak, Barley; Zhou, Pei – Language Testing, 2012
The fuzziness of assessing second language speaking performance raises two difficulties in scoring speaking performance: "indistinction between adjacent levels" and "overlap between scales". To address these two problems, this article proposes a new approach, "confidence scoring", to deal with such fuzziness, leading to "confidence" scores between…
Descriptors: Speech Communication, Scoring, Test Interpretation, Second Language Learning