Publication Date
In 2025 | 1 |
Since 2024 | 4 |
Since 2021 (last 5 years) | 6 |
Since 2016 (last 10 years) | 13 |
Since 2006 (last 20 years) | 18 |
Descriptor
Item Response Theory | 18 |
Language Tests | 13 |
Foreign Countries | 12 |
English (Second Language) | 8 |
Second Language Learning | 8 |
Test Reliability | 8 |
Interrater Reliability | 7 |
Evaluators | 6 |
Rating Scales | 6 |
Scoring | 6 |
Correlation | 5 |
More ▼ |
Source
Language Testing | 18 |
Author
Alanen, Riikka | 1 |
Ann Tai Choe | 1 |
Audeoud, Mireille | 1 |
Batty, Aaron Olaf | 1 |
Brunfaut, Tineke | 1 |
Cai, Yuyang | 1 |
Chan, Stephanie W. Y. | 1 |
Cheung, Wai Ming | 1 |
Choi, Ikkyu | 1 |
Chuang, Ping-Lin | 1 |
Culligan, Brent | 1 |
More ▼ |
Publication Type
Journal Articles | 18 |
Reports - Research | 17 |
Reports - Evaluative | 1 |
Tests/Questionnaires | 1 |
Education Level
Elementary Education | 5 |
Higher Education | 3 |
Postsecondary Education | 3 |
Early Childhood Education | 1 |
Elementary Secondary Education | 1 |
Grade 6 | 1 |
Intermediate Grades | 1 |
Kindergarten | 1 |
Primary Education | 1 |
Secondary Education | 1 |
Audience
Laws, Policies, & Programs
Assessments and Surveys
Peabody Picture Vocabulary… | 1 |
Test of English as a Foreign… | 1 |
What Works Clearinghouse Rating
Reeta Neittaanmäki; Iasonas Lamprianou – Language Testing, 2024
This article focuses on rater severity and consistency and their relation to major changes in the rating system in a high-stakes testing context. The study is based on longitudinal data collected from 2009 to 2019 from the second language (L2) Finnish speaking subtest in the National Certificates of Language Proficiency in Finland. We investigated…
Descriptors: Foreign Countries, Interrater Reliability, Evaluators, Item Response Theory
Huiying Cai; Xun Yan – Language Testing, 2024
Rater comments tend to be qualitatively analyzed to indicate raters' application of rating scales. This study applied natural language processing (NLP) techniques to quantify meaningful, behavioral information from a corpus of rater comments and triangulated that information with a many-facet Rasch measurement (MFRM) analysis of rater scores. The…
Descriptors: Natural Language Processing, Item Response Theory, Rating Scales, Writing Evaluation
Yu-Tzu Chang; Ann Tai Choe; Daniel Holden; Daniel R. Isbell – Language Testing, 2024
In this Brief Report, we describe an evaluation of and revisions to a rubric adapted from the Jacobs et al.'s (1981) ESL COMPOSITION PROFILE, with four rubric categories and 20-point rating scales, in the context of an intensive English program writing placement test. Analysis of 4 years of rating data (2016-2021, including 434 essays) using…
Descriptors: Language Tests, Rating Scales, Second Language Learning, English (Second Language)
Farshad Effatpanah; Purya Baghaei; Mona Tabatabaee-Yazdi; Esmat Babaii – Language Testing, 2025
This study aimed to propose a new method for scoring C-Tests as measures of general language proficiency. In this approach, the unit of analysis is sentences rather than gaps or passages. That is, the gaps correctly reformulated in each sentence were aggregated as sentence score, and then each sentence was entered into the analysis as a polytomous…
Descriptors: Item Response Theory, Language Tests, Test Items, Test Construction
Lestari, Santi B.; Brunfaut, Tineke – Language Testing, 2023
Assessing integrated reading-into-writing task performances is known to be challenging, and analytic rating scales have been found to better facilitate the scoring of these performances than other common types of rating scales. However, little is known about how specific operationalizations of the reading-into-writing construct in analytic rating…
Descriptors: Reading Writing Relationship, Writing Tests, Rating Scales, Writing Processes
Yan, Xun; Chuang, Ping-Lin – Language Testing, 2023
This study employed a mixed-methods approach to examine how rater performance develops during a semester-long rater certification program for an English as a Second Language (ESL) writing placement test at a large US university. From 2016 to 2018, we tracked three groups of novice raters (n = 30) across four rounds in the certification program.…
Descriptors: Evaluators, Interrater Reliability, Item Response Theory, Certification
Wind, Stefanie A. – Language Testing, 2019
Differences in rater judgments that are systematically related to construct-irrelevant characteristics threaten the fairness of rater-mediated writing assessments. Accordingly, it is essential that researchers and practitioners examine the degree to which the psychometric quality of rater judgments is comparable across test-taker subgroups.…
Descriptors: Nonparametric Statistics, Interrater Reliability, Differences, Writing Tests
Haug, Tobias; Batty, Aaron Olaf; Venetz, Martin; Notter, Christa; Girard-Groeber, Simone; Knoch, Ute; Audeoud, Mireille – Language Testing, 2020
In this study we seek evidence of validity according to the socio-cognitive framework (Weir, 2005) for a new sentence repetition test (SRT) for young Deaf L1 Swiss German Sign Language (DSGS) users. SRTs have been developed for various purposes for both spoken and sign languages to assess language development in children. In order to address the…
Descriptors: Foreign Countries, Language Tests, Sentences, Repetition
Cai, Yuyang; Kunnan, Antony John – Language Testing, 2020
An essential hypothesis of modern language assessment theory pertains to the interaction between strategy use ability (strategic competence) and second language knowledge. However, how they interact with each other is rarely explored. Drawing on relevant research in the literature, in this paper we proposed three interaction patterns (i.e.,…
Descriptors: English (Second Language), Second Language Learning, Nursing Education, Reading Tests
Chan, Stephanie W. Y.; Cheung, Wai Ming; Huang, Yanli; Lam, Wai-Ip; Lin, Chin-Hsi – Language Testing, 2020
Demand for second-language (L2) Chinese education for kindergarteners has grown rapidly, but little is known about these kindergarteners' L2 skills, with existing studies focusing on school-age populations and alphabetic languages. Accordingly, we developed a six-subtest Chinese character acquisition assessment to measure L2 kindergarteners'…
Descriptors: Chinese, Second Language Learning, Second Language Instruction, Written Language
Choi, Ikkyu; Papageorgiou, Spiros – Language Testing, 2020
Stakeholders of language tests are often interested in subscores. However, reporting a subscore is not always justified; a subscore should provide reliable and distinct information to be worth reporting. When a subscore is used for decisions across multiple levels (e.g., individual test takers and schools), it needs to be justified for its…
Descriptors: English (Second Language), Language Tests, Second Language Learning, Scores
Longabach, Tanya; Peyton, Vicki – Language Testing, 2018
K-12 English language proficiency tests that assess multiple content domains (e.g., listening, speaking, reading, writing) often have subsections based on these content domains; scores assigned to these subsections are commonly known as subscores. Testing programs face increasing customer demands for the reporting of subscores in addition to the…
Descriptors: Comparative Analysis, Test Reliability, Second Language Learning, Language Proficiency
Trace, Jonathan; Janssen, Gerriet; Meier, Valerie – Language Testing, 2017
Previous research in second language writing has shown that when scoring performance assessments even trained raters can exhibit significant differences in severity. When raters disagree, using discussion to try to reach a consensus is one popular form of score resolution, particularly in contexts with limited resources, as it does not require…
Descriptors: Performance Based Assessment, Second Language Learning, Scoring, Evaluators
Culligan, Brent – Language Testing, 2015
This study compared three common vocabulary test formats, the Yes/No test, the Vocabulary Knowledge Scale (VKS), and the Vocabulary Levels Test (VLT), as measures of vocabulary difficulty. Vocabulary difficulty was defined as the item difficulty estimated through Item Response Theory (IRT) analysis. Three tests were given to 165 Japanese students,…
Descriptors: Language Tests, Test Format, Comparative Analysis, Vocabulary
Deygers, Bart; Van Gorp, Koen – Language Testing, 2015
Considering scoring validity as encompassing both reliable rating scale use and valid descriptor interpretation, this study reports on the validation of a CEFR-based scale that was co-constructed and used by novice raters. The research questions this paper wishes to answer are (a) whether it is possible to construct a CEFR-based rating scale with…
Descriptors: Rating Scales, Scoring, Validity, Interrater Reliability
Previous Page | Next Page »
Pages: 1 | 2