Showing 1 to 15 of 22 results
Peer reviewed
Reeta Neittaanmäki; Iasonas Lamprianou – Language Testing, 2024
This article focuses on rater severity and consistency and their relation to major changes in the rating system in a high-stakes testing context. The study is based on longitudinal data collected from 2009 to 2019 from the second language (L2) Finnish speaking subtest in the National Certificates of Language Proficiency in Finland. We investigated…
Descriptors: Foreign Countries, Interrater Reliability, Evaluators, Item Response Theory
Peer reviewed
Ping-Lin Chuang – Language Testing, 2025
This experimental study explores how source use features impact raters' judgment of argumentation in a second language (L2) integrated writing test. One hundred four experienced and novice raters were recruited to complete a rating task that simulated the scoring assignment of a local English Placement Test (EPT). Sixty written responses were…
Descriptors: Interrater Reliability, Evaluators, Information Sources, Primary Sources
Peer reviewed
Huiying Cai; Xun Yan – Language Testing, 2024
Rater comments tend to be qualitatively analyzed to indicate raters' application of rating scales. This study applied natural language processing (NLP) techniques to quantify meaningful, behavioral information from a corpus of rater comments and triangulated that information with a many-facet Rasch measurement (MFRM) analysis of rater scores. The…
Descriptors: Natural Language Processing, Item Response Theory, Rating Scales, Writing Evaluation
Peer reviewed
Yu-Tzu Chang; Ann Tai Choe; Daniel Holden; Daniel R. Isbell – Language Testing, 2024
In this Brief Report, we describe an evaluation of and revisions to a rubric adapted from Jacobs et al.'s (1981) ESL COMPOSITION PROFILE, with four rubric categories and 20-point rating scales, in the context of an intensive English program writing placement test. Analysis of 4 years of rating data (2016-2021, including 434 essays) using…
Descriptors: Language Tests, Rating Scales, Second Language Learning, English (Second Language)
Peer reviewed
Lestari, Santi B.; Brunfaut, Tineke – Language Testing, 2023
Assessing integrated reading-into-writing task performances is known to be challenging, and analytic rating scales have been found to better facilitate the scoring of these performances than other common types of rating scales. However, little is known about how specific operationalizations of the reading-into-writing construct in analytic rating…
Descriptors: Reading Writing Relationship, Writing Tests, Rating Scales, Writing Processes
Peer reviewed
Shin, Jinnie; Gierl, Mark J. – Language Testing, 2021
Automated essay scoring (AES) has emerged as a secondary or as a sole marker for many high-stakes educational assessments, in native and non-native testing, owing to remarkable advances in feature engineering using natural language processing, machine learning, and deep-neural algorithms. The purpose of this study is to compare the effectiveness…
Descriptors: Scoring, Essays, Writing Evaluation, Computer Software
Peer reviewed
Yan, Xun; Chuang, Ping-Lin – Language Testing, 2023
This study employed a mixed-methods approach to examine how rater performance develops during a semester-long rater certification program for an English as a Second Language (ESL) writing placement test at a large US university. From 2016 to 2018, we tracked three groups of novice raters (n = 30) across four rounds in the certification program.…
Descriptors: Evaluators, Interrater Reliability, Item Response Theory, Certification
Peer reviewed
Lin, Chih-Kai – Language Testing, 2017
Sparse-rated data are common in operational performance-based language tests, as an inevitable result of assigning examinee responses to a fraction of available raters. The current study investigates the precision of two generalizability-theory methods (i.e., the rating method and the subdividing method) specifically designed to accommodate the…
Descriptors: Data Analysis, Language Tests, Generalizability Theory, Accuracy
Peer reviewed
Lamprianou, Iasonas; Tsagari, Dina; Kyriakou, Nansia – Language Testing, 2021
This longitudinal study (2002-2014) investigates the stability of rating characteristics of a large group of raters over time in the context of the writing paper of a national high-stakes examination. The study uses one measure of rater severity and two measures of rater consistency. The results suggest that the rating characteristics of…
Descriptors: Longitudinal Studies, Evaluators, High Stakes Tests, Writing Evaluation
Peer reviewed
Wind, Stefanie A.; Peterson, Meghan E. – Language Testing, 2018
The use of assessments that require rater judgment (i.e., rater-mediated assessments) has become increasingly popular in high-stakes language assessments worldwide. Using a systematic literature review, the purpose of this study is to identify and explore the dominant methods for evaluating rating quality within the context of research on…
Descriptors: Language Tests, Evaluators, Evaluation Methods, Interrater Reliability
Peer reviewed
Duijm, Klaartje; Schoonen, Rob; Hulstijn, Jan H. – Language Testing, 2018
It is general practice to use rater judgments in speaking proficiency testing. However, it has been shown that raters' knowledge and experience may influence their ratings, both in terms of leniency and varied focus on different aspects of speech. The purpose of this study is to identify raters' relative responsiveness to fluency and linguistic…
Descriptors: Language Fluency, Accuracy, Second Languages, Language Tests
Peer reviewed
Trace, Jonathan; Janssen, Gerriet; Meier, Valerie – Language Testing, 2017
Previous research in second language writing has shown that when scoring performance assessments even trained raters can exhibit significant differences in severity. When raters disagree, using discussion to try to reach a consensus is one popular form of score resolution, particularly in contexts with limited resources, as it does not require…
Descriptors: Performance Based Assessment, Second Language Learning, Scoring, Evaluators
Peer reviewed
Kang, Okim; Rubin, Don; Kermad, Alyssa – Language Testing, 2019
As a result of the fact that judgments of non-native speech are closely tied to social biases, oral proficiency ratings are susceptible to error because of rater background and social attitudes. In the present study we seek first to estimate the variance attributable to rater background and attitudinal variables on novice raters' assessments of L2…
Descriptors: Evaluators, Second Language Learning, Language Tests, English (Second Language)
Peer reviewed
Davis, Larry – Language Testing, 2016
Two factors were investigated that are thought to contribute to consistency in rater scoring judgments: rater training and experience in scoring. Also considered were the relative effects of scoring rubrics and exemplars on rater performance. Experienced teachers of English (N = 20) scored recorded responses from the TOEFL iBT speaking test prior…
Descriptors: Evaluators, Oral Language, Scores, Language Tests
Peer reviewed
Kuiken, Folkert; Vedder, Ineke – Language Testing, 2017
The importance of functional adequacy as an essential component of L2 proficiency has been observed by several authors (Pallotti, 2009; De Jong, Steinel, Florijn, Schoonen, & Hulstijn, 2012a, b). The rationale underlying the present study is that the assessment of writing proficiency in L2 is not fully possible without taking into account the…
Descriptors: Second Language Learning, Rating Scales, Computational Linguistics, Persuasive Discourse