Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 2 |
Since 2016 (last 10 years) | 5 |
Since 2006 (last 20 years) | 7 |
Descriptor
Interrater Reliability | 8 |
Writing Evaluation | 8 |
Evaluators | 6 |
Second Language Learning | 6 |
English (Second Language) | 5 |
Writing Tests | 5 |
Rating Scales | 4 |
Essays | 3 |
Foreign Countries | 3 |
Language Usage | 3 |
Certification | 2 |
More ▼ |
Source
Language Testing | 8 |
Author
Attali, Yigal | 1 |
Barkhuizen, Gary | 1 |
Chuang, Ping-Lin | 1 |
Elder, Catherine | 1 |
Jarvis, Scott | 1 |
Knoch, Ute | 1 |
Kuiken, Folkert | 1 |
Kyriakou, Nansia | 1 |
Lamprianou, Iasonas | 1 |
Schaefer, Edward | 1 |
Schoonen, Rob | 1 |
More ▼ |
Publication Type
Journal Articles | 8 |
Reports - Research | 6 |
Reports - Evaluative | 2 |
Tests/Questionnaires | 1 |
Education Level
Higher Education | 3 |
Postsecondary Education | 1 |
Secondary Education | 1 |
Audience
Location
Europe | 1 |
Japan | 1 |
Netherlands | 1 |
Ohio | 1 |
Laws, Policies, & Programs
Assessments and Surveys
What Works Clearinghouse Rating
Yan, Xun; Chuang, Ping-Lin – Language Testing, 2023
This study employed a mixed-methods approach to examine how rater performance develops during a semester-long rater certification program for an English as a Second Language (ESL) writing placement test at a large US university. From 2016 to 2018, we tracked three groups of novice raters (n = 30) across four rounds in the certification program.…
Descriptors: Evaluators, Interrater Reliability, Item Response Theory, Certification
Lamprianou, Iasonas; Tsagari, Dina; Kyriakou, Nansia – Language Testing, 2021
This longitudinal study (2002-2014) investigates the stability of rating characteristics of a large group of raters over time in the context of the writing paper of a national high-stakes examination. The study uses one measure of rater severity and two measures of rater consistency. The results suggest that the rating characteristics of…
Descriptors: Longitudinal Studies, Evaluators, High Stakes Tests, Writing Evaluation
Kuiken, Folkert; Vedder, Ineke – Language Testing, 2017
The importance of functional adequacy as an essential component of L2 proficiency has been observed by several authors (Pallotti, 2009; De Jong, Steinel, Florijn, Schoonen, & Hulstijn, 2012a, b). The rationale underlying the present study is that the assessment of writing proficiency in L2 is not fully possible without taking into account the…
Descriptors: Second Language Learning, Rating Scales, Computational Linguistics, Persuasive Discourse
Attali, Yigal – Language Testing, 2016
A short training program for evaluating responses to an essay writing task consisted of scoring 20 training essays with immediate feedback about the correct score. The same scoring session also served as a certification test for trainees. Participants with little or no previous rating experience completed this session and 14 trainees who passed an…
Descriptors: Writing Evaluation, Writing Tests, Standardized Tests, Evaluators
Jarvis, Scott – Language Testing, 2017
The present study discusses the relevance of measures of lexical diversity (LD) to the assessment of learner corpora. It also argues that existing measures of LD, many of which have become specialized for use with language corpora, are fundamentally measures of lexical repetition, are based on an etic perspective of language, and lack construct…
Descriptors: Computational Linguistics, English (Second Language), Second Language Learning, Native Speakers
Schaefer, Edward – Language Testing, 2008
The present study employed multi-faceted Rasch measurement (MFRM) to explore the rater bias patterns of native English-speaker (NES) raters when they rate EFL essays. Forty NES raters rated 40 essays written by female Japanese university students on a single topic adapted from the TOEFL Test of Written English (TWE). The essays were assessed using…
Descriptors: Writing Evaluation, Writing Tests, Program Effectiveness, Essays
Elder, Catherine; Barkhuizen, Gary; Knoch, Ute; von Randow, Janet – Language Testing, 2007
The use of online rater self-training is growing in popularity and has obvious practical benefits, facilitating access to training materials and rating samples and allowing raters to reorient themselves to the rating scale and self monitor their behaviour at their own convenience. However there has thus far been little research into rater…
Descriptors: Writing Evaluation, Writing Tests, Scoring Rubrics, Rating Scales

Schoonen, Rob; And Others – Language Testing, 1997
Reports on three studies conducted in the Netherlands about the reading reliability of lay and expert readers in rating content and language usage of students' writing performances in three kinds of writing assignments. Findings reveal that expert readers are more reliable in rating usage, whereas both lay and expert readers are reliable raters of…
Descriptors: Foreign Countries, Interrater Reliability, Language Usage, Models