Publication Date
In 2025 | 1 |
Since 2024 | 2 |
Since 2021 (last 5 years) | 6 |
Since 2016 (last 10 years) | 10 |
Since 2006 (last 20 years) | 13 |
Descriptor
Decision Making | 13 |
Evaluators | 13 |
Writing Evaluation | 13 |
English (Second Language) | 10 |
Second Language Learning | 9 |
Essays | 8 |
Foreign Countries | 7 |
Second Language Instruction | 6 |
Protocol Analysis | 5 |
Rating Scales | 5 |
Scoring | 5 |
Author
Barkaoui, Khaled | 2 |
Han, Turgay | 2 |
Wind, Stefanie A. | 2 |
Abbasi, Abbas | 1 |
Ghanbari, Nasim | 1 |
Heidari, Nasim | 1 |
Huang, Jinyan | 1 |
Jarvis, Scott | 1 |
Jiehui Hu | 1 |
Jølle, Lennart | 1 |
Lian Li | 1 |
Publication Type
Journal Articles | 13 |
Reports - Research | 12 |
Tests/Questionnaires | 2 |
Education Level
Higher Education | 8 |
Postsecondary Education | 5 |
Adult Education | 1 |
High Schools | 1 |
Secondary Education | 1 |
Assessments and Surveys
International English… | 1 |
Makiko Kato – Journal of Education and Learning, 2025
This study aims to examine whether differences exist in the factors influencing the difficulty of scoring English summaries and determining scores based on the raters' attributes, and to collect candid opinions, considerations, and tentative suggestions for future improvements to the analytic rubric of summary writing for English learners. In this…
Descriptors: Writing Evaluation, Scoring, Writing Skills, English (Second Language)
Lian Li; Jiehui Hu; Yu Dai; Ping Zhou; Wanhong Zhang – Reading & Writing Quarterly, 2024
This paper proposes to use depth perception to represent raters' decision in holistic evaluation of ESL essays, as an alternative medium to conventional form of numerical scores. The researchers verified the new method's accuracy and inter/intra-rater reliability by inviting 24 ESL teachers to perform different representations when rating 60…
Descriptors: Essays, Holistic Approach, Writing Evaluation, Accuracy
Wind, Stefanie A.; Walker, A. Adrienne – Educational Measurement: Issues and Practice, 2021
Many large-scale performance assessments include score resolution procedures for resolving discrepancies in rater judgments. The goal of score resolution is conceptually similar to person fit analyses: To identify students for whom observed scores may not accurately reflect their achievement. Previously, researchers have observed that…
Descriptors: Goodness of Fit, Performance Based Assessment, Evaluators, Decision Making
Wind, Stefanie A. – Language Testing, 2023
Researchers frequently evaluate rater judgments in performance assessments for evidence of differential rater functioning (DRF), which occurs when rater severity is systematically related to construct-irrelevant student characteristics after controlling for student achievement levels. However, researchers have observed that methods for detecting…
Descriptors: Evaluators, Decision Making, Student Characteristics, Performance Based Assessment
Heidari, Nasim; Ghanbari, Nasim; Abbasi, Abbas – Language Testing in Asia, 2022
It is widely believed that human rating performance is influenced by an array of different factors. Among these, rater-related variables such as experience, language background, perceptions, and attitudes have been mentioned. One of the important rater-related factors is the way the raters interact with the rating scales. In particular, how raters…
Descriptors: Evaluators, Rating Scales, Language Tests, English (Second Language)
Wu, Xuefeng – English Language Teaching, 2022
Rating scales for writing assessment are critical in that they determine directly the quality and fairness of such performance tests. However, in many EFL contexts, rating scales are made, to a certain extent, based on the intuition of teachers who strongly need a feasible and scientific route to guide their construction of rating scales. This study…
Descriptors: Writing Evaluation, Rating Scales, Second Language Learning, Second Language Instruction
Sahan, Özgür; Razi, Salim – Language Testing, 2020
This study examines the decision-making behaviors of raters with varying levels of experience while assessing EFL essays of distinct qualities. The data were collected from 28 raters with varying levels of rating experience and working at the English language departments of different universities in Turkey. Using a 10-point analytic rubric, each…
Descriptors: Decision Making, Essays, Writing Evaluation, Evaluators
Han, Turgay – International Journal of Progressive Education, 2017
The aim of this study is to examine the variability in and reliability of scores assigned to different quality EFL compositions by EFL instructors and their rating behaviors. Using a mixed research design, quantitative data were collected from EFL instructors' ratings of 30 compositions of three different qualities using a holistic scoring rubric.…
Descriptors: English (Second Language), Writing Evaluation, Scores, Expertise
Jølle, Lennart – Assessment in Education: Principles, Policy & Practice, 2015
Novice members of a Norwegian national rater panel tasked with assessing Year 8 pupils' written texts were studied during three successive preparation sessions (2011-2012). The purpose was to investigate how the raters successfully make use of different decision-making strategies in an assessment situation where pre-set criteria and standards give…
Descriptors: Interrater Reliability, Writing Evaluation, Decision Making, Novices
Han, Turgay; Huang, Jinyan – PASAA: Journal of Language Teaching and Learning in Thailand, 2017
Using generalizability (G-) theory and rater interviews as both quantitative and qualitative approaches, this study examined the impact of scoring methods (i.e., holistic versus analytic scoring) on the scoring variability and reliability of an EFL institutional writing assessment at a Turkish university. Ten raters were invited to rate 36…
Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Scoring
Jarvis, Scott – Language Testing, 2017
The present study discusses the relevance of measures of lexical diversity (LD) to the assessment of learner corpora. It also argues that existing measures of LD, many of which have become specialized for use with language corpora, are fundamentally measures of lexical repetition, are based on an etic perspective of language, and lack construct…
Descriptors: Computational Linguistics, English (Second Language), Second Language Learning, Native Speakers
Barkaoui, Khaled – Language Testing, 2011
Think-aloud protocols (TAPs) are frequently used in research on essay rating processes. However, there are very few empirical studies of the completeness of TAP data and the effects of this technique on rater performance (i.e., rating processes and outcomes). This study aims to start to address this research gap. As part of a larger study on rater…
Descriptors: Protocol Analysis, Rating Scales, Essays, English (Second Language)
Barkaoui, Khaled – Language Assessment Quarterly, 2010
Various factors contribute to variability in English as a second language (ESL) essay scores and rating processes. Most previous research, however, has focused on score variability in relation to task, rater, and essay characteristics. A few studies have examined variability in essay rating processes. The current study used think-aloud protocols…
Descriptors: Protocol Analysis, Holistic Evaluation, Evaluation Criteria, Rating Scales