ERIC - Search Results

Publication Date

In 2025	1
Since 2024	2
Since 2021 (last 5 years)	6
Since 2016 (last 10 years)	10
Since 2006 (last 20 years)	13

Descriptor

Decision Making	13
Evaluators	13
Writing Evaluation	13
English (Second Language)	10
Second Language Learning	9
Essays	8
Foreign Countries	7
Second Language Instruction	6
Protocol Analysis	5
Rating Scales	5
Scoring	5
College Students	4
Comparative Analysis	4
Evaluation Criteria	4
Interrater Reliability	4
Scores	4
Scoring Rubrics	4
College Faculty	3
Graduate Students	3
Holistic Approach	3
Language Teachers	3
Language Tests	3
Undergraduate Students	3
Accuracy	2
Interviews	2

Source

Language Testing	4
Assessment in Education:…	1
Educational Measurement:…	1
English Language Teaching	1
International Journal of…	1
Journal of Education and…	1
Language Assessment Quarterly	1
Language Testing in Asia	1
PASAA: Journal of Language…	1
Reading & Writing Quarterly	1

Author

Barkaoui, Khaled	2
Han, Turgay	2
Wind, Stefanie A.	2
Abbasi, Abbas	1
Ghanbari, Nasim	1
Heidari, Nasim	1
Huang, Jinyan	1
Jarvis, Scott	1
Jiehui Hu	1
Jølle, Lennart	1
Lian Li	1
Makiko Kato	1
Ping Zhou	1
Razi, Salim	1
Sahan, Özgür	1
Walker, A. Adrienne	1
Wanhong Zhang	1
Wu, Xuefeng	1
Yu Dai	1

Publication Type

Journal Articles	13
Reports - Research	12
Tests/Questionnaires	2

Education Level

Higher Education	8
Postsecondary Education	5
Adult Education	1
High Schools	1
Secondary Education	1

Location

Turkey	3
China	2
Japan	1
Norway	1
Ohio	1

Assessments and Surveys

International English…

Showing all 13 results

Scoring Difficulty in Summary Writing Assessment: Toward the Reconstruction of Analytic Rubric

Peer reviewed
PDF on ERIC

Makiko Kato – Journal of Education and Learning, 2025

This study aims to examine whether differences exist in the factors influencing the difficulty of scoring English summaries and determining scores based on the raters' attributes, and to collect candid opinions, considerations, and tentative suggestions for future improvements to the analytic rubric of summary writing for English learners. In this…

Descriptors: Writing Evaluation, Scoring, Writing Skills, English (Second Language)

Depth-Perception-Based Representation in Holistic Rating on ESL Essay Writing

Peer reviewed

Lian Li; Jiehui Hu; Yu Dai; Ping Zhou; Wanhong Zhang – Reading & Writing Quarterly, 2024

This paper proposes to use depth perception to represent raters' decision in holistic evaluation of ESL essays, as an alternative medium to conventional form of numerical scores. The researchers verified the new method's accuracy and inter/intra-rater reliability by inviting 24 ESL teachers to perform different representations when rating 60…

Descriptors: Essays, Holistic Approach, Writing Evaluation, Accuracy

A Model-Data-Fit-Informed Approach to Score Resolution in Performance Assessments

Peer reviewed

Wind, Stefanie A.; Walker, A. Adrienne – Educational Measurement: Issues and Practice, 2021

Many large-scale performance assessments include score resolution procedures for resolving discrepancies in rater judgments. The goal of score resolution is conceptually similar to person fit analyses: To identify students for whom observed scores may not accurately reflect their achievement. Previously, researchers have observed that…

Descriptors: Goodness of Fit, Performance Based Assessment, Evaluators, Decision Making

A Sequential Approach to Detecting Differential Rater Functioning in Sparse Rater-Mediated Assessment Networks

Peer reviewed

Wind, Stefanie A. – Language Testing, 2023

Researchers frequently evaluate rater judgments in performance assessments for evidence of differential rater functioning (DRF), which occurs when rater severity is systematically related to construct-irrelevant student characteristics after controlling for student achievement levels. However, researchers have observed that methods for detecting…

Descriptors: Evaluators, Decision Making, Student Characteristics, Performance Based Assessment

Raters' Perceptions of Rating Scales Criteria and Its Effect on the Process and Outcome of Their Rating

Peer reviewed

Heidari, Nasim; Ghanbari, Nasim; Abbasi, Abbas – Language Testing in Asia, 2022

It is widely believed that human rating performance is influenced by an array of different factors. Among these, rater-related variables such as experience, language background, perceptions, and attitudes have been mentioned. One of the important rater-related factors is the way the raters interact with the rating scales. In particular, how raters…

Descriptors: Evaluators, Rating Scales, Language Tests, English (Second Language)

Establishing an Operational Model of Rating Scale Construction for English Writing Assessment

Peer reviewed
PDF on ERIC

Wu, Xuefeng – English Language Teaching, 2022

Rating scales for writing assessment are critical in that they determine directly the quality and fairness of such performance tests. However, in many EFL contexts, rating scales are made, to certain extent, based on the intuition of teachers who strongly need a feasible and scientific route to guide their construction of rating scales. This study…

Descriptors: Writing Evaluation, Rating Scales, Second Language Learning, Second Language Instruction

Do Experience and Text Quality Matter for Raters' Decision-Making Behaviors?

Peer reviewed

Sahan, Özgür; Razi, Salim – Language Testing, 2020

This study examines the decision-making behaviors of raters with varying levels of experience while assessing EFL essays of distinct qualities. The data were collected from 28 raters with varying levels of rating experience and working at the English language departments of different universities in Turkey. Using a 10-point analytic rubric, each…

Descriptors: Decision Making, Essays, Writing Evaluation, Evaluators

Scores Assigned by Inexpert EFL Raters to Different Quality EFL Compositions, and the Raters' Decision-Making Behaviors

Peer reviewed
PDF on ERIC

Han, Turgay – International Journal of Progressive Education, 2017

The aim of this study is to examine the variability in and reliability of scores assigned to different quality EFL compositions by EFL instructors and their rating behaviors. Using a mixed research design, quantitative data were collected from EFL instructors' ratings of 30 compositions of three different qualities using a holistic scoring rubric.…

Descriptors: English (Second Language), Writing Evaluation, Scores, Expertise

Rater Strategies for Reaching Agreement on Pupil Text Quality

Peer reviewed

Jølle, Lennart – Assessment in Education: Principles, Policy & Practice, 2015

Novice members of a Norwegian national rater panel tasked with assessing Year 8 pupils' written texts were studied during three successive preparation sessions (2011-2012). The purpose was to investigate how the raters successfully make use of different decision-making strategies in an assessment situation where pre-set criteria and standards give…

Descriptors: Interrater Reliability, Writing Evaluation, Decision Making, Novices

Examining the Impact of Scoring Methods on the Institutional EFL Writing Assessment: A Turkish Perspective

Peer reviewed
PDF on ERIC

Han, Turgay; Huang, Jinyan – PASAA: Journal of Language Teaching and Learning in Thailand, 2017

Using generalizability (G-) theory and rater interviews as both quantitative and qualitative approaches, this study examined the impact of scoring methods (i.e., holistic versus analytic scoring) on the scoring variability and reliability of an EFL institutional writing assessment at a Turkish university. Ten raters were invited to rate 36…

Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Scoring

Grounding Lexical Diversity in Human Judgments

Peer reviewed

Jarvis, Scott – Language Testing, 2017

The present study discusses the relevance of measures of lexical diversity (LD) to the assessment of learner corpora. It also argues that existing measures of LD, many of which have become specialized for use with language corpora, are fundamentally measures of lexical repetition, are based on an etic perspective of language, and lack construct…

Descriptors: Computational Linguistics, English (Second Language), Second Language Learning, Native Speakers

Think-Aloud Protocols in Research on Essay Rating: An Empirical Study of Their Veridicality and Reactivity

Peer reviewed

Barkaoui, Khaled – Language Testing, 2011

Think-aloud protocols (TAPs) are frequently used in research on essay rating processes. However, there are very few empirical studies of the completeness of TAP data and the effects of this technique on rater performance (i.e., rating processes and outcomes). This study aims to start to address this research gap. As part of a larger study on rater…

Descriptors: Protocol Analysis, Rating Scales, Essays, English (Second Language)

Variability in ESL Essay Rating Processes: The Role of the Rating Scale and Rater Experience

Peer reviewed

Barkaoui, Khaled – Language Assessment Quarterly, 2010

Various factors contribute to variability in English as a second language (ESL) essay scores and rating processes. Most previous research, however, has focused on score variability in relation to task, rater, and essay characteristics. A few studies have examined variability in essay rating processes. The current study used think-aloud protocols…

Descriptors: Protocol Analysis, Holistic Evaluation, Evaluation Criteria, Rating Scales