Publication Date
In 2025: 1
Since 2024: 1
Since 2021 (last 5 years): 2
Since 2016 (last 10 years): 6
Since 2006 (last 20 years): 6
Descriptor
Evaluators: 7
Scoring: 7
Second Language Learning: 4
Comparative Analysis: 2
Correlation: 2
Effect Size: 2
Essay Tests: 2
Evaluation Criteria: 2
Guidelines: 2
Interrater Reliability: 2
Language Tests: 2
Source
AERA Online Paper Repository: 1
Educational Measurement:…: 1
English Teaching: 1
Language Learning: 1
Language Testing: 1
Malaysian Online Journal of…: 1
Working Papers in TESOL &…: 1
Author
Akif Avcu: 1
Chen, Gaowei: 1
Gagne, Phill: 1
Han, Qie: 1
Huang, Jing: 1
In'nami, Yo: 1
Jiyeo Yun: 1
Koizumi, Rie: 1
Lissitz, Robert W.: 1
Plonsky, Luke: 1
Saito, Kazuya: 1
Publication Type
Information Analyses: 7
Journal Articles: 6
Reports - Research: 2
Speeches/Meeting Papers: 1
Akif Avcu – Malaysian Online Journal of Educational Technology, 2025
This scoping review traces the milestones through which Hierarchical Rater Models (HRMs) have become operable in automated essay scoring (AES) to improve instructional evaluation. Although essay evaluation, a useful instrument for assessing higher-order cognitive abilities, has always depended on human raters, concerns regarding rater bias,…
Descriptors: Automation, Scoring, Models, Educational Assessment
Huang, Jing; Chen, Gaowei – AERA Online Paper Repository, 2019
This research investigates the effects of rater experience on performance ratings in language testing using a systematic review of studies published from 1985 to 2017. Based on a comprehensive literature search of 14 databases, we identified sixteen relevant papers. With these we conducted a narrative review to conceptualize a theoretical…
Descriptors: Language Tests, Experience, Evaluators, Performance Based Assessment
Jiyeo Yun – English Teaching, 2023
Studies on automatic scoring systems in writing assessments have also evaluated the relationship between human and machine scores for the reliability of automated essay scoring systems. This study investigated the magnitudes of indices for inter-rater agreement and discrepancy, especially regarding human and machine scoring, in writing assessment.…
Descriptors: Meta Analysis, Interrater Reliability, Essays, Scoring
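The inter-rater agreement indices discussed in studies like the one above can be illustrated with a small sketch. Quadratic weighted kappa (QWK) is one index commonly reported for human-machine score agreement in automated essay scoring; the 1-5 rating scale and the score vectors below are hypothetical, chosen only for illustration.

```python
# Illustrative only: quadratic weighted kappa (QWK) between two raters
# on an ordinal scale. 1.0 = perfect agreement; 0.0 = chance level.
# The scale bounds and sample scores are hypothetical.

def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """Agreement penalizing large disagreements quadratically."""
    n = max_score - min_score + 1
    # Observed matrix of score-pair counts
    observed = [[0] * n for _ in range(n)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_score][b - min_score] += 1
    total = len(rater_a)
    # Marginal histograms give the expected (chance) agreement matrix
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n)) for j in range(n)]
    num = den = 0.0
    for i in range(n):
        for j in range(n):
            w = ((i - j) ** 2) / ((n - 1) ** 2)  # quadratic weight
            num += w * observed[i][j]
            den += w * hist_a[i] * hist_b[j] / total
    return 1.0 - num / den

human   = [3, 4, 2, 5, 3, 4, 1, 3]   # hypothetical human scores
machine = [3, 4, 3, 5, 2, 4, 1, 3]   # hypothetical machine scores
print(round(quadratic_weighted_kappa(human, machine, 1, 5), 3))  # → 0.908
```

Studies in this literature also report other agreement and discrepancy indices (exact agreement, adjacent agreement, correlations); QWK is shown here only because it weights the size of human-machine disagreements, which matters on ordinal essay scales.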
Saito, Kazuya; Plonsky, Luke – Language Learning, 2019
We propose a new framework for conceptualizing measures of instructed second language (L2) pronunciation performance according to three sets of parameters: (a) the constructs (focused on global vs. specific aspects of pronunciation), (b) the scoring method (human raters vs. acoustic analyses), and (c) the type of knowledge elicited (controlled vs.…
Descriptors: Second Language Learning, Second Language Instruction, Scoring, Pronunciation Instruction
Han, Qie – Working Papers in TESOL & Applied Linguistics, 2016
This literature review surveys representative studies within the context of L2 speaking assessment that have contributed to the conceptualization of rater cognition. Two types of studies are considered: 1) studies that examine "how" raters differ (and sometimes agree) in their cognitive processes and rating behaviors, in terms…
Descriptors: Second Language Learning, Student Evaluation, Evaluators, Speech Tests
In'nami, Yo; Koizumi, Rie – Language Testing, 2016
We addressed Deville and Chalhoub-Deville's (2006), Schoonen's (2012), and Xi and Mollaun's (2006) call for research into the contextual features that are considered related to person-by-task interactions in the framework of generalizability theory in two ways. First, we quantitatively synthesized the generalizability studies to determine the…
Descriptors: Evaluators, Second Language Learning, Writing Skills, Oral Language
Schafer, William D.; Gagne, Phill; Lissitz, Robert W. – Educational Measurement: Issues and Practice, 2005
An assumption that is fundamental to the scoring of student-constructed responses (e.g., essays) is the ability of raters to focus on the response characteristics of interest rather than on other features. A common example, and the focus of this study, is the ability of raters to score a response based on the content achievement it demonstrates…
Descriptors: Scoring, Language Usage, Effect Size, Student Evaluation