Publication Date
In 2025: 1
Since 2024: 1
Since 2021 (last 5 years): 2
Since 2016 (last 10 years): 6
Since 2006 (last 20 years): 6
Descriptor
Evaluators: 7
Scoring: 7
Second Language Learning: 4
Comparative Analysis: 2
Correlation: 2
Effect Size: 2
Essay Tests: 2
Evaluation Criteria: 2
Guidelines: 2
Interrater Reliability: 2
Language Tests: 2
Source
AERA Online Paper Repository: 1
Educational Measurement:…: 1
English Teaching: 1
Language Learning: 1
Language Testing: 1
Malaysian Online Journal of…: 1
Working Papers in TESOL &…: 1
Author
Akif Avcu: 1
Chen, Gaowei: 1
Gagne, Phill: 1
Han, Qie: 1
Huang, Jing: 1
In'nami, Yo: 1
Jiyeo Yun: 1
Koizumi, Rie: 1
Lissitz, Robert W.: 1
Plonsky, Luke: 1
Saito, Kazuya: 1
Publication Type
Information Analyses: 7
Journal Articles: 6
Reports - Research: 2
Speeches/Meeting Papers: 1
Akif Avcu – Malaysian Online Journal of Educational Technology, 2025
This scoping review traces the milestones through which Hierarchical Rater Models (HRMs) have become operable in automated essay scoring (AES) to improve instructional evaluation. Although essay evaluation, a useful instrument for assessing higher-order cognitive abilities, has always depended on human raters, concerns regarding rater bias,…
Descriptors: Automation, Scoring, Models, Educational Assessment
Huang, Jing; Chen, Gaowei – AERA Online Paper Repository, 2019
This research investigates the effects of rater experience on performance ratings in language testing using a systematic review of studies published from 1985 to 2017. Based on a comprehensive literature search of 14 databases, we identified sixteen relevant papers. With these we conducted a narrative review to conceptualize a theoretical…
Descriptors: Language Tests, Experience, Evaluators, Performance Based Assessment
Jiyeo Yun – English Teaching, 2023
Studies on automatic scoring systems in writing assessments have also evaluated the relationship between human and machine scores for the reliability of automated essay scoring systems. This study investigated the magnitudes of indices for inter-rater agreement and discrepancy, especially regarding human and machine scoring, in writing assessment.…
Descriptors: Meta Analysis, Interrater Reliability, Essays, Scoring
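The inter-rater agreement indices discussed in studies like the one above can be illustrated with a small sketch. Quadratic weighted kappa (QWK) is one index commonly reported for human-machine score agreement in automated essay scoring; the 1-5 rating scale and the score vectors below are hypothetical, chosen only for illustration.

```python
# Illustrative only: quadratic weighted kappa (QWK) between two raters
# on an ordinal scale. 1.0 = perfect agreement; 0.0 = chance level.
# The scale bounds and sample scores are hypothetical.

def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """Agreement penalizing large disagreements quadratically."""
    n = max_score - min_score + 1
    # Observed matrix of score-pair counts
    observed = [[0] * n for _ in range(n)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_score][b - min_score] += 1
    total = len(rater_a)
    # Marginal histograms give the expected (chance) agreement matrix
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n)) for j in range(n)]
    num = den = 0.0
    for i in range(n):
        for j in range(n):
            w = ((i - j) ** 2) / ((n - 1) ** 2)  # quadratic weight
            num += w * observed[i][j]
            den += w * hist_a[i] * hist_b[j] / total
    return 1.0 - num / den

human   = [3, 4, 2, 5, 3, 4, 1, 3]   # hypothetical human scores
machine = [3, 4, 3, 5, 2, 4, 1, 3]   # hypothetical machine scores
print(round(quadratic_weighted_kappa(human, machine, 1, 5), 3))  # → 0.908
```

Studies in this literature also report other agreement and discrepancy indices (exact agreement, adjacent agreement, correlations); QWK is shown here only because it weights the size of human-machine disagreements, which matters on ordinal essay scales.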
Saito, Kazuya; Plonsky, Luke – Language Learning, 2019
We propose a new framework for conceptualizing measures of instructed second language (L2) pronunciation performance according to three sets of parameters: (a) the constructs (focused on global vs. specific aspects of pronunciation), (b) the scoring method (human raters vs. acoustic analyses), and (c) the type of knowledge elicited (controlled vs.…
Descriptors: Second Language Learning, Second Language Instruction, Scoring, Pronunciation Instruction
Han, Qie – Working Papers in TESOL & Applied Linguistics, 2016
This literature review surveys representative studies within the context of L2 speaking assessment that have contributed to the conceptualization of rater cognition. Two types of studies are considered: 1) studies that examine "how" raters differ (and sometimes agree) in their cognitive processes and rating behaviors, in terms…
Descriptors: Second Language Learning, Student Evaluation, Evaluators, Speech Tests
In'nami, Yo; Koizumi, Rie – Language Testing, 2016
We addressed Deville and Chalhoub-Deville's (2006), Schoonen's (2012), and Xi and Mollaun's (2006) call for research into the contextual features that are considered related to person-by-task interactions in the framework of generalizability theory in two ways. First, we quantitatively synthesized the generalizability studies to determine the…
Descriptors: Evaluators, Second Language Learning, Writing Skills, Oral Language
Schafer, William D.; Gagne, Phill; Lissitz, Robert W. – Educational Measurement: Issues and Practice, 2005
An assumption that is fundamental to the scoring of student-constructed responses (e.g., essays) is the ability of raters to focus on the response characteristics of interest rather than on other features. A common example, and the focus of this study, is the ability of raters to score a response based on the content achievement it demonstrates…
Descriptors: Scoring, Language Usage, Effect Size, Student Evaluation