ERIC - Search Results

Showing all 15 results

Detecting Rater Centrality Effects in Performance Assessments: A Model-Based Comparison of Centrality Indices

Peer reviewed

Jin, Kuan-Yu; Eckes, Thomas – Measurement: Interdisciplinary Research and Perspectives, 2022

Recent research on rater effects in performance assessments has increasingly focused on rater centrality, the tendency to assign scores clustering around the rating scale's middle categories. In the present paper, we adopted Jin and Wang's (2018) extended facets modeling approach and constructed a centrality continuum, ranging from raters…

Descriptors: Performance Based Assessment, Evaluators, Scoring, Sample Size

The Relationship between Rater Experience and Performance Ratings: A Systematic Review

Peer reviewed

Huang, Jing; Chen, Gaowei – AERA Online Paper Repository, 2019

This research investigates the effects of rater experience on performance ratings in language testing using a systematic review of studies published from 1985 to 2017. Based on a comprehensive literature search of 14 databases, we identified sixteen relevant papers. With these we conducted a narrative review to conceptualize a theoretical…

Descriptors: Language Tests, Experience, Evaluators, Performance Based Assessment

A Sequential Approach to Detecting Differential Rater Functioning in Sparse Rater-Mediated Assessment Networks

Peer reviewed

Wind, Stefanie A. – Language Testing, 2023

Researchers frequently evaluate rater judgments in performance assessments for evidence of differential rater functioning (DRF), which occurs when rater severity is systematically related to construct-irrelevant student characteristics after controlling for student achievement levels. However, researchers have observed that methods for detecting…

Descriptors: Evaluators, Decision Making, Student Characteristics, Performance Based Assessment

Performance-Based Speaking Tests: Possibilities in Local Language Testing

Peer reviewed

Dimova, Slobodanka – Language Teaching Research Quarterly, 2022

Drawing on Glenn Fulcher's extensive work in performance-based language assessment of speaking, this paper explores the assessment of L2 speaking ability in local language testing contexts. For that purpose, I review Fulcher's influential work that highlights the relationship between the speaking construct, the task, the performance, and the…

Descriptors: Language Tests, Speech Communication, Performance Based Assessment, Second Language Learning

Generalizability of Writing Scores and Language Program Placement Decisions: Score Dependability, Task Variability, and Score Profiles on an ESL Placement Test

Peer reviewed

Eskin, Daniel – Studies in Applied Linguistics & TESOL, 2022

For agencies that deliver high-stakes Second Language (L2) proficiency exams, a research agenda has been undertaken for years to examine the role of rater, task, and rubric as sources of variability into their performance assessments (Lee, 2006; Sawaki & Sinharay, 2013; Xi, 2007; Xi & Mollaun, 2006). However, these challenges are more…

Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Student Placement

The Effect of Task Complexity on Rater Severity in an Adaptive Performance-Based Second Language Oral Communication Test

Won, Yongkook – ProQuest LLC, 2019

Despite the benefits of performance-based oral communication tests, a plethora of variables, as illustrated in Ockey and Li's (2015) model of oral communication assessment, can create construct-irrelevant variance in test scores. In relation to human participants in the oral communication tests, previous studies mostly focused on the direct effect…

Descriptors: Oral Language, Language Tests, English (Second Language), Second Language Learning

Working with Sparse Data in Rated Language Tests: Generalizability Theory Applications

Peer reviewed

Lin, Chih-Kai – Language Testing, 2017

Sparse-rated data are common in operational performance-based language tests, as an inevitable result of assigning examinee responses to a fraction of available raters. The current study investigates the precision of two generalizability-theory methods (i.e., the rating method and the subdividing method) specifically designed to accommodate the…

Descriptors: Data Analysis, Language Tests, Generalizability Theory, Accuracy

The Development and Maintenance of Rating Quality in Performance Writing Assessment: A Longitudinal Study of New and Experienced Raters

Peer reviewed

Lim, Gad S. – Language Testing, 2011

Raters are central to writing performance assessment, and rater development--training, experience, and expertise--involves a temporal dimension. However, few studies have examined new and experienced raters' rating performance longitudinally over multiple time points. This study uses operational data from the writing section of the MELAB (n =…

Descriptors: Expertise, Writing Evaluation, Performance Based Assessment, Writing Tests

Explaining ESL Essay Holistic Scores: A Multilevel Modeling Approach

Peer reviewed

Barkaoui, Khaled – Language Testing, 2010

This study adopted a multilevel modeling (MLM) approach to examine the contribution of rater and essay factors to variability in ESL essay holistic scores. Previous research aiming to explain variability in essay holistic scores has focused on either rater or essay factors. The few studies that have examined the contribution of more than one…

Descriptors: Performance Based Assessment, English (Second Language), Second Language Learning, Holistic Approach

Examination of Rater Training Effect and Rater Eligibility in L2 Performance Assessment

Peer reviewed

Kondo, Yusuke – Journal of Pan-Pacific Association of Applied Linguistics, 2010

The purposes of this study were to investigate the effects of rater training in an L2 performance assessment and to examine the eligibility of L2 users of English as raters in L2 performance assessment. Rater training was conducted in order for raters to clearly understand the criteria, the evaluation items, and the evaluation procedure. In this…

Descriptors: Video Technology, Eligibility, Performance Based Assessment, Performance Tests

An Alternative Decision-Making Procedure for Performance Assessments: Using the Multifaceted Rasch Model to Generate Cut Estimates

Peer reviewed

Kozaki, Yoko – Language Assessment Quarterly, 2010

This article describes an alternative approach to setting standards for performance assessments. The procedure was designed for use in low-budget, relatively low-stakes contexts where it is not possible to bring expert judges together. The procedure that allowed participant judges to work individually throughout the process was an effort to…

Descriptors: Performance Based Assessment, Standard Setting, Decision Making, Certification

The Influence of Rater Language Background on Writing Performance Assessment

Peer reviewed

Johnson, Jeff S.; Lim, Gad S. – Language Testing, 2009

Language performance assessments typically require human raters, introducing possible error. In international examinations of English proficiency, rater language background is an especially salient factor that needs to be considered. The existence of rater language background-related bias in writing performance assessment is the object of this…

Descriptors: Performance Based Assessment, Performance Tests, Native Speakers, English (Second Language)

Rater Types in Writing Performance Assessments: A Classification Approach to Rater Variability

Peer reviewed

Eckes, Thomas – Language Testing, 2008

Research on rater effects in language performance assessments has provided ample evidence for a considerable degree of variability among raters. Building on this research, I advance the hypothesis that experienced raters fall into types or classes that are clearly distinguishable from one another with respect to the importance they attach to…

Descriptors: Performance Based Assessment, Language Tests, Measures (Individuals), Scoring

Evaluating Analytic Scoring for the TOEFL® Academic Speaking Test (TAST) for Operational Use

Peer reviewed

Xi, Xiaoming – Language Testing, 2007

This study explores the utility of analytic scoring for TAST in providing useful and reliable diagnostic information for operational use in three aspects of candidates' performance: delivery, language use and topic development. One hundred and forty examinees' responses to six TAST tasks were scored analytically on these three aspects of speech. G…

Descriptors: Scoring, Profiles, Performance Based Assessment, Academic Discourse

Evaluating the Efficacy of Rater Self-Training.

Kenyon, Dorry; Stansfield, Charles W. – 1993

This paper examines whether individuals who train themselves to score a performance assessment will rate acceptably when compared to known standards. Research on the efficacy of rater self-training materials developed by the Center for Applied Linguistics for the Texas Oral Proficiency Test (TOPT) is examined. Rater self-materials are described…

Descriptors: Bilingual Education, Comparative Analysis, Evaluators, Individual Characteristics