ERIC - Search Results

Showing all 10 results

Do Source Use Features Impact Raters' Judgment of Argumentation? An Experimental Study

Peer reviewed

Ping-Lin Chuang – Language Testing, 2025

This experimental study explores how source use features impact raters' judgment of argumentation in a second language (L2) integrated writing test. One hundred four experienced and novice raters were recruited to complete a rating task that simulated the scoring assignment of a local English Placement Test (EPT). Sixty written responses were…

Descriptors: Interrater Reliability, Evaluators, Information Sources, Primary Sources

Triangulating Natural Language Processing (NLP)-Based Analysis of Rater Comments and Many-Facet Rasch Measurement (MFRM): An Innovative Approach to Investigating Raters' Application of Rating Scales in Writing Assessment

Peer reviewed

Huiying Cai; Xun Yan – Language Testing, 2024

Rater comments tend to be qualitatively analyzed to indicate raters' application of rating scales. This study applied natural language processing (NLP) techniques to quantify meaningful, behavioral information from a corpus of rater comments and triangulated that information with a many-facet Rasch measurement (MFRM) analysis of rater scores. The…

Descriptors: Natural Language Processing, Item Response Theory, Rating Scales, Writing Evaluation

Towards More Valid Scoring Criteria for Integrated Reading-Writing and Listening-Writing Summary Tasks

Peer reviewed

Chan, Sathena; May, Lyn – Language Testing, 2023

Despite the increased use of integrated tasks in high-stakes academic writing assessment, research on rating criteria which reflect the unique construct of integrated summary writing skills is comparatively rare. Using a mixed-method approach of expert judgement, text analysis, and statistical analysis, this study examines writing features that…

Descriptors: Scoring, Writing Evaluation, Reading Tests, Listening Skills

A Systematic Review of Methods for Evaluating Rating Quality in Language Assessment

Peer reviewed

Wind, Stefanie A.; Peterson, Meghan E. – Language Testing, 2018

The use of assessments that require rater judgment (i.e., rater-mediated assessments) has become increasingly popular in high-stakes language assessments worldwide. Using a systematic literature review, the purpose of this study is to identify and explore the dominant methods for evaluating rating quality within the context of research on…

Descriptors: Language Tests, Evaluators, Evaluation Methods, Interrater Reliability

A Comparative Judgment Approach to Assessing Chinese Sign Language Interpreting

Peer reviewed

Han, Chao; Xiao, Xiaoyan – Language Testing, 2022

The quality of sign language interpreting (SLI) is a gripping construct among practitioners, educators and researchers, calling for reliable and valid assessment. There has been a diverse array of methods in the extant literature to measure SLI quality, ranging from traditional error analysis to recent rubric scoring. In this study, we want to…

Descriptors: Comparative Analysis, Sign Language, Deaf Interpreting, Evaluators

Working with Sparse Data in Rated Language Tests: Generalizability Theory Applications

Peer reviewed

Lin, Chih-Kai – Language Testing, 2017

Sparse-rated data are common in operational performance-based language tests, as an inevitable result of assigning examinee responses to a fraction of available raters. The current study investigates the precision of two generalizability-theory methods (i.e., the rating method and the subdividing method) specifically designed to accommodate the…

Descriptors: Data Analysis, Language Tests, Generalizability Theory, Accuracy

Functional Adequacy in L2 Writing: Towards a New Rating Scale

Peer reviewed

Kuiken, Folkert; Vedder, Ineke – Language Testing, 2017

The importance of functional adequacy as an essential component of L2 proficiency has been observed by several authors (Pallotti, 2009; De Jong, Steinel, Florijn, Schoonen, & Hulstijn, 2012a, b). The rationale underlying the present study is that the assessment of writing proficiency in L2 is not fully possible without taking into account the…

Descriptors: Second Language Learning, Rating Scales, Computational Linguistics, Persuasive Discourse

Validity Argument for Assessing L2 Pragmatics in Interaction Using Mixed Methods

Peer reviewed

Youn, Soo Jung – Language Testing, 2015

This study investigates the validity of assessing L2 pragmatics in interaction using mixed methods, focusing on the evaluation inference. Open role-plays that are meaningful and relevant to the stakeholders in an English for Academic Purposes context were developed for classroom assessment. For meaningful score interpretations and accurate…

Descriptors: Second Language Learning, Pragmatics, Validity, Mixed Methods Research

A Conversation-Analytic Hermeneutic Rating Protocol to Assess L2 Oral Pragmatic Competence

Peer reviewed

Walters, F. Scott – Language Testing, 2007

Speech act theory-based, second language pragmatics testing (SLPT) poses problems for validation due to a lack of correspondence with empirical conversational data. Since conversation analysis (CA) provides a richer and more accurate account of language behavior, it may be preferred as a basis for SLPT development. However, applying CA methodology…

Descriptors: Inferences, Testing, Speech Acts, Language Tests

EFL Classroom Peer Assessment: Training Effects on Rating and Commenting

Peer reviewed

Saito, Hidetoshi – Language Testing, 2008

This study examined the effects of training on peer assessment and comments provided regarding oral presentations in EFL (English as a Foreign Language) classrooms. In Study 1, both the treatment and control groups received instruction on skill aspects, but only the treatment group was given an additional 40-minute training on how to rate…

Descriptors: Control Groups, Student Attitudes, Peer Evaluation, English (Second Language)