ERIC - Search Results

Showing 1 to 15 of 20 results

Implicit versus Explicit First Impressions in Performance-Based Assessment: Will Raters Overcome Their First Impressions When Learner Performance Changes?

Peer reviewed

Timothy J. Wood; Vijay J. Daniels; Debra Pugh; Claire Touchie; Samantha Halman; Susan Humphrey-Murto – Advances in Health Sciences Education, 2024

First impressions can influence rater-based judgments but their contribution to rater bias is unclear. Research suggests raters can overcome first impressions in experimental exam contexts with explicit first impressions, but these findings may not generalize to a workplace context with implicit first impressions. The study had two aims. First, to…

Descriptors: Evaluators, Work Environment, Decision Making, Video Technology

Language Testers and Their Place in the Policy Web

Peer reviewed

Laura Schildt; Bart Deygers; Albert Weideman – Language Testing, 2024

In the context of policy-driven language testing for citizenship, a growing body of research examines the political justifications and ethical implications of language requirements and test use. However, virtually no studies have looked at the role that language testers play in the evolution of language requirements. Critical gaps remain in our…

Descriptors: Language Tests, Citizenship, Educational Policy, Assessment Literacy

Crowdsourced Adaptive Comparative Judgment: A Community-Based Solution for Proficiency Rating

Peer reviewed

Paquot, Magali; Rubin, Rachel; Vandeweerd, Nathan – Language Learning, 2022

The main objective of this Methods Showcase Article is to show how the technique of adaptive comparative judgment, coupled with a crowdsourcing approach, can offer practical solutions to reliability issues as well as to address the time and cost difficulties associated with a text-based approach to proficiency assessment in L2 research. We…

Descriptors: Comparative Analysis, Decision Making, Language Proficiency, Reliability

A Model-Data-Fit-Informed Approach to Score Resolution in Performance Assessments

Peer reviewed

Wind, Stefanie A.; Walker, A. Adrienne – Educational Measurement: Issues and Practice, 2021

Many large-scale performance assessments include score resolution procedures for resolving discrepancies in rater judgments. The goal of score resolution is conceptually similar to person fit analyses: To identify students for whom observed scores may not accurately reflect their achievement. Previously, researchers have observed that…

Descriptors: Goodness of Fit, Performance Based Assessment, Evaluators, Decision Making

Generalizability of Writing Scores and Language Program Placement Decisions: Score Dependability, Task Variability, and Score Profiles on an ESL Placement Test

Peer reviewed

Eskin, Daniel – Studies in Applied Linguistics & TESOL, 2022

For agencies that deliver high-stakes Second Language (L2) proficiency exams, a research agenda has been undertaken for years to examine the role of rater, task, and rubric as sources of variability into their performance assessments (Lee, 2006; Sawaki & Sinharay, 2013; Xi, 2007; Xi & Mollaun, 2006). However, these challenges are more…

Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Student Placement

The Processes of Rating L2 Speaking Performance Using an Analytic Rating Scale -- A Qualitative Exploration

Peer reviewed

Thai, Thuy; Sheehan, Susan – Language Education & Assessment, 2022

In language performance tests, raters are important as their scoring decisions determine which aspects of performance the scores represent; however, raters are considered as one of the potential sources contributing to unwanted variability in scores (Davis, 2012). Although a great number of studies have been conducted to unpack how rater…

Descriptors: Rating Scales, Speech Communication, Second Language Learning, Second Language Instruction

Rater Attitude towards Emerging Varieties of English: A New Rater Effect?

Peer reviewed

Hsu, Tammy Huei-Lien – Language Testing in Asia, 2019

Background: A strong interest in researching World Englishes (WE) in relation to language assessment has become an emerging theme in language assessment studies over the past two decades. While research on WE has highlighted the status, function, and legitimacy of varieties of English language, it remains unclear how raters respond to the results…

Descriptors: Language Attitudes, Language Variation, Language Tests, Second Language Learning

The Effect of Rating Unfamiliar Items on Angoff Passing Scores

Peer reviewed

Clauser, Jerome C.; Hambleton, Ronald K.; Baldwin, Peter – Educational and Psychological Measurement, 2017

The Angoff standard setting method relies on content experts to review exam items and make judgments about the performance of the minimally proficient examinee. Unfortunately, at times content experts may have gaps in their understanding of specific exam content. These gaps are particularly likely to occur when the content domain is broad and/or…

Descriptors: Scores, Item Analysis, Classification, Decision Making

Roles of Collocation in L2 Oral Proficiency Revisited: Different Tasks, L1 vs. L2 Raters, and Cross-Sectional vs. Longitudinal Analyses

Peer reviewed

Saito, Kazuya; Liu, Yuwei – Second Language Research, 2022

There is emerging evidence that collocation use plays a primary role in determining various dimensions of L2 oral proficiency assessment and development. The current study presents the results of three experiments which examined the relationship between the degree of association in collocation use (operationalized as t scores and mutual…

Descriptors: Phrase Structure, Case Studies, Second Language Learning, Second Language Instruction

Do Experience and Text Quality Matter for Raters' Decision-Making Behaviors?

Peer reviewed

Sahan, Özgür; Razi, Salim – Language Testing, 2020

This study examines the decision-making behaviors of raters with varying levels of experience while assessing EFL essays of distinct qualities. The data were collected from 28 raters with varying levels of rating experience and working at the English language departments of different universities in Turkey. Using a 10-point analytic rubric, each…

Descriptors: Decision Making, Essays, Writing Evaluation, Evaluators

A Generalizability Theory Study of Optimal Measurement Design for a Summative Assessment of English/Chinese Consecutive Interpreting

Peer reviewed

Han, Chao – Language Testing, 2019

Summative assessment of interpretation is widely conducted in interpreting courses/programs to inform high-stakes decision making, such as the selection, certification, and conferral of academic degrees. Yet there has been very limited empirical research to investigate the score dependability of summative interpretation assessment. The present…

Descriptors: Generalization, Decision Making, Summative Evaluation, Evaluators

"How Scripted Is This Going to Be?" Raters' Views of Authenticity in Speaking-Performance Tests

Peer reviewed

Burton, John Dylan – Language Assessment Quarterly, 2020

An assumption underlying speaking tests is that scores reflect the ability to produce online, non-rehearsed speech. Speech produced in testing situations may, however, be less spontaneous if extensive test preparation takes place, resulting in memorized or rehearsed responses. If raters detect these patterns, they may conceptualize speech as…

Descriptors: Language Tests, Oral Language, Scores, Speech Communication

Accuracy in Identifying Students' Miscues during Oral Reading: A Taxonomy of Scorers' Mismarkings

Peer reviewed

Reed, Deborah K.; Cummings, Kelli D.; Schaper, Andrew; Lynn, Devon; Biancarosa, Gina – Reading and Writing: An Interdisciplinary Journal, 2019

Informal reading inventories (IRI) and curriculum-based measures of reading (CBM-R) have continued importance in instructional planning, but raters have exhibited difficulty in accurately identifying students' miscues. To identify and tabulate scorers' mismarkings, this study employed examiners and raters who scored 15,051 words from 108 passage…

Descriptors: Accuracy, Miscue Analysis, Grade 5, Grade 6

Scores Assigned by Inexpert EFL Raters to Different Quality EFL Compositions, and the Raters' Decision-Making Behaviors

Peer reviewed

Han, Turgay – International Journal of Progressive Education, 2017

The aim of this study is to examine the variability in and reliability of scores assigned to different quality EFL compositions by EFL instructors and their rating behaviors. Using a mixed research design, quantitative data were collected from EFL instructors' ratings of 30 compositions of three different qualities using a holistic scoring rubric.…

Descriptors: English (Second Language), Writing Evaluation, Scores, Expertise

How Do Raters Judge Spoken Vocabulary?

Peer reviewed

Li, Hui – English Language Teaching, 2016

The aim of the study was to investigate how raters come to their decisions when judging spoken vocabulary. Segmental rating was introduced to quantify raters' decision-making process. It is hoped that this simulated study brings fresh insight to future methodological considerations with spoken data. Twenty trainee raters assessed five Chinese…

Descriptors: Foreign Countries, Evaluators, Interrater Reliability, Decision Making
