Publication Date
In 2025 | 0 |
Since 2024 | 2 |
Since 2021 (last 5 years) | 7 |
Since 2016 (last 10 years) | 16 |
Since 2006 (last 20 years) | 20 |
Descriptor
Source
Author
Albert Weideman | 1 |
Baldwin, Peter | 1 |
Barkaoui, Khaled | 1 |
Bart Deygers | 1 |
Biancarosa, Gina | 1 |
Burton, John Dylan | 1 |
Claire Touchie | 1 |
Clauser, Jerome C. | 1 |
Cooper, Harris | 1 |
Cummings, Kelli D. | 1 |
Davis, Lawrence Edward | 1 |
More ▼ |
Publication Type
Journal Articles | 19 |
Reports - Research | 17 |
Dissertations/Theses -… | 1 |
Reports - Descriptive | 1 |
Education Level
Higher Education | 10 |
Postsecondary Education | 9 |
Elementary Education | 1 |
Grade 5 | 1 |
Grade 6 | 1 |
Intermediate Grades | 1 |
Middle Schools | 1 |
Audience
Location
China | 2 |
Europe | 2 |
Turkey | 2 |
India | 1 |
Japan (Tokyo) | 1 |
New York (New York) | 1 |
United Kingdom | 1 |
United States | 1 |
Vietnam | 1 |
Laws, Policies, & Programs
Assessments and Surveys
International English… | 3 |
Test of English as a Foreign… | 2 |
United States Medical… | 1 |
What Works Clearinghouse Rating
Timothy J. Wood; Vijay J. Daniels; Debra Pugh; Claire Touchie; Samantha Halman; Susan Humphrey-Murto – Advances in Health Sciences Education, 2024
First impressions can influence rater-based judgments but their contribution to rater bias is unclear. Research suggests raters can overcome first impressions in experimental exam contexts with explicit first impressions, but these findings may not generalize to a workplace context with implicit first impressions. The study had two aims. First, to…
Descriptors: Evaluators, Work Environment, Decision Making, Video Technology
Laura Schildt; Bart Deygers; Albert Weideman – Language Testing, 2024
In the context of policy-driven language testing for citizenship, a growing body of research examines the political justifications and ethical implications of language requirements and test use. However, virtually no studies have looked at the role that language testers play in the evolution of language requirements. Critical gaps remain in our…
Descriptors: Language Tests, Citizenship, Educational Policy, Assessment Literacy
Paquot, Magali; Rubin, Rachel; Vandeweerd, Nathan – Language Learning, 2022
The main objective of this Methods Showcase Article is to show how the technique of adaptive comparative judgment, coupled with a crowdsourcing approach, can offer practical solutions to reliability issues as well as to address the time and cost difficulties associated with a text-based approach to proficiency assessment in L2 research. We…
Descriptors: Comparative Analysis, Decision Making, Language Proficiency, Reliability
Wind, Stefanie A.; Walker, A. Adrienne – Educational Measurement: Issues and Practice, 2021
Many large-scale performance assessments include score resolution procedures for resolving discrepancies in rater judgments. The goal of score resolution is conceptually similar to person fit analyses: To identify students for whom observed scores may not accurately reflect their achievement. Previously, researchers have observed that…
Descriptors: Goodness of Fit, Performance Based Assessment, Evaluators, Decision Making
Eskin, Daniel – Studies in Applied Linguistics & TESOL, 2022
For agencies that deliver high-stakes Second Language (L2) proficiency exams, a research agenda has been undertaken for years to examine the role of rater, task, and rubric as sources of variability into their performance assessments (Lee, 2006; Sawaki & Sinharay, 2013; Xi, 2007; Xi & Mollaun, 2006). However, these challenges are more…
Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Student Placement
Thai, Thuy; Sheehan, Susan – Language Education & Assessment, 2022
In language performance tests, raters are important as their scoring decisions determine which aspects of performance the scores represent; however, raters are considered as one of the potential sources contributing to unwanted variability in scores (Davis, 2012). Although a great number of studies have been conducted to unpack how rater…
Descriptors: Rating Scales, Speech Communication, Second Language Learning, Second Language Instruction
Hsu, Tammy Huei-Lien – Language Testing in Asia, 2019
Background: A strong interest in researching World Englishes (WE) in relation to language assessment has become an emerging theme in language assessment studies over the past two decades. While research on WE has highlighted the status, function, and legitimacy of varieties of English language, it remains unclear how raters respond to the results…
Descriptors: Language Attitudes, Language Variation, Language Tests, Second Language Learning
Clauser, Jerome C.; Hambleton, Ronald K.; Baldwin, Peter – Educational and Psychological Measurement, 2017
The Angoff standard setting method relies on content experts to review exam items and make judgments about the performance of the minimally proficient examinee. Unfortunately, at times content experts may have gaps in their understanding of specific exam content. These gaps are particularly likely to occur when the content domain is broad and/or…
Descriptors: Scores, Item Analysis, Classification, Decision Making
Saito, Kazuya; Liu, Yuwei – Second Language Research, 2022
There is emerging evidence that collocation use plays a primary role in determining various dimensions of L2 oral proficiency assessment and development. The current study presents the results of three experiments which examined the relationship between the degree of association in collocation use (operationalized as t scores and mutual…
Descriptors: Phrase Structure, Case Studies, Second Language Learning, Second Language Instruction
Sahan, Özgür; Razi, Salim – Language Testing, 2020
This study examines the decision-making behaviors of raters with varying levels of experience while assessing EFL essays of distinct qualities. The data were collected from 28 raters with varying levels of rating experience and working at the English language departments of different universities in Turkey. Using a 10-point analytic rubric, each…
Descriptors: Decision Making, Essays, Writing Evaluation, Evaluators
Han, Chao – Language Testing, 2019
Summative assessment of interpretation is widely conducted in interpreting courses/programs to inform high-stakes decision making, such as the selection, certification, and conferral of academic degrees. Yet there has been very limited empirical research to investigate the score dependability of summative interpretation assessment. The present…
Descriptors: Generalization, Decision Making, Summative Evaluation, Evaluators
Burton, John Dylan – Language Assessment Quarterly, 2020
An assumption underlying speaking tests is that scores reflect the ability to produce online, non-rehearsed speech. Speech produced in testing situations may, however, be less spontaneous if extensive test preparation takes place, resulting in memorized or rehearsed responses. If raters detect these patterns, they may conceptualize speech as…
Descriptors: Language Tests, Oral Language, Scores, Speech Communication
Reed, Deborah K.; Cummings, Kelli D.; Schaper, Andrew; Lynn, Devon; Biancarosa, Gina – Reading and Writing: An Interdisciplinary Journal, 2019
Informal reading inventories (IRI) and curriculum-based measures of reading (CBM-R) have continued importance in instructional planning, but raters have exhibited difficulty in accurately identifying students' miscues. To identify and tabulate scorers' mismarkings, this study employed examiners and raters who scored 15,051 words from 108 passage…
Descriptors: Accuracy, Miscue Analysis, Grade 5, Grade 6
Han, Turgay – International Journal of Progressive Education, 2017
The aim of this study is to examine the variability in and reliability of scores assigned to different quality EFL compositions by EFL instructors and their rating behaviors. Using a mixed research design, quantitative data were collected from EFL instructors' ratings of 30 compositions of three different qualities using a holistic scoring rubric.…
Descriptors: English (Second Language), Writing Evaluation, Scores, Expertise
Li, Hui – English Language Teaching, 2016
The aim of the study was to investigate how raters come to their decisions when judging spoken vocabulary. Segmental rating was introduced to quantify raters' decision-making process. It is hoped that this simulated study brings fresh insight to future methodological considerations with spoken data. Twenty trainee raters assessed five Chinese…
Descriptors: Foreign Countries, Evaluators, Interrater Reliability, Decision Making
Previous Page | Next Page »
Pages: 1 | 2