Publication Date
In 2025 | 0 |
Since 2024 | 1 |
Since 2021 (last 5 years) | 5 |
Since 2016 (last 10 years) | 10 |
Since 2006 (last 20 years) | 13 |
Descriptor
Decision Making | 20 |
Evaluators | 20 |
Interrater Reliability | 20 |
Evaluation Methods | 7 |
Comparative Analysis | 6 |
Second Language Learning | 6 |
Foreign Countries | 5 |
Performance Based Assessment | 5 |
Scoring | 5 |
Standards | 5 |
Correlation | 4 |
More ▼ |
Source
Author
Armijo-Olivo, Susan | 1 |
Brull, Harry | 1 |
Byrd, Gary R. | 1 |
Campbell, Sandy | 1 |
Chinn, Roberta N. | 1 |
Coleman, Susan | 1 |
Craig, Rodger | 1 |
Delandshere, Ginette | 1 |
Han, Qie | 1 |
Hertz, Norman R. | 1 |
Jaeger, Richard M. | 1 |
More ▼ |
Publication Type
Journal Articles | 18 |
Reports - Research | 12 |
Reports - Evaluative | 7 |
Speeches/Meeting Papers | 2 |
Information Analyses | 1 |
Opinion Papers | 1 |
Education Level
Higher Education | 4 |
Postsecondary Education | 3 |
Secondary Education | 1 |
Audience
Location
China | 1 |
Europe | 1 |
Norway | 1 |
Ohio | 1 |
Texas | 1 |
Turkey (Istanbul) | 1 |
United Kingdom | 1 |
Vietnam | 1 |
Laws, Policies, & Programs
Assessments and Surveys
What Works Clearinghouse Rating
Taylor, Tessa; Lanovaz, Marc J. – Journal of Applied Behavior Analysis, 2022
Behavior analysts typically rely on visual inspection of single-case experimental designs to make treatment decisions. However, visual inspection is subjective, which has led to the development of supplemental objective methods such as the conservative dual-criteria method. To replicate and extend a study conducted by Wolfe et al. (2018) on the…
Descriptors: Visual Perception, Artificial Intelligence, Decision Making, Evaluators
Lian Li; Jiehui Hu; Yu Dai; Ping Zhou; Wanhong Zhang – Reading & Writing Quarterly, 2024
This paper proposes to use depth perception to represent raters' decision in holistic evaluation of ESL essays, as an alternative medium to conventional form of numerical scores. The researchers verified the new method's accuracy and inter/intra-rater reliability by inviting 24 ESL teachers to perform different representations when rating 60…
Descriptors: Essays, Holistic Approach, Writing Evaluation, Accuracy
Wind, Stefanie A.; Walker, A. Adrienne – Educational Measurement: Issues and Practice, 2021
Many large-scale performance assessments include score resolution procedures for resolving discrepancies in rater judgments. The goal of score resolution is conceptually similar to person fit analyses: To identify students for whom observed scores may not accurately reflect their achievement. Previously, researchers have observed that…
Descriptors: Goodness of Fit, Performance Based Assessment, Evaluators, Decision Making
Saito, Kazuya; Macmillan, Konstantinos; Kachlicka, Magdalena; Kunihara, Takuya; Minematsu, Nobuaki – Studies in Second Language Acquisition, 2023
Whereas many scholars have emphasized the relative importance of "comprehensibility" as an ecologically valid goal for L2 speech training, testing, and development, eliciting listeners' judgments is time-consuming. Following calls for research on more efficient L2 speech rating methods in applied linguistics, and growing attention toward…
Descriptors: Second Language Learning, Second Language Instruction, Interrater Reliability, Speech Communication
Armijo-Olivo, Susan; Craig, Rodger; Campbell, Sandy – Research Synthesis Methods, 2020
Background: Evidence from new health technologies is growing, along with demands for evidence to inform policy decisions, creating challenges in completing health technology assessments (HTAs)/systematic reviews (SRs) in a timely manner. Software can decrease the time and burden by automating the process, but evidence validating such software is…
Descriptors: Comparative Analysis, Computer Software, Decision Making, Randomized Controlled Trials
Thai, Thuy; Sheehan, Susan – Language Education & Assessment, 2022
In language performance tests, raters are important as their scoring decisions determine which aspects of performance the scores represent; however, raters are considered as one of the potential sources contributing to unwanted variability in scores (Davis, 2012). Although a great number of studies have been conducted to unpack how rater…
Descriptors: Rating Scales, Speech Communication, Second Language Learning, Second Language Instruction
Han, Qie – Working Papers in TESOL & Applied Linguistics, 2016
This literature review attempts to survey representative studies within the context of L2 speaking assessment that have contributed to the conceptualization of rater cognition. Two types of studies are looked at: 1) studies that examine "how" raters differ (and sometimes agree) in their cognitive processes and rating behaviors, in terms…
Descriptors: Second Language Learning, Student Evaluation, Evaluators, Speech Tests
Price, Keith; Coleman, Susan; Byrd, Gary R. – Administrative Issues Journal: Connecting Education, Practice, and Research, 2014
The study of capital juries remains a subject of critical interest for the public and for legislative and judicial policy makers as well as legal scholars and social scientists. Cowan, Thompson, and Ellsworth established one of the standard methodologies for examination of this topic in their 1984 seminal study by observing the subjects' debate…
Descriptors: Court Litigation, Death, Punishment, Bias
Yalçin-Çolakoglu, Özlem; Selçuk, Merve – Advances in Language and Literary Studies, 2019
Criterion referenced tests of second language speaking performance are administered in different institutions using different procedures. The present study reports raters' practices of second language speaking tests, in particular the correspondence between test-takers' grades when assessed individually and in groups. Data derived from…
Descriptors: Oral Language, Language Tests, Test Validity, Inferences
Jølle, Lennart – Assessment in Education: Principles, Policy & Practice, 2015
Novice members of a Norwegian national rater panel tasked with assessing Year 8 pupils' written texts were studied during three successive preparation sessions (2011-2012). The purpose was to investigate how the raters successfully make use of different decision-making strategies in an assessment situation where pre-set criteria and standards give…
Descriptors: Interrater Reliability, Writing Evaluation, Decision Making, Novices
Wood, Timothy J. – Advances in Health Sciences Education, 2014
Medical education relies heavily on assessment formats that require raters to assess the competence and skills of learners. Unfortunately, there are often inconsistencies and variability in the scores raters assign. To ensure the scores from these assessment tools have validity, it is important to understand the underlying cognitive processes that…
Descriptors: Medical Education, Interrater Reliability, Cognitive Processes, Validity
Li, Hui – English Language Teaching, 2016
The aim of the study was to investigate how raters come to their decisions when judging spoken vocabulary. Segmental rating was introduced to quantify raters' decision-making process. It is hoped that this simulated study brings fresh insight to future methodological considerations with spoken data. Twenty trainee raters assessed five Chinese…
Descriptors: Foreign Countries, Evaluators, Interrater Reliability, Decision Making
Jarvis, Scott – Language Testing, 2017
The present study discusses the relevance of measures of lexical diversity (LD) to the assessment of learner corpora. It also argues that existing measures of LD, many of which have become specialized for use with language corpora, are fundamentally measures of lexical repetition, are based on an etic perspective of language, and lack construct…
Descriptors: Computational Linguistics, English (Second Language), Second Language Learning, Native Speakers
Hertz, Norman R.; Chinn, Roberta N. – 2002
Nearly all of the research on standard setting focuses on different standard setting methods rather than the interaction of group members and the instructions given to group members. This study explored the effect of deliberation style and the requirement to reach consensus on the passing score, on rater satisfaction, and on postdecision…
Descriptors: Decision Making, Evaluation Methods, Evaluators, Interaction

Lunz, Mary E.; And Others – Educational and Psychological Measurement, 1994
In a study involving eight judges, analysis with the FACETS model provides evidence that judges grade differently, whether or not scores correlate well. This outcome suggests that adjustments for differences among judges should be made before student measures are estimated to produce reproducible decisions. (SLD)
Descriptors: Correlation, Decision Making, Evaluation Methods, Evaluators
Previous Page | Next Page »
Pages: 1 | 2