ERIC - Search Results

Showing 1 to 15 of 20 results

Agreement between Visual Inspection and Objective Analysis Methods: A Replication and Extension

Peer reviewed

Taylor, Tessa; Lanovaz, Marc J. – Journal of Applied Behavior Analysis, 2022

Behavior analysts typically rely on visual inspection of single-case experimental designs to make treatment decisions. However, visual inspection is subjective, which has led to the development of supplemental objective methods such as the conservative dual-criteria method. To replicate and extend a study conducted by Wolfe et al. (2018) on the…

Descriptors: Visual Perception, Artificial Intelligence, Decision Making, Evaluators

Depth-Perception-Based Representation in Holistic Rating on ESL Essay Writing

Peer reviewed

Lian Li; Jiehui Hu; Yu Dai; Ping Zhou; Wanhong Zhang – Reading & Writing Quarterly, 2024

This paper proposes using depth perception to represent raters' decisions in the holistic evaluation of ESL essays, as an alternative to the conventional form of numerical scores. The researchers verified the new method's accuracy and inter/intra-rater reliability by inviting 24 ESL teachers to perform different representations when rating 60…

Descriptors: Essays, Holistic Approach, Writing Evaluation, Accuracy

A Model-Data-Fit-Informed Approach to Score Resolution in Performance Assessments

Peer reviewed

Wind, Stefanie A.; Walker, A. Adrienne – Educational Measurement: Issues and Practice, 2021

Many large-scale performance assessments include score resolution procedures for resolving discrepancies in rater judgments. The goal of score resolution is conceptually similar to person fit analyses: To identify students for whom observed scores may not accurately reflect their achievement. Previously, researchers have observed that…

Descriptors: Goodness of Fit, Performance Based Assessment, Evaluators, Decision Making

Automated Assessment of Second Language Comprehensibility: Review, Training, Validation, and Generalization Studies

Peer reviewed

Saito, Kazuya; Macmillan, Konstantinos; Kachlicka, Magdalena; Kunihara, Takuya; Minematsu, Nobuaki – Studies in Second Language Acquisition, 2023

Whereas many scholars have emphasized the relative importance of "comprehensibility" as an ecologically valid goal for L2 speech training, testing, and development, eliciting listeners' judgments is time-consuming. Following calls for research on more efficient L2 speech rating methods in applied linguistics, and growing attention toward…

Descriptors: Second Language Learning, Second Language Instruction, Interrater Reliability, Speech Communication

Comparing Machine and Human Reviewers to Evaluate the Risk of Bias in Randomized Controlled Trials

Peer reviewed

Armijo-Olivo, Susan; Craig, Rodger; Campbell, Sandy – Research Synthesis Methods, 2020

Background: Evidence from new health technologies is growing, along with demands for evidence to inform policy decisions, creating challenges in completing health technology assessments (HTAs)/systematic reviews (SRs) in a timely manner. Software can decrease the time and burden by automating the process, but evidence validating such software is…

Descriptors: Comparative Analysis, Computer Software, Decision Making, Randomized Controlled Trials

The Processes of Rating L2 Speaking Performance Using an Analytic Rating Scale -- A Qualitative Exploration

Peer reviewed

Thai, Thuy; Sheehan, Susan – Language Education & Assessment, 2022

In language performance tests, raters are important as their scoring decisions determine which aspects of performance the scores represent; however, raters are considered as one of the potential sources contributing to unwanted variability in scores (Davis, 2012). Although a great number of studies have been conducted to unpack how rater…

Descriptors: Rating Scales, Speech Communication, Second Language Learning, Second Language Instruction

Rater Cognition in L2 Speaking Assessment: A Review of the Literature

Peer reviewed

Han, Qie – Working Papers in TESOL & Applied Linguistics, 2016

This literature review attempts to survey representative studies within the context of L2 speaking assessment that have contributed to the conceptualization of rater cognition. Two types of studies are looked at: 1) studies that examine "how" raters differ (and sometimes agree) in their cognitive processes and rating behaviors, in terms…

Descriptors: Second Language Learning, Student Evaluation, Evaluators, Speech Tests

Examination of Capital Murder Jurors' Deliberations: Methods and Issues

Peer reviewed

Price, Keith; Coleman, Susan; Byrd, Gary R. – Administrative Issues Journal: Connecting Education, Practice, and Research, 2014

The study of capital juries remains a subject of critical interest for the public and for legislative and judicial policy makers as well as legal scholars and social scientists. Cowan, Thompson, and Ellsworth established one of the standard methodologies for examination of this topic in their 1984 seminal study by observing the subjects' debate…

Descriptors: Court Litigation, Death, Punishment, Bias

Assessing Individual and Group Oral Exams: Scoring Criteria and Rater Interaction

Peer reviewed

Yalçin-Çolakoglu, Özlem; Selçuk, Merve – Advances in Language and Literary Studies, 2019

Criterion referenced tests of second language speaking performance are administered in different institutions using different procedures. The present study reports raters' practices of second language speaking tests, in particular the correspondence between test-takers' grades when assessed individually and in groups. Data derived from…

Descriptors: Oral Language, Language Tests, Test Validity, Inferences

Rater Strategies for Reaching Agreement on Pupil Text Quality

Peer reviewed

Jølle, Lennart – Assessment in Education: Principles, Policy & Practice, 2015

Novice members of a Norwegian national rater panel tasked with assessing Year 8 pupils' written texts were studied during three successive preparation sessions (2011-2012). The purpose was to investigate how the raters successfully make use of different decision-making strategies in an assessment situation where pre-set criteria and standards give…

Descriptors: Interrater Reliability, Writing Evaluation, Decision Making, Novices

Exploring the Role of First Impressions in Rater-Based Assessments

Peer reviewed

Wood, Timothy J. – Advances in Health Sciences Education, 2014

Medical education relies heavily on assessment formats that require raters to assess the competence and skills of learners. Unfortunately, there are often inconsistencies and variability in the scores raters assign. To ensure the scores from these assessment tools have validity, it is important to understand the underlying cognitive processes that…

Descriptors: Medical Education, Interrater Reliability, Cognitive Processes, Validity

How Do Raters Judge Spoken Vocabulary?

Peer reviewed

Li, Hui – English Language Teaching, 2016

The aim of the study was to investigate how raters come to their decisions when judging spoken vocabulary. Segmental rating was introduced to quantify raters' decision-making process. It is hoped that this simulated study brings fresh insight to future methodological considerations with spoken data. Twenty trainee raters assessed five Chinese…

Descriptors: Foreign Countries, Evaluators, Interrater Reliability, Decision Making

Grounding Lexical Diversity in Human Judgments

Peer reviewed

Jarvis, Scott – Language Testing, 2017

The present study discusses the relevance of measures of lexical diversity (LD) to the assessment of learner corpora. It also argues that existing measures of LD, many of which have become specialized for use with language corpora, are fundamentally measures of lexical repetition, are based on an etic perspective of language, and lack construct…

Descriptors: Computational Linguistics, English (Second Language), Second Language Learning, Native Speakers

The Role of Deliberation Style in Standard Setting for Licensing and Certification Examinations.

Hertz, Norman R.; Chinn, Roberta N. – 2002

Nearly all of the research on standard setting focuses on different standard setting methods rather than the interaction of group members and the instructions given to group members. This study explored the effect of deliberation style and the requirement to reach consensus on the passing score, on rater satisfaction, and on postdecision…

Descriptors: Decision Making, Evaluation Methods, Evaluators, Interaction

Interjudge Reliability and Decision Reproducibility.

Peer reviewed

Lunz, Mary E.; And Others – Educational and Psychological Measurement, 1994

In a study involving eight judges, analysis with the FACETS model provides evidence that judges grade differently, whether or not scores correlate well. This outcome suggests that adjustments for differences among judges should be made before student measures are estimated to produce reproducible decisions. (SLD)

Descriptors: Correlation, Decision Making, Evaluation Methods, Evaluators
