ERIC - Search Results

Showing 1 to 15 of 56 results

Comparing Music Recordings Using Pairwise Comparative Judgement: Exploring the Judge Experience

Lucy Chambers; Emma Walland; Jo Ireland – Research Matters, 2024

Comparative Judgement (CJ) is traditionally and primarily used to compare written texts. In this study we explored whether we could extend its use to comparing audio files. We used GCSE Music portfolios which contained a mix of audio recordings, musical scores and text documents. Fifteen judges completed two exercises: one comparing musical…

Descriptors: Evaluative Thinking, Judges, Comparative Analysis, Reliability

The AI Teacher Test: Measuring the Pedagogical Ability of Blender and GPT-3 in Educational Dialogues

Peer reviewed

Tack, Anaïs; Piech, Chris – International Educational Data Mining Society, 2022

How can we test whether state-of-the-art generative models, such as Blender and GPT-3, are good AI teachers, capable of replying to a student in an educational dialogue? Designing an AI teacher test is challenging: although evaluation methods are much-needed, there is no off-the-shelf solution to measuring pedagogical ability. This paper reports…

Descriptors: Artificial Intelligence, Dialogs (Language), Bayesian Statistics, Decision Making

Agreement between Visual Inspection and Objective Analysis Methods: A Replication and Extension

Peer reviewed

Taylor, Tessa; Lanovaz, Marc J. – Journal of Applied Behavior Analysis, 2022

Behavior analysts typically rely on visual inspection of single-case experimental designs to make treatment decisions. However, visual inspection is subjective, which has led to the development of supplemental objective methods such as the conservative dual-criteria method. To replicate and extend a study conducted by Wolfe et al. (2018) on the…

Descriptors: Visual Perception, Artificial Intelligence, Decision Making, Evaluators

How to Evaluate Students' Decisions in a Data Comparison Problem: Correct Decision for the Wrong Reasons?

Peer reviewed

Karel Kok; Sophia Chroszczinsky; Burkhard Priemer – Physical Review Physics Education Research, 2024

Data comparison problems are used in teaching and science education research that focuses on students' ability to compare datasets and their conceptual understanding of measurement uncertainties. However, the evaluation of students' decisions in these problems can pose a problem: e.g., students making a correct decision for the wrong reasons.…

Descriptors: Secondary School Students, Undergraduate Students, Comparative Analysis, Evaluation Methods

Comparative Judgement for Evaluating Young Learners' EFL Writing Performances: Reliability and Teacher Perceptions of Holistic and Dimension-Based Judgements

Peer reviewed

Rebecca Sickinger; Tineke Brunfaut; John Pill – Language Testing, 2025

Comparative Judgement (CJ) is an evaluation method, typically conducted online, whereby a rank order is constructed, and scores calculated, from judges' pairwise comparisons of performances. CJ has been researched in various educational contexts, though only rarely in English as a Foreign Language (EFL) writing settings, and is generally agreed to…

Descriptors: Writing Evaluation, English (Second Language), Second Language Learning, Second Language Instruction

The Concurrent Validity of Comparative Judgement Outcomes Compared with Marks

Gill, Tim – Research Matters, 2022

In Comparative Judgement (CJ) exercises, examiners are asked to look at a selection of candidate scripts (with marks removed) and order them in terms of which they believe display the best quality. By including scripts from different examination sessions, the results of these exercises can be used to help with maintaining standards. Results from…

Descriptors: Comparative Analysis, Decision Making, Scripts, Standards

How Do Judges in Comparative Judgement Exercises Make Their Judgements?

Leech, Tony; Chambers, Lucy – Research Matters, 2022

Two of the central issues in comparative judgement (CJ), which are perhaps underexplored compared to questions of the method's reliability and technical quality, are "what processes do judges use to make their decisions" and "what features do they focus on when making their decisions?" This article discusses both, in the…

Descriptors: Comparative Analysis, Decision Making, Evaluators, Reliability

Crowdsourced Adaptive Comparative Judgment: A Community-Based Solution for Proficiency Rating

Peer reviewed

Paquot, Magali; Rubin, Rachel; Vandeweerd, Nathan – Language Learning, 2022

The main objective of this Methods Showcase Article is to show how the technique of adaptive comparative judgment, coupled with a crowdsourcing approach, can offer practical solutions to reliability issues as well as to address the time and cost difficulties associated with a text-based approach to proficiency assessment in L2 research. We…

Descriptors: Comparative Analysis, Decision Making, Language Proficiency, Reliability

Depth-Perception-Based Representation in Holistic Rating on ESL Essay Writing

Peer reviewed

Lian Li; Jiehui Hu; Yu Dai; Ping Zhou; Wanhong Zhang – Reading & Writing Quarterly, 2024

This paper proposes to use depth perception to represent raters' decision in holistic evaluation of ESL essays, as an alternative medium to conventional form of numerical scores. The researchers verified the new method's accuracy and inter/intra-rater reliability by inviting 24 ESL teachers to perform different representations when rating 60…

Descriptors: Essays, Holistic Approach, Writing Evaluation, Accuracy

A Model-Data-Fit-Informed Approach to Score Resolution in Performance Assessments

Peer reviewed

Wind, Stefanie A.; Walker, A. Adrienne – Educational Measurement: Issues and Practice, 2021

Many large-scale performance assessments include score resolution procedures for resolving discrepancies in rater judgments. The goal of score resolution is conceptually similar to person fit analyses: To identify students for whom observed scores may not accurately reflect their achievement. Previously, researchers have observed that…

Descriptors: Goodness of Fit, Performance Based Assessment, Evaluators, Decision Making

Systematic Comparison of Decision Accuracy of Complex Compensatory Decision Rules Combining Multiple Tests in a Higher Education Context

Peer reviewed

Yocarini, Iris E.; Bouwmeester, Samantha; Smeets, Guus; Arends, Lidia R. – Educational Measurement: Issues and Practice, 2018

This real-data-guided simulation study systematically evaluated the decision accuracy of complex decision rules combining multiple tests within different realistic curricula. Specifically, complex decision rules combining conjunctive aspects and compensatory aspects were evaluated. A conjunctive aspect requires a minimum level of performance,…

Descriptors: Comparative Analysis, Decision Making, Accuracy, Higher Education

Automated Assessment of Second Language Comprehensibility: Review, Training, Validation, and Generalization Studies

Peer reviewed

Saito, Kazuya; Macmillan, Konstantinos; Kachlicka, Magdalena; Kunihara, Takuya; Minematsu, Nobuaki – Studies in Second Language Acquisition, 2023

Whereas many scholars have emphasized the relative importance of "comprehensibility" as an ecologically valid goal for L2 speech training, testing, and development, eliciting listeners' judgments is time-consuming. Following calls for research on more efficient L2 speech rating methods in applied linguistics, and growing attention toward…

Descriptors: Second Language Learning, Second Language Instruction, Interrater Reliability, Speech Communication

Exploring the Acoustic Perceptual Relationship of Speech in Parkinson's Disease

Peer reviewed

Chiu, Yi-Fang; Neel, Amy; Loux, Travis – Journal of Speech, Language, and Hearing Research, 2021

Purpose: Auditory perceptual judgments are commonly used to diagnose dysarthria and assess treatment progress. The purpose of the study was to examine the acoustic underpinnings of perceptual speech abnormalities in individuals with Parkinson's disease (PD). Method: Auditory perceptual judgments were obtained from sentences produced by 13 speakers…

Descriptors: Diseases, Articulation (Speech), Speech Communication, Acoustics

Moderation of Non-Exam Assessments: Is Comparative Judgement a Practical Alternative?

Vidal Rodeiro, Carmen; Chambers, Lucy – Research Matters, 2022

Many high-stakes qualifications include non-exam assessments that are marked by teachers. Awarding bodies then apply a moderation process to bring the marking of these assessments to an agreed standard. Comparative Judgement (CJ) is a technique where two (or more) pieces of work are compared at a time, allowing an overall rank order of work to be…

Descriptors: Evaluation Methods, Portfolios (Background Materials), Decision Making, Task Analysis

Comparing Machine and Human Reviewers to Evaluate the Risk of Bias in Randomized Controlled Trials

Peer reviewed

Armijo-Olivo, Susan; Craig, Rodger; Campbell, Sandy – Research Synthesis Methods, 2020

Background: Evidence from new health technologies is growing, along with demands for evidence to inform policy decisions, creating challenges in completing health technology assessments (HTAs)/systematic reviews (SRs) in a timely manner. Software can decrease the time and burden by automating the process, but evidence validating such software is…

Descriptors: Comparative Analysis, Computer Software, Decision Making, Randomized Controlled Trials
