Publication Date
| In 2026 | 0 |
| Since 2025 | 1 |
| Since 2022 (last 5 years) | 13 |
| Since 2017 (last 10 years) | 24 |
| Since 2007 (last 20 years) | 39 |
Descriptor
| Comparative Analysis | 56 |
| Decision Making | 56 |
| Reliability | 26 |
| Test Reliability | 18 |
| Evaluation Methods | 16 |
| Interrater Reliability | 14 |
| Foreign Countries | 12 |
| Scores | 12 |
| Higher Education | 11 |
| Evaluators | 10 |
| Correlation | 9 |
| More ▼ | |
Source
Author
| Chambers, Lucy | 2 |
| Haladyna, Tom | 2 |
| Schultz, Douglas G. | 2 |
| Allen, Abigail | 1 |
| Ameel, Eef | 1 |
| Arends, Lidia R. | 1 |
| Armijo-Olivo, Susan | 1 |
| Bao, Lei | 1 |
| Blitz, Mark | 1 |
| Boen, Filip | 1 |
| Bolhuis, Erik | 1 |
| More ▼ | |
Publication Type
Education Level
| Higher Education | 10 |
| Postsecondary Education | 7 |
| Secondary Education | 6 |
| Elementary Secondary Education | 3 |
| Grade 11 | 2 |
| Grade 12 | 2 |
| Elementary Education | 1 |
| Grade 10 | 1 |
| High Schools | 1 |
| Middle Schools | 1 |
Audience
| Researchers | 1 |
| Teachers | 1 |
Location
| Belgium | 3 |
| Turkey | 3 |
| Netherlands | 2 |
| Ohio | 2 |
| United Kingdom | 2 |
| United Kingdom (England) | 2 |
| Asia | 1 |
| Australia | 1 |
| Austria | 1 |
| Brazil | 1 |
| California | 1 |
| More ▼ | |
Laws, Policies, & Programs
| Individuals with Disabilities… | 1 |
| Race to the Top | 1 |
Assessments and Surveys
| Kaufman Assessment Battery… | 1 |
| Parental Authority… | 1 |
| Peabody Picture Vocabulary… | 1 |
| Self Directed Search | 1 |
| Wechsler Individual… | 1 |
| Woodcock Johnson Tests of… | 1 |
What Works Clearinghouse Rating
Lucy Chambers; Emma Walland; Jo Ireland – Research Matters, 2024
Comparative Judgement (CJ) is traditionally and primarily used to compare written texts. In this study we explored whether we could extend its use to comparing audio files. We used GCSE Music portfolios which contained a mix of audio recordings, musical scores and text documents. Fifteen judges completed two exercises: one comparing musical…
Descriptors: Evaluative Thinking, Judges, Comparative Analysis, Reliability
The AI Teacher Test: Measuring the Pedagogical Ability of Blender and GPT-3 in Educational Dialogues
Tack, Anaïs; Piech, Chris – International Educational Data Mining Society, 2022
How can we test whether state-of-the-art generative models, such as Blender and GPT-3, are good AI teachers, capable of replying to a student in an educational dialogue? Designing an AI teacher test is challenging: although evaluation methods are much-needed, there is no off-the-shelf solution to measuring pedagogical ability. This paper reports…
Descriptors: Artificial Intelligence, Dialogs (Language), Bayesian Statistics, Decision Making
Taylor, Tessa; Lanovaz, Marc J. – Journal of Applied Behavior Analysis, 2022
Behavior analysts typically rely on visual inspection of single-case experimental designs to make treatment decisions. However, visual inspection is subjective, which has led to the development of supplemental objective methods such as the conservative dual-criteria method. To replicate and extend a study conducted by Wolfe et al. (2018) on the…
Descriptors: Visual Perception, Artificial Intelligence, Decision Making, Evaluators
Karel Kok; Sophia Chroszczinsky; Burkhard Priemer – Physical Review Physics Education Research, 2024
Data comparison problems are used in teaching and science education research that focuses on students' ability to compare datasets and their conceptual understanding of measurement uncertainties. However, the evaluation of students' decisions in these problems can pose a problem: e.g., students making a correct decision for the wrong reasons.…
Descriptors: Secondary School Students, Undergraduate Students, Comparative Analysis, Evaluation Methods
Rebecca Sickinger; Tineke Brunfaut; John Pill – Language Testing, 2025
Comparative Judgement (CJ) is an evaluation method, typically conducted online, whereby a rank order is constructed, and scores calculated, from judges' pairwise comparisons of performances. CJ has been researched in various educational contexts, though only rarely in English as a Foreign Language (EFL) writing settings, and is generally agreed to…
Descriptors: Writing Evaluation, English (Second Language), Second Language Learning, Second Language Instruction
Gill, Tim – Research Matters, 2022
In Comparative Judgement (CJ) exercises, examiners are asked to look at a selection of candidate scripts (with marks removed) and order them in terms of which they believe display the best quality. By including scripts from different examination sessions, the results of these exercises can be used to help with maintaining standards. Results from…
Descriptors: Comparative Analysis, Decision Making, Scripts, Standards
Leech, Tony; Chambers, Lucy – Research Matters, 2022
Two of the central issues in comparative judgement (CJ), which are perhaps underexplored compared to questions of the method's reliability and technical quality, are "what processes do judges use to make their decisions" and "what features do they focus on when making their decisions?" This article discusses both, in the…
Descriptors: Comparative Analysis, Decision Making, Evaluators, Reliability
Paquot, Magali; Rubin, Rachel; Vandeweerd, Nathan – Language Learning, 2022
The main objective of this Methods Showcase Article is to show how the technique of adaptive comparative judgment, coupled with a crowdsourcing approach, can offer practical solutions to reliability issues as well as to address the time and cost difficulties associated with a text-based approach to proficiency assessment in L2 research. We…
Descriptors: Comparative Analysis, Decision Making, Language Proficiency, Reliability
Lian Li; Jiehui Hu; Yu Dai; Ping Zhou; Wanhong Zhang – Reading & Writing Quarterly, 2024
This paper proposes to use depth perception to represent raters' decision in holistic evaluation of ESL essays, as an alternative medium to conventional form of numerical scores. The researchers verified the new method's accuracy and inter/intra-rater reliability by inviting 24 ESL teachers to perform different representations when rating 60…
Descriptors: Essays, Holistic Approach, Writing Evaluation, Accuracy
Wind, Stefanie A.; Walker, A. Adrienne – Educational Measurement: Issues and Practice, 2021
Many large-scale performance assessments include score resolution procedures for resolving discrepancies in rater judgments. The goal of score resolution is conceptually similar to person fit analyses: To identify students for whom observed scores may not accurately reflect their achievement. Previously, researchers have observed that…
Descriptors: Goodness of Fit, Performance Based Assessment, Evaluators, Decision Making
Yocarini, Iris E.; Bouwmeester, Samantha; Smeets, Guus; Arends, Lidia R. – Educational Measurement: Issues and Practice, 2018
This real-data-guided simulation study systematically evaluated the decision accuracy of complex decision rules combining multiple tests within different realistic curricula. Specifically, complex decision rules combining conjunctive aspects and compensatory aspects were evaluated. A conjunctive aspect requires a minimum level of performance,…
Descriptors: Comparative Analysis, Decision Making, Accuracy, Higher Education
Saito, Kazuya; Macmillan, Konstantinos; Kachlicka, Magdalena; Kunihara, Takuya; Minematsu, Nobuaki – Studies in Second Language Acquisition, 2023
Whereas many scholars have emphasized the relative importance of "comprehensibility" as an ecologically valid goal for L2 speech training, testing, and development, eliciting listeners' judgments is time-consuming. Following calls for research on more efficient L2 speech rating methods in applied linguistics, and growing attention toward…
Descriptors: Second Language Learning, Second Language Instruction, Interrater Reliability, Speech Communication
Chiu, Yi-Fang; Neel, Amy; Loux, Travis – Journal of Speech, Language, and Hearing Research, 2021
Purpose: Auditory perceptual judgments are commonly used to diagnose dysarthria and assess treatment progress. The purpose of the study was to examine the acoustic underpinnings of perceptual speech abnormalities in individuals with Parkinson's disease (PD). Method: Auditory perceptual judgments were obtained from sentences produced by 13 speakers…
Descriptors: Diseases, Articulation (Speech), Speech Communication, Acoustics
Vidal Rodeiro, Carmen; Chambers, Lucy – Research Matters, 2022
Many high-stakes qualifications include non-exam assessments that are marked by teachers. Awarding bodies then apply a moderation process to bring the marking of these assessments to an agreed standard. Comparative Judgement (CJ) is a technique where two (or more) pieces of work are compared at a time, allowing an overall rank order of work to be…
Descriptors: Evaluation Methods, Portfolios (Background Materials), Decision Making, Task Analysis
Armijo-Olivo, Susan; Craig, Rodger; Campbell, Sandy – Research Synthesis Methods, 2020
Background: Evidence from new health technologies is growing, along with demands for evidence to inform policy decisions, creating challenges in completing health technology assessments (HTAs)/systematic reviews (SRs) in a timely manner. Software can decrease the time and burden by automating the process, but evidence validating such software is…
Descriptors: Comparative Analysis, Computer Software, Decision Making, Randomized Controlled Trials

Peer reviewed
Direct link
