Publication Date
In 2025 | 0 |
Since 2024 | 3 |
Since 2021 (last 5 years) | 22 |
Since 2016 (last 10 years) | 35 |
Since 2006 (last 20 years) | 46 |
Descriptor
Comparative Analysis | 71 |
Evaluation Methods | 71 |
Evaluators | 71 |
Foreign Countries | 15 |
Reliability | 13 |
Computer Software | 12 |
Interrater Reliability | 12 |
Decision Making | 11 |
Validity | 10 |
Scores | 8 |
Student Evaluation | 8 |
More ▼ |
Source
Author
Chambers, Lucy | 2 |
Myford, Carol M. | 2 |
Akbari, Alireza | 1 |
Apple, Kristen | 1 |
Armijo-Olivo, Susan | 1 |
Azzam, Tarek | 1 |
Bamberger, Michael | 1 |
Bartholomew, Scott Ronald | 1 |
Barwell, Fred | 1 |
Blankenship, Mark H. | 1 |
Bosch, Emma | 1 |
More ▼ |
Publication Type
Journal Articles | 55 |
Reports - Research | 49 |
Reports - Evaluative | 18 |
Speeches/Meeting Papers | 15 |
Tests/Questionnaires | 7 |
Reports - Descriptive | 4 |
Information Analyses | 2 |
Collected Works - Proceedings | 1 |
Opinion Papers | 1 |
Education Level
Audience
Researchers | 3 |
Location
United States | 4 |
United Kingdom | 3 |
China | 2 |
Denmark | 2 |
United Kingdom (England) | 2 |
California | 1 |
Canada | 1 |
District of Columbia | 1 |
Germany | 1 |
Iran | 1 |
Italy | 1 |
More ▼ |
Laws, Policies, & Programs
Assessments and Surveys
Flesch Kincaid Grade Level… | 1 |
Fry Readability Formula | 1 |
National Adult Literacy… | 1 |
What Works Clearinghouse Rating
Yuan Tian; Xi Yang; Suhail A. Doi; Luis Furuya-Kanamori; Lifeng Lin; Joey S. W. Kwong; Chang Xu – Research Synthesis Methods, 2024
RobotReviewer is a tool for automatically assessing the risk of bias in randomized controlled trials, but there is limited evidence of its reliability. We evaluated the agreement between RobotReviewer and humans regarding the risk of bias assessment based on 1955 randomized controlled trials. The risk of bias in these trials was assessed via two…
Descriptors: Risk, Randomized Controlled Trials, Classification, Robotics
Kelly, Kate Tremain; Richardson, Mary; Isaacs, Talia – Assessment in Education: Principles, Policy & Practice, 2022
Comparative judgment is gaining popularity as an assessment tool, including for high-stakes testing purposes, despite relatively little research on the use of the technique. Advocates claim two main rationales for its use: that comparative judgment is valid because humans are better at comparative than absolute judgment, and because it distils the…
Descriptors: Comparative Analysis, Evaluation Methods, Evaluative Thinking, High Stakes Tests
Tucker, Susan; Stevahn, Laurie; King, Jean A. – American Journal of Evaluation, 2023
This article compares the purposes and content of the four foundational documents of the American Evaluation Association (AEA): the Program Evaluation Standards, the AEA Public Statement on Cultural Competence in Evaluation, the AEA Evaluator Competencies, and the AEA Guiding Principles. This reflection on alignment is an early effort in the third…
Descriptors: Professionalism, Comparative Analysis, Professional Associations, Program Evaluation
Kevin C. Haudek; Xiaoming Zhai – International Journal of Artificial Intelligence in Education, 2024
Argumentation, a key scientific practice presented in the "Framework for K-12 Science Education," requires students to construct and critique arguments, but timely evaluation of arguments in large-scale classrooms is challenging. Recent work has shown the potential of automated scoring systems for open response assessments, leveraging…
Descriptors: Accuracy, Persuasive Discourse, Artificial Intelligence, Learning Management Systems
Hunter, Seth B. – Journal of Education Human Resources, 2023
Teacher performance scores inform education leaders' management of teacher human resources. However, prior research has implied that different interpretations of performance criteria between teachers and their evaluators suppress teacher development. Although research has examined teacher perceptions of performance scores and compared teacher…
Descriptors: Teacher Evaluation, Teacher Effectiveness, Self Evaluation (Individuals), Interrater Reliability
Gill, Tim – Research Matters, 2022
In Comparative Judgement (CJ) exercises, examiners are asked to look at a selection of candidate scripts (with marks removed) and order them in terms of which they believe display the best quality. By including scripts from different examination sessions, the results of these exercises can be used to help with maintaining standards. Results from…
Descriptors: Comparative Analysis, Decision Making, Scripts, Standards
Reagan Mozer; Luke Miratrix; Jackie Eunjung Relyea; James S. Kim – Journal of Educational and Behavioral Statistics, 2024
In a randomized trial that collects text as an outcome, traditional approaches for assessing treatment impact require that each document first be manually coded for constructs of interest by human raters. An impact analysis can then be conducted to compare treatment and control groups, using the hand-coded scores as a measured outcome. This…
Descriptors: Scoring, Evaluation Methods, Writing Evaluation, Comparative Analysis
Walland, Emma – Research Matters, 2022
In this article, I report on examiners' views and experiences of using Pairwise Comparative Judgement (PCJ) and Rank Ordering (RO) as alternatives to traditional analytical marking for GCSE English Language essays. Fifteen GCSE English Language examiners took part in the study. After each had judged 100 pairs of essays using PCJ and eight packs of…
Descriptors: Essays, Grading, Writing Evaluation, Evaluators
Leech, Tony; Chambers, Lucy – Research Matters, 2022
Two of the central issues in comparative judgement (CJ), which are perhaps underexplored compared to questions of the method's reliability and technical quality, are "what processes do judges use to make their decisions" and "what features do they focus on when making their decisions?" This article discusses both, in the…
Descriptors: Comparative Analysis, Decision Making, Evaluators, Reliability
Paquot, Magali; Rubin, Rachel; Vandeweerd, Nathan – Language Learning, 2022
The main objective of this Methods Showcase Article is to show how the technique of adaptive comparative judgment, coupled with a crowdsourcing approach, can offer practical solutions to reliability issues as well as to address the time and cost difficulties associated with a text-based approach to proficiency assessment in L2 research. We…
Descriptors: Comparative Analysis, Decision Making, Language Proficiency, Reliability
Nagle, Charlie L.; Trofimovich, Pavel; O'Brien, Mary Grantham; Kennedy, Sara – Modern Language Journal, 2022
Comprehensibility has emerged as a useful and intuitive means of globally evaluating second language (L2) speakers in many research and instructional contexts. In most cases, L2 speakers' comprehensibility is assessed by external listeners who do not engage in extensive communication with the speakers, even though the degree to which a speaker is…
Descriptors: Evaluators, Intelligibility, Pronunciation, Task Analysis
Wind, Stefanie A.; Walker, A. Adrienne – Educational Measurement: Issues and Practice, 2021
Many large-scale performance assessments include score resolution procedures for resolving discrepancies in rater judgments. The goal of score resolution is conceptually similar to person fit analyses: To identify students for whom observed scores may not accurately reflect their achievement. Previously, researchers have observed that…
Descriptors: Goodness of Fit, Performance Based Assessment, Evaluators, Decision Making
Vannaprathip, Narumol; Haddawy, Peter; Schultheis, Holger; Suebnukarn, Siriwan – International Journal of Artificial Intelligence in Education, 2022
Virtual reality simulation has had a significant impact on training of psychomotor surgical skills, yet there is still a lack of work on its use to teach surgical decision making. This is particularly noteworthy given the recognized importance of decision making in achieving positive surgical outcomes. With the objective of filling this gap, we…
Descriptors: Intelligent Tutoring Systems, Decision Making, Surgery, Teaching Methods
Saito, Kazuya; Macmillan, Konstantinos; Kachlicka, Magdalena; Kunihara, Takuya; Minematsu, Nobuaki – Studies in Second Language Acquisition, 2023
Whereas many scholars have emphasized the relative importance of "comprehensibility" as an ecologically valid goal for L2 speech training, testing, and development, eliciting listeners' judgments is time-consuming. Following calls for research on more efficient L2 speech rating methods in applied linguistics, and growing attention toward…
Descriptors: Second Language Learning, Second Language Instruction, Interrater Reliability, Speech Communication
Armijo-Olivo, Susan; Craig, Rodger; Campbell, Sandy – Research Synthesis Methods, 2020
Background: Evidence from new health technologies is growing, along with demands for evidence to inform policy decisions, creating challenges in completing health technology assessments (HTAs)/systematic reviews (SRs) in a timely manner. Software can decrease the time and burden by automating the process, but evidence validating such software is…
Descriptors: Comparative Analysis, Computer Software, Decision Making, Randomized Controlled Trials