Showing 1 to 15 of 212 results
Peer reviewed
Lucy Chambers; Sylvia Vitello; Carmen Vidal Rodeiro – Assessment in Education: Principles, Policy & Practice, 2024
In England, some secondary-level qualifications comprise non-exam assessments which need to undergo moderation before grading. Currently, moderation is conducted at centre (school) level. This raises challenges for maintaining the standard across centres. Recent technological advances enable novel moderation methods that are no longer bound by…
Descriptors: Foreign Countries, Evaluation Methods, Comparative Analysis, Grading
Peer reviewed
Yuan Tian; Xi Yang; Suhail A. Doi; Luis Furuya-Kanamori; Lifeng Lin; Joey S. W. Kwong; Chang Xu – Research Synthesis Methods, 2024
RobotReviewer is a tool for automatically assessing the risk of bias in randomized controlled trials, but there is limited evidence of its reliability. We evaluated the agreement between RobotReviewer and humans regarding the risk of bias assessment based on 1955 randomized controlled trials. The risk of bias in these trials was assessed via two…
Descriptors: Risk, Randomized Controlled Trials, Classification, Robotics
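The abstract above does not say which agreement statistic was used, but agreement between an automated tool and human reviewers on categorical ratings is commonly summarised with Cohen's kappa. The sketch below is illustrative only, with hypothetical labels, and is not the procedure reported in the study:

# Illustrative sketch only: assumes categorical risk-of-bias labels from the
# tool and from human reviewers, and summarises agreement with Cohen's kappa.
from sklearn.metrics import cohen_kappa_score

tool_labels  = ["low", "high", "low", "low", "high"]    # hypothetical data
human_labels = ["low", "high", "high", "low", "high"]   # hypothetical data

kappa = cohen_kappa_score(tool_labels, human_labels)
print(f"Tool vs. human agreement (Cohen's kappa): {kappa:.2f}")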
Peer reviewed
PDF on ERIC
Tack, Anaïs; Piech, Chris – International Educational Data Mining Society, 2022
How can we test whether state-of-the-art generative models, such as Blender and GPT-3, are good AI teachers, capable of replying to a student in an educational dialogue? Designing an AI teacher test is challenging: although evaluation methods are much-needed, there is no off-the-shelf solution to measuring pedagogical ability. This paper reports…
Descriptors: Artificial Intelligence, Dialogs (Language), Bayesian Statistics, Decision Making
Peer reviewed
PDF on ERIC
Lanah Stafford; Erin Cousins; Linda Bol; Megan Mize – Research & Practice in Assessment, 2023
Integrative learning is an important outcome for graduates of higher education. Therefore, it should be well-defined and assessed reliably. The American Association of Colleges & Universities has developed a rubric to define and assess integrative learning, but it has low reliability. This pilot study examines whether this rubric's reliability…
Descriptors: Scoring Rubrics, Reliability, Evaluation Methods, Faculty Development
Peer reviewed
Caroline F. Rowland; Amy Bidgood; Gary Jones; Andrew Jessop; Paula Stinson; Julian M. Pine; Samantha Durrant; Michelle S. Peter – Language Learning, 2025
A strong predictor of children's language is performance on non-word repetition (NWR) tasks. However, the basis of this relationship remains unknown. Some suggest that NWR tasks measure phonological working memory, which then affects language growth. Others argue that children's knowledge of language/language experience affects NWR performance. A…
Descriptors: Vocabulary Development, Comparative Analysis, Computational Linguistics, Language Skills
Peer reviewed
Marine Simon; Alexandra Budke – Journal of Geography in Higher Education, 2024
Comparison is an important geographic method and a common task in geography education. Mastering comparison is a complex competency, and written comparisons are challenging tasks both for students and assessors. As yet, however, there is no established test for evaluating comparison competency, nor a tool for enhancing it. Moreover, little is known about…
Descriptors: Geography Instruction, Student Evaluation, Comparative Analysis, Reliability
Peer reviewed
Karel Kok; Sophia Chroszczinsky; Burkhard Priemer – Physical Review Physics Education Research, 2024
Data comparison problems are used in teaching and in science education research that focuses on students' ability to compare datasets and their conceptual understanding of measurement uncertainties. However, evaluating students' decisions in these problems can be problematic: for example, students may make a correct decision for the wrong reasons.…
Descriptors: Secondary School Students, Undergraduate Students, Comparative Analysis, Evaluation Methods
Peer reviewed
Rebecca Sickinger; Tineke Brunfaut; John Pill – Language Testing, 2025
Comparative Judgement (CJ) is an evaluation method, typically conducted online, whereby a rank order is constructed, and scores calculated, from judges' pairwise comparisons of performances. CJ has been researched in various educational contexts, though only rarely in English as a Foreign Language (EFL) writing settings, and is generally agreed to…
Descriptors: Writing Evaluation, English (Second Language), Second Language Learning, Second Language Instruction
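The Sickinger, Brunfaut and Pill abstract describes CJ as constructing a rank order, with scores calculated from judges' pairwise comparisons. One common way to do this is to fit a Bradley-Terry model to the comparison outcomes; the sketch below uses the standard MM iteration on hypothetical judgements and is illustrative only, not the scoring procedure used in any of the studies listed here.

from collections import defaultdict
from itertools import chain

# Each tuple records one judge's decision as (winner, loser). Hypothetical data.
judgements = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B"), ("C", "B")]

items = sorted(set(chain.from_iterable(judgements)))
wins = defaultdict(int)       # total wins per item
pairs = defaultdict(int)      # number of comparisons per unordered pair
for winner, loser in judgements:
    wins[winner] += 1
    pairs[frozenset((winner, loser))] += 1

# Bradley-Terry strengths fitted with the usual MM update, normalised each pass.
strength = {i: 1.0 for i in items}
for _ in range(100):
    new = {}
    for i in items:
        denom = sum(
            pairs[frozenset((i, j))] / (strength[i] + strength[j])
            for j in items
            if j != i and pairs[frozenset((i, j))]
        )
        new[i] = wins[i] / denom if denom else strength[i]
    total = sum(new.values())
    strength = {i: p / total for i, p in new.items()}

# Rank order: higher strength means the script was preferred more often.
for rank, (item, p) in enumerate(sorted(strength.items(), key=lambda kv: -kv[1]), 1):
    print(rank, item, round(p, 3))

In practice, CJ tools also report a reliability statistic for the resulting scale; that step is omitted here for brevity.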
Peer reviewed
Hunter, Seth B. – Journal of Education Human Resources, 2023
Teacher performance scores inform education leaders' management of teacher human resources. However, prior research has implied that different interpretations of performance criteria between teachers and their evaluators suppress teacher development. Although research has examined teacher perceptions of performance scores and compared teacher…
Descriptors: Teacher Evaluation, Teacher Effectiveness, Self Evaluation (Individuals), Interrater Reliability
Gill, Tim – Research Matters, 2022
In Comparative Judgement (CJ) exercises, examiners are asked to look at a selection of candidate scripts (with marks removed) and order them in terms of which they believe display the best quality. By including scripts from different examination sessions, the results of these exercises can be used to help with maintaining standards. Results from…
Descriptors: Comparative Analysis, Decision Making, Scripts, Standards
Peer reviewed
Yun Long; Haifeng Luo; Yu Zhang – npj Science of Learning, 2024
This study explores the use of Large Language Models (LLMs), specifically GPT-4, in analysing classroom dialogue--a key task for teaching diagnosis and quality improvement. Traditional qualitative methods are both knowledge- and labour-intensive. This research investigates the potential of LLMs to streamline and enhance this process. Using…
Descriptors: Classroom Communication, Computational Linguistics, Chinese, Mathematics Instruction
Peer reviewed
Bramley, Tom; Vitello, Sylvia – Assessment in Education: Principles, Policy & Practice, 2019
Comparative Judgement (CJ) is an increasingly widely investigated method in assessment for creating a scale, for example of the quality of essays. One area that has attracted attention in CJ studies is the optimisation of the selection of pairs of objects for judgement. One approach is known as adaptive comparative judgement (ACJ). It has been…
Descriptors: Reliability, Evaluation Methods, Comparative Analysis, Essay Tests
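The Bramley and Vitello abstract mentions adaptive comparative judgement (ACJ), in which the next pair to be judged is chosen rather than drawn at random. As an assumption for illustration, the sketch below uses one common adaptive heuristic, pairing not-yet-compared items whose current strength estimates are closest; it is not necessarily the selection rule examined in the paper.

from itertools import combinations

def next_pair(strengths, compared):
    """Return the not-yet-compared pair whose strength estimates are closest."""
    candidates = [
        pair for pair in combinations(sorted(strengths), 2)
        if frozenset(pair) not in compared
    ]
    return min(candidates, key=lambda p: abs(strengths[p[0]] - strengths[p[1]]))

strengths = {"A": 0.42, "B": 0.31, "C": 0.27}    # e.g. from a Bradley-Terry fit
compared = {frozenset(("A", "B"))}               # pairs already judged
print(next_pair(strengths, compared))            # -> ('B', 'C')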
Walland, Emma – Research Matters, 2022
In this article, I report on examiners' views and experiences of using Pairwise Comparative Judgement (PCJ) and Rank Ordering (RO) as alternatives to traditional analytical marking for GCSE English Language essays. Fifteen GCSE English Language examiners took part in the study. After each had judged 100 pairs of essays using PCJ and eight packs of…
Descriptors: Essays, Grading, Writing Evaluation, Evaluators
Leech, Tony; Chambers, Lucy – Research Matters, 2022
Two of the central issues in comparative judgement (CJ), which are perhaps underexplored compared to questions of the method's reliability and technical quality, are "what processes do judges use to make their decisions" and "what features do they focus on when making their decisions?" This article discusses both, in the…
Descriptors: Comparative Analysis, Decision Making, Evaluators, Reliability
Peer reviewed
Paquot, Magali; Rubin, Rachel; Vandeweerd, Nathan – Language Learning, 2022
The main objective of this Methods Showcase Article is to show how the technique of adaptive comparative judgment, coupled with a crowdsourcing approach, can offer practical solutions to reliability issues as well as to address the time and cost difficulties associated with a text-based approach to proficiency assessment in L2 research. We…
Descriptors: Comparative Analysis, Decision Making, Language Proficiency, Reliability