ERIC - Search Results

Publication Date

In 2025	1
Since 2024	4
Since 2021 (last 5 years)	26
Since 2016 (last 10 years)	95
Since 2006 (last 20 years)	236

Descriptor

Comparative Analysis	327
Interrater Reliability	327
Foreign Countries	84
Correlation	65
Evaluation Methods	53
Statistical Analysis	53
Evaluators	47
Scores	44
Second Language Learning	42
Scoring	41
Student Evaluation	41
English (Second Language)	39
Higher Education	34
Teaching Methods	34
Validity	32
Language Tests	31
Writing Evaluation	31
Second Language Instruction	30
College Students	29
Measures (Individuals)	29
Rating Scales	29
Reliability	27
Elementary School Students	25
Evaluation Criteria	24
Interviews	24

Publication Type

Journal Articles	262
Reports - Research	248
Reports - Evaluative	53
Speeches/Meeting Papers	35
Tests/Questionnaires	23
Information Analyses	11
Dissertations/Theses -…	10
Reports - Descriptive	8
Numerical/Quantitative Data	4
Book/Product Reviews	1
Collected Works - Proceedings	1
Collected Works - Serials	1
Guides - Non-Classroom	1
Opinion Papers	1

Education Level

Higher Education	77
Postsecondary Education	64
Elementary Education	28
Secondary Education	27
Elementary Secondary Education	17
High Schools	11
Middle Schools	8
Adult Education	6
Early Childhood Education	6
Grade 4	6
Grade 1	5
Preschool Education	5
Grade 2	4
Grade 3	4
Grade 5	4
Intermediate Grades	4
Junior High Schools	4
Grade 11	3
Grade 6	3
Grade 8	3
Grade 10	2
Grade 7	2
Kindergarten	2
Primary Education	2
Grade 12	1

Audience

Practitioners	4
Researchers	4
Teachers	2

Location

China	8
Netherlands	7
United Kingdom	7
Australia	6
Turkey	6
United States	6
Florida	5
Iran	5
Taiwan	5
United Kingdom (England)	5
Washington	5
Germany	4
Greece	4
Pennsylvania	4
Arizona	3
Belgium	3
California	3
Canada	3
Finland	3
Georgia	3
Philippines	3
Saudi Arabia	3
Singapore	3
Sweden	3
Tennessee	3

Laws, Policies, & Programs

Improving Americas Schools…	1
Individuals with Disabilities…	1
No Child Left Behind Act 2001	1
Temporary Assistance for…	1

What Works Clearinghouse Rating

Does not meet standards

Showing 1 to 15 of 327 results

The Effect of Visual Sort and Rate versus Visual Analog Scales on the Reliability of Judgments of Dysphonia

Peer reviewed

Kapsner-Smith, Mara R.; Opuszynski, Amanda; Stepp, Cara E.; Eadie, Tanya L. – Journal of Speech, Language, and Hearing Research, 2021

Purpose: The reliability of auditory-perceptual judgments between listeners is a long-standing problem in the assessment of voice disorders. The purpose of this study was to determine whether a relatively novel experimental scaling method, called visual sort and rate (VSR), yielded stronger reliability than the more frequently used method of…

Descriptors: Voice Disorders, Interrater Reliability, Rating Scales, Severity (of Disability)

Agreement between Visual Inspection and Objective Analysis Methods: A Replication and Extension

Peer reviewed

Taylor, Tessa; Lanovaz, Marc J. – Journal of Applied Behavior Analysis, 2022

Behavior analysts typically rely on visual inspection of single-case experimental designs to make treatment decisions. However, visual inspection is subjective, which has led to the development of supplemental objective methods such as the conservative dual-criteria method. To replicate and extend a study conducted by Wolfe et al. (2018) on the…

Descriptors: Visual Perception, Artificial Intelligence, Decision Making, Evaluators

Graders of the Future: Comparing the Consistency and Accuracy of GPT4 and Pre-Service Teachers in Physics Essay Question Assessments

Peer reviewed

Yubin Xu; Lin Liu; Jianwen Xiong; Guangtian Zhu – Journal of Baltic Science Education, 2025

As the development and application of large language models (LLMs) in physics education progress, the well-known AI-based chatbot ChatGPT4 has presented numerous opportunities for educational assessment. Investigating the potential of AI tools in practical educational assessment carries profound significance. This study explored the comparative…

Descriptors: Physics, Artificial Intelligence, Computer Software, Accuracy

Rater Connections and the Detection of Bias in Performance Assessment

Peer reviewed

Wind, Stefanie A. – Measurement: Interdisciplinary Research and Perspectives, 2022

In many performance assessments, one or two raters from the complete rater pool score each performance, resulting in a sparse rating design, where there are limited observations of each rater relative to the complete sample of students. Although sparse rating designs can be constructed to facilitate estimation of student achievement, the…

Descriptors: Evaluators, Bias, Identification, Performance Based Assessment

Do You Mean What I Mean? Comparing Teacher Performance Self-Scores and Evaluator-Generated Scores

Peer reviewed

Hunter, Seth B. – Journal of Education Human Resources, 2023

Teacher performance scores inform education leaders' management of teacher human resources. However, prior research has implied that different interpretations of performance criteria between teachers and their evaluators suppress teacher development. Although research has examined teacher perceptions of performance scores and compared teacher…

Descriptors: Teacher Evaluation, Teacher Effectiveness, Self Evaluation (Individuals), Interrater Reliability

Continuous Improvement of Inter-Rater Reliability in Transition Compliance at a State Agency

Heather Raithel – ProQuest LLC, 2023

A mixed methods action research study was designed to answer three research questions based on inter-rater reliability (IRR) in compliance calls for transition at a state education agency, perceived confidence levels in making and discussing compliance calls, and perceived confidence in sharing transition resources. An innovation based on…

Descriptors: Public Agencies, Interrater Reliability, Compliance (Legal), Comparative Analysis

Reliability of the Reflective Learning Framework for Assessing Higher-Order Thinking in Geography and Sustainability Courses

Peer reviewed

Whalen, Kate; Paez, Antonio – Journal of Geography, 2022

Experiential education partnered with guided reflection is thought to support students with higher-order thinking skills. In this study, 44 reflections from two university-level sustainability courses were compared. In both courses students were asked to write a reflection, but only one course used the Reflective Learning Framework (RLF). Tests of…

Descriptors: Geography Instruction, Thinking Skills, Experiential Learning, Sustainability

An Interview with ChatGPT on Emergency Remote Teaching: A Comparative Analysis Based on Human-AI Collaboration

Peer reviewed

Tülübas, Tijen; Demirkol, Murat; Ozdemir, Tuncay Yavuz; Polat, Hakan; Karakose, Turgut; Yirci, Ramazan – Educational Process: International Journal, 2023

Background/purpose: ChatGPT, a recent form of AI-based language model, has garnered interest among people from diverse backgrounds with its immersive capabilities. Using ChatGPT to support or generate scientific research has also created an ongoing debate over its advantages versus risks. The present study aimed to conduct an AI-enabled research…

Descriptors: Artificial Intelligence, Emergency Programs, Distance Education, COVID-19

Kappa Coefficients for Missing Data

Peer reviewed

De Raadt, Alexandra; Warrens, Matthijs J.; Bosker, Roel J.; Kiers, Henk A. L. – Educational and Psychological Measurement, 2019

Cohen's kappa coefficient is commonly used for assessing agreement between classifications of two raters on a nominal scale. Three variants of Cohen's kappa that can handle missing data are presented. Data are considered missing if one or both ratings of a unit are missing. We study how well the variants estimate the kappa value for complete data…

Descriptors: Interrater Reliability, Data, Statistical Analysis, Statistical Bias
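For reference, the standard two-rater Cohen's kappa named in this abstract is kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance from each rater's marginal category proportions. The sketch below is a minimal illustration under stated assumptions, not one of the three missing-data variants studied by De Raadt et al.: ratings are Python lists with `None` marking a missing rating, and any unit missing either rating is simply dropped (listwise deletion). The function name `cohens_kappa` is hypothetical.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def cohens_kappa(
    r1: Sequence[Optional[Hashable]], r2: Sequence[Optional[Hashable]]
) -> float:
    """Cohen's kappa for two raters on a nominal scale.

    Assumption: units where either rating is None are dropped (listwise
    deletion). This is a simple baseline, not a published missing-data variant.
    """
    if len(r1) != len(r2):
        raise ValueError("rating lists must have equal length")
    # Keep only units rated by both raters.
    pairs = [(a, b) for a, b in zip(r1, r2) if a is not None and b is not None]
    n = len(pairs)
    if n == 0:
        raise ValueError("no units rated by both raters")
    # Observed agreement: share of units where both raters chose the same category.
    p_o = sum(a == b for a, b in pairs) / n
    # Chance agreement: product of the two raters' marginal proportions, summed over categories.
    m1 = Counter(a for a, _ in pairs)
    m2 = Counter(b for _, b in pairs)
    p_e = sum(m1[c] * m2[c] for c in m1) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)
```

For example, if rater A gives a, a, b, b, (missing), a and rater B gives a, b, b, b, a, (missing), four units remain; observed agreement is 0.75, chance agreement is 0.5, and kappa is 0.5. Listwise deletion discards information from partially rated units, which is exactly the limitation that dedicated missing-data variants aim to address.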

Meta-Analysis of Inter-Rater Agreement and Discrepancy Between Human and Automated English Essay Scoring

Peer reviewed

Jiyeo Yun – English Teaching, 2023

Studies on automatic scoring systems in writing assessments have also evaluated the relationship between human and machine scores for the reliability of automated essay scoring systems. This study investigated the magnitudes of indices for inter-rater agreement and discrepancy, especially regarding human and machine scoring, in writing assessment.…

Descriptors: Meta Analysis, Interrater Reliability, Essays, Scoring

Depth-Perception-Based Representation in Holistic Rating on ESL Essay Writing

Peer reviewed

Lian Li; Jiehui Hu; Yu Dai; Ping Zhou; Wanhong Zhang – Reading & Writing Quarterly, 2024

This paper proposes to use depth perception to represent raters' decision in holistic evaluation of ESL essays, as an alternative medium to the conventional form of numerical scores. The researchers verified the new method's accuracy and inter/intra-rater reliability by inviting 24 ESL teachers to perform different representations when rating 60…

Descriptors: Essays, Holistic Approach, Writing Evaluation, Accuracy

Comparison of Inter-Rater Reliability Techniques in Performance-Based Assessment

Peer reviewed

Arslan Mancar, Sinem; Gulleroglu, H. Deniz – International Journal of Assessment Tools in Education, 2022

The aim of this study is to analyse the importance of the number of raters and compare the results obtained by techniques based on Classical Test Theory (CTT) and Generalizability (G) Theory. The Kappa and Krippendorff alpha techniques based on CTT were used to determine the inter-rater reliability. In this descriptive research, the data consist of…

Descriptors: Comparative Analysis, Interrater Reliability, Advanced Placement, Scoring Rubrics
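Krippendorff's alpha, one of the techniques named in this abstract, is defined as alpha = 1 - D_o / D_e, the ratio of observed to expected disagreement. The sketch below covers only the nominal-scale case, computed from a coincidence matrix; it is an illustration under stated assumptions, not the procedure used by Arslan Mancar and Gulleroglu. It assumes each unit is a list of ratings from any number of raters, with `None` for a missing rating, and skips units with fewer than two ratings. The function name `krippendorff_alpha_nominal` is hypothetical.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def krippendorff_alpha_nominal(
    units: Sequence[Sequence[Optional[Hashable]]],
) -> float:
    """Krippendorff's alpha for nominal data via a coincidence matrix."""
    o: Counter = Counter()  # coincidence matrix o[(c, k)]
    for ratings in units:
        vals = [v for v in ratings if v is not None]
        m = len(vals)
        if m < 2:
            continue  # a unit needs at least two ratings to be pairable
        # Each ordered pair of ratings from different raters adds 1 / (m - 1).
        for i in range(m):
            for j in range(m):
                if i != j:
                    o[(vals[i], vals[j])] += 1 / (m - 1)
    # Marginal totals n_c and grand total n of pairable values.
    n_c: Counter = Counter()
    for (c, _k), w in o.items():
        n_c[c] += w
    n = sum(n_c.values())
    # Nominal metric: any two distinct categories disagree with weight 1.
    observed = sum(w for (c, k), w in o.items() if c != k)
    expected = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k)
    if expected == 0:
        raise ValueError("alpha is undefined when only one category is used")
    return 1 - (n - 1) * observed / expected
```

For two raters scoring units (a, a), (a, b), (b, b), (b, b), the coincidences give n = 8, observed disagreement 2, and expected disagreement 30, so alpha = 1 - 7 * 2 / 30, roughly 0.533. Unlike two-rater kappa, this formulation accepts any number of raters per unit and tolerates missing ratings without discarding whole units.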

A Model-Data-Fit-Informed Approach to Score Resolution in Performance Assessments

Peer reviewed

Wind, Stefanie A.; Walker, A. Adrienne – Educational Measurement: Issues and Practice, 2021

Many large-scale performance assessments include score resolution procedures for resolving discrepancies in rater judgments. The goal of score resolution is conceptually similar to person fit analyses: To identify students for whom observed scores may not accurately reflect their achievement. Previously, researchers have observed that…

Descriptors: Goodness of Fit, Performance Based Assessment, Evaluators, Decision Making

Does Comparative Judgement of Scripts Provide an Effective Means of Maintaining Standards in Mathematics? Research Report

Benton, Tom; Leech, Tony; Hughes, Sarah – Cambridge Assessment, 2020

In the context of examinations, the phrase "maintaining standards" usually refers to any activity designed to ensure that it is no easier (or harder) to achieve a given grade in one year than in another. Specifically, it tends to mean activities associated with setting examination grade boundaries. Benton et al. (2020) describe a method…

Descriptors: Mathematics Tests, Equated Scores, Comparative Analysis, Difficulty Level

Automated Assessment of Second Language Comprehensibility: Review, Training, Validation, and Generalization Studies

Peer reviewed

Saito, Kazuya; Macmillan, Konstantinos; Kachlicka, Magdalena; Kunihara, Takuya; Minematsu, Nobuaki – Studies in Second Language Acquisition, 2023

Whereas many scholars have emphasized the relative importance of "comprehensibility" as an ecologically valid goal for L2 speech training, testing, and development, eliciting listeners' judgments is time-consuming. Following calls for research on more efficient L2 speech rating methods in applied linguistics, and growing attention toward…

Descriptors: Second Language Learning, Second Language Instruction, Interrater Reliability, Speech Communication

Source

Journal of Speech, Language,…	11
ProQuest LLC	10
Language Testing	6
Journal of Autism and…	5
Assessment & Evaluation in…	4
Educational and Psychological…	4
English Language Teaching	4
Language Assessment Quarterly	4
Advances in Health Sciences…	3
Behavior Modification	3
Creativity Research Journal	3
ETS Research Report Series	3
Educational Sciences: Theory…	3
Journal of Applied Behavior…	3
Online Submission	3
Research Synthesis Methods	3
Academic Medicine	2
American Journal of…	2
Applied Measurement in…	2
Assessing Writing	2
Autism: The International…	2
Clinical Linguistics &…	2
Developmental Psychology	2
Early Child Development and…	2
Education and Training in…	2

Author

Coniam, David	3
Lunz, Mary E.	3
Attali, Yigal	2
Beach, Kristen D.	2
Bocian, Kathleen M.	2
Bothe, Anne K.	2
Chavez, Oscar	2
Derby, K. Mark	2
Gillan, Nicola	2
Grouws, Douglas A.	2
Hestenes, Linda L.	2
Incikabi, Lutfi	2
Jones, Ian	2
Kokkinaki, Theano	2
McLaughlin, T. F.	2
Mims, Sharon U.	2
Myford, Carol M.	2
Nakamura, Yuji	2
O'Connor, Rollanda E.	2
O'Neill, Thomas R.	2
Papick, Ira	2
Wind, Stefanie A.	2
Zayac, Ryan M.	2
Abbott, Robert	1

Assessments and Surveys

Test of English as a Foreign…	5
Autism Diagnostic Observation…	4
Woodcock Johnson Tests of…	4
Dynamic Indicators of Basic…	3
Early Childhood Environment…	2
National Assessment of…	2
Peabody Picture Vocabulary…	2
ACT Assessment	1
Adaptive Behavior Scale	1
Expressive One Word Picture…	1
Georgia Criterion Referenced…	1
Graduate Management Admission…	1
Kaufman Brief Intelligence…	1
MacArthur Bates Communicative…	1
Mean Length of Utterance	1
Multifactor Leadership…	1
NEO Personality Inventory	1
Neale Analysis of Reading…	1
Obsessive Compulsive Scale	1
Pediatric Evaluation of…	1
Praxis Series	1
Raven Progressive Matrices	1
SAT (College Admission Test)	1
Vineland Adaptive Behavior…	1
Wechsler Adult Intelligence…	1