ERIC - Search Results

Publication Date

In 2025	1
Since 2024	4
Since 2021 (last 5 years)	26
Since 2016 (last 10 years)	95
Since 2006 (last 20 years)	236

Descriptor

Comparative Analysis	327
Interrater Reliability	327
Foreign Countries	84
Correlation	65
Evaluation Methods	53
Statistical Analysis	53
Evaluators	47
Scores	44
Second Language Learning	42
Scoring	41
Student Evaluation	41
English (Second Language)	39
Higher Education	34
Teaching Methods	34
Validity	32
Language Tests	31
Writing Evaluation	31
Second Language Instruction	30
College Students	29
Measures (Individuals)	29
Rating Scales	29
Reliability	27
Elementary School Students	25
Evaluation Criteria	24
Interviews	24

Publication Type

Journal Articles	262
Reports - Research	248
Reports - Evaluative	53
Speeches/Meeting Papers	35
Tests/Questionnaires	23
Information Analyses	11
Dissertations/Theses -…	10
Reports - Descriptive	8
Numerical/Quantitative Data	4
Book/Product Reviews	1
Collected Works - Proceedings	1
Collected Works - Serials	1
Guides - Non-Classroom	1
Opinion Papers	1

Education Level

Higher Education	77
Postsecondary Education	64
Elementary Education	28
Secondary Education	27
Elementary Secondary Education	17
High Schools	11
Middle Schools	8
Adult Education	6
Early Childhood Education	6
Grade 4	6
Grade 1	5
Preschool Education	5
Grade 2	4
Grade 3	4
Grade 5	4
Intermediate Grades	4
Junior High Schools	4
Grade 11	3
Grade 6	3
Grade 8	3
Grade 10	2
Grade 7	2
Kindergarten	2
Primary Education	2
Grade 12	1

Audience

Practitioners	4
Researchers	4
Teachers	2

Location

China	8
Netherlands	7
United Kingdom	7
Australia	6
Turkey	6
United States	6
Florida	5
Iran	5
Taiwan	5
United Kingdom (England)	5
Washington	5
Germany	4
Greece	4
Pennsylvania	4
Arizona	3
Belgium	3
California	3
Canada	3
Finland	3
Georgia	3
Philippines	3
Saudi Arabia	3
Singapore	3
Sweden	3
Tennessee	3

Laws, Policies, & Programs

Improving Americas Schools…	1
Individuals with Disabilities…	1
No Child Left Behind Act 2001	1
Temporary Assistance for…	1

What Works Clearinghouse Rating

Does not meet standards

Showing 1 to 15 of 327 results

Accuracy and Reliability of Large Language Models in Assessing Learning Outcomes Achievement across Cognitive Domains

Peer reviewed

Swapna Haresh Teckwani; Amanda Huee-Ping Wong; Nathasha Vihangi Luke; Ivan Cherh Chiet Low – Advances in Physiology Education, 2024

The advent of artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT and Gemini, has significantly impacted the educational landscape, offering unique opportunities for learning and assessment. In the realm of written assessment grading, traditionally viewed as a laborious and subjective process, this study sought to…

Descriptors: Accuracy, Reliability, Computational Linguistics, Standards

Reliable Application of the MATH Taxonomy Sheds Light on Assessment Practices

Peer reviewed

Kinnear, George; Bennett, Max; Binnie, Rachel; Bolt, Róisín; Zheng, Yinglan – Teaching Mathematics and Its Applications, 2020

The MATH taxonomy classifies questions according to the mathematical skills required to answer them. It was created to aid the development of more balanced assessments in undergraduate mathematics and has since been used to compare different assessment regimes across school and university. To date, there has been no systematic investigation of the…

Descriptors: Taxonomy, Mathematics Instruction, Teaching Methods, Reliability

The AI Teacher Test: Measuring the Pedagogical Ability of Blender and GPT-3 in Educational Dialogues

Peer reviewed
PDF on ERIC

Tack, Anaïs; Piech, Chris – International Educational Data Mining Society, 2022

How can we test whether state-of-the-art generative models, such as Blender and GPT-3, are good AI teachers, capable of replying to a student in an educational dialogue? Designing an AI teacher test is challenging: although evaluation methods are much-needed, there is no off-the-shelf solution to measuring pedagogical ability. This paper reports…

Descriptors: Artificial Intelligence, Dialogs (Language), Bayesian Statistics, Decision Making

Improving Reliability in Assessing Integrative Learning Using Rubrics: Does Group Norming Help?

Peer reviewed
PDF on ERIC

Lanah Stafford; Erin Cousins; Linda Bol; Megan Mize – Research & Practice in Assessment, 2023

Integrative learning is an important outcome for graduates of higher education. Therefore, it should be well-defined and assessed reliably. The American Association of Colleges & Universities has developed a rubric to define and assess integrative learning, but it has low reliability. This pilot study examines whether this rubric's reliability…

Descriptors: Scoring Rubrics, Reliability, Evaluation Methods, Faculty Development

The Effect of Visual Sort and Rate versus Visual Analog Scales on the Reliability of Judgments of Dysphonia

Peer reviewed

Kapsner-Smith, Mara R.; Opuszynski, Amanda; Stepp, Cara E.; Eadie, Tanya L. – Journal of Speech, Language, and Hearing Research, 2021

Purpose: The reliability of auditory-perceptual judgments between listeners is a long-standing problem in the assessment of voice disorders. The purpose of this study was to determine whether a relatively novel experimental scaling method, called visual sort and rate (VSR), yielded stronger reliability than the more frequently used method of…

Descriptors: Voice Disorders, Interrater Reliability, Rating Scales, Severity (of Disability)

Agreement between Visual Inspection and Objective Analysis Methods: A Replication and Extension

Peer reviewed

Taylor, Tessa; Lanovaz, Marc J. – Journal of Applied Behavior Analysis, 2022

Behavior analysts typically rely on visual inspection of single-case experimental designs to make treatment decisions. However, visual inspection is subjective, which has led to the development of supplemental objective methods such as the conservative dual-criteria method. To replicate and extend a study conducted by Wolfe et al. (2018) on the…

Descriptors: Visual Perception, Artificial Intelligence, Decision Making, Evaluators

Analytic or Holistic: A Study of Agreement between Different Grading Models

Peer reviewed
PDF on ERIC

Jönsson, Anders; Balan, Andreia – Practical Assessment, Research & Evaluation, 2018

Research on teachers' grading has shown that there is great variability among teachers regarding both the process and product of grading, resulting in low comparability and issues of inequality when using grades for selection purposes. Despite this situation, not much is known about the merits or disadvantages of different models for grading. In…

Descriptors: Grading, Models, Reliability, Validity

Graders of the Future: Comparing the Consistency and Accuracy of GPT4 and Pre-Service Teachers in Physics Essay Question Assessments

Peer reviewed
PDF on ERIC

Yubin Xu; Lin Liu; Jianwen Xiong; Guangtian Zhu – Journal of Baltic Science Education, 2025

As the development and application of large language models (LLMs) in physics education progress, the well-known AI-based chatbot ChatGPT4 has presented numerous opportunities for educational assessment. Investigating the potential of AI tools in practical educational assessment carries profound significance. This study explored the comparative…

Descriptors: Physics, Artificial Intelligence, Computer Software, Accuracy

Rater Connections and the Detection of Bias in Performance Assessment

Peer reviewed

Wind, Stefanie A. – Measurement: Interdisciplinary Research and Perspectives, 2022

In many performance assessments, one or two raters from the complete rater pool scores each performance, resulting in a sparse rating design, where there are limited observations of each rater relative to the complete sample of students. Although sparse rating designs can be constructed to facilitate estimation of student achievement, the…

Descriptors: Evaluators, Bias, Identification, Performance Based Assessment

Do You Mean What I Mean? Comparing Teacher Performance Self-Scores and Evaluator-Generated Scores

Peer reviewed

Hunter, Seth B. – Journal of Education Human Resources, 2023

Teacher performance scores inform education leaders' management of teacher human resources. However, prior research has implied that different interpretations of performance criteria between teachers and their evaluators suppress teacher development. Although research has examined teacher perceptions of performance scores and compared teacher…

Descriptors: Teacher Evaluation, Teacher Effectiveness, Self Evaluation (Individuals), Interrater Reliability

Estimating Hazard Ratios from Published Kaplan-Meier Survival Curves: A Methods Validation Study

Peer reviewed

Saluja, Ronak; Cheng, Sierra; delos Santos, Keemo Althea; Chan, Kelvin K. W. – Research Synthesis Methods, 2019

Objective: Various statistical methods have been developed to estimate hazard ratios (HRs) from published Kaplan-Meier (KM) curves for the purpose of performing meta-analyses. The objective of this study was to determine the reliability, accuracy, and precision of four commonly used methods by Guyot, Williamson, Parmar, and Hoyle and Henley.…

Descriptors: Meta Analysis, Reliability, Accuracy, Randomized Controlled Trials

Continuous Improvement of Inter-Rater Reliability in Transition Compliance at a State Agency

Heather Raithel – ProQuest LLC, 2023

A mixed methods action research study was designed to answer three research questions based on inter-rater reliability (IRR) in compliance calls for transition at a state education agency, perceived confidence levels in making and discussing compliance calls, and perceived confidence in sharing transition resources. An innovation based on…

Descriptors: Public Agencies, Interrater Reliability, Compliance (Legal), Comparative Analysis

Reliability of the Reflective Learning Framework for Assessing Higher-Order Thinking in Geography and Sustainability Courses

Peer reviewed

Whalen, Kate; Paez, Antonio – Journal of Geography, 2022

Experiential education partnered with guided reflection is thought to support students with higher-order thinking skills. In this study, 44 reflections from two university-level sustainability courses were compared. In both courses students were asked to write a reflection, but only one course used the Reflective Learning Framework (RLF). Tests of…

Descriptors: Geography Instruction, Thinking Skills, Experiential Learning, Sustainability

Assessing Autism in Adults: An Evaluation of the Developmental, Dimensional and Diagnostic Interview-Adult Version (3Di-Adult)

Peer reviewed

Mandy, William; Clarke, Kiri; McKenner, Michele; Strydom, Andre; Crabtree, Jason; Lai, Meng-Chuan; Allison, Carrie; Baron-Cohen, Simon; Skuse, David – Journal of Autism and Developmental Disorders, 2018

We developed a brief, informant-report interview for assessing autism spectrum conditions (ASC) in adults, called the Developmental, Dimensional and Diagnostic Interview-Adult Version (3Di-Adult); and completed a preliminary evaluation. Informant reports were collected for participants with ASC (n = 39), a non-clinical comparison group (n = 29)…

Descriptors: Autism, Pervasive Developmental Disorders, Adults, Diagnostic Tests

An Interview with ChatGPT on Emergency Remote Teaching: A Comparative Analysis Based on Human-AI Collaboration

Peer reviewed
PDF on ERIC

Tülübas, Tijen; Demirkol, Murat; Ozdemir, Tuncay Yavuz; Polat, Hakan; Karakose, Turgut; Yirci, Ramazan – Educational Process: International Journal, 2023

Background/purpose: ChatGPT, a recent form of AI-based language model, has garnered interest among people from diverse backgrounds with its immersive capabilities. Using ChatGPT to support or generate scientific research has also created an ongoing debate over its advantages versus risks. The present study aimed to conduct an AI-enabled research…

Descriptors: Artificial Intelligence, Emergency Programs, Distance Education, COVID-19

Source

Journal of Speech, Language,…	11
ProQuest LLC	10
Language Testing	6
Journal of Autism and…	5
Assessment & Evaluation in…	4
Educational and Psychological…	4
English Language Teaching	4
Language Assessment Quarterly	4
Advances in Health Sciences…	3
Behavior Modification	3
Creativity Research Journal	3
ETS Research Report Series	3
Educational Sciences: Theory…	3
Journal of Applied Behavior…	3
Online Submission	3
Research Synthesis Methods	3
Academic Medicine	2
American Journal of…	2
Applied Measurement in…	2
Assessing Writing	2
Autism: The International…	2
Clinical Linguistics &…	2
Developmental Psychology	2
Early Child Development and…	2
Education and Training in…	2

Author

Coniam, David	3
Lunz, Mary E.	3
Attali, Yigal	2
Beach, Kristen D.	2
Bocian, Kathleen M.	2
Bothe, Anne K.	2
Chavez, Oscar	2
Derby, K. Mark	2
Gillan, Nicola	2
Grouws, Douglas A.	2
Hestenes, Linda L.	2
Incikabi, Lutfi	2
Jones, Ian	2
Kokkinaki, Theano	2
McLaughlin, T. F.	2
Mims, Sharon U.	2
Myford, Carol M.	2
Nakamura, Yuji	2
O'Connor, Rollanda E.	2
O'Neill, Thomas R.	2
Papick, Ira	2
Wind, Stefanie A.	2
Zayac, Ryan M.	2
Abbott, Robert	1

Assessments and Surveys

Test of English as a Foreign…	5
Autism Diagnostic Observation…	4
Woodcock Johnson Tests of…	4
Dynamic Indicators of Basic…	3
Early Childhood Environment…	2
National Assessment of…	2
Peabody Picture Vocabulary…	2
ACT Assessment	1
Adaptive Behavior Scale	1
Expressive One Word Picture…	1
Georgia Criterion Referenced…	1
Graduate Management Admission…	1
Kaufman Brief Intelligence…	1
MacArthur Bates Communicative…	1
Mean Length of Utterance	1
Multifactor Leadership…	1
NEO Personality Inventory	1
Neale Analysis of Reading…	1
Obsessive Compulsive Scale	1
Pediatric Evaluation of…	1
Praxis Series	1
Raven Progressive Matrices	1
SAT (College Admission Test)	1
Vineland Adaptive Behavior…	1
Wechsler Adult Intelligence…	1