ERIC - Search Results

Publication Date

In 2026	0
Since 2025	2
Since 2022 (last 5 years)	6
Since 2017 (last 10 years)	9
Since 2007 (last 20 years)	25

Descriptor

Evaluation Methods	33
Interrater Reliability	33
Reliability	33
Validity	10
Scores	7
Scoring	6
Scoring Rubrics	6
Student Evaluation	6
College Faculty	5
Evaluators	5
Data Analysis	4
Evaluation Criteria	4
Observation	4
Research Methodology	4
Teacher Evaluation	4
Academic Achievement	3
Case Studies	3
Comparative Analysis	3
Disabilities	3
Error of Measurement	3
Grading	3
Higher Education	3
Measures (Individuals)	3
Psychometrics	3
Student Attitudes	3

Publication Type

Journal Articles	24
Reports - Research	17
Reports - Evaluative	8
Speeches/Meeting Papers	5
Dissertations/Theses -…	2
Information Analyses	2
Opinion Papers	2
Reports - Descriptive	2
Books	1
Non-Print Media	1

Education Level

Higher Education	8
Postsecondary Education	4
Elementary Education	3
Elementary Secondary Education	2
Junior High Schools	2
Middle Schools	2
Secondary Education	2
Grade 7	1
High Schools	1

Audience

Researchers	4
Practitioners	1

Location

Belgium	1
Canada	1
China	1
Connecticut	1
Netherlands	1
North Carolina	1
Pennsylvania	1
United Kingdom (England)	1
West Germany	1

Assessments and Surveys

Childrens Depression Inventory

Showing 1 to 15 of 33 results

Grading Exams Using Large Language Models: A Comparison between Human and AI Grading of Exams in Higher Education Using ChatGPT

Peer reviewed

Jonas Flodén – British Educational Research Journal, 2025

This study compares how the generative AI (GenAI) large language model (LLM) ChatGPT performs in grading university exams compared to human teachers. Aspects investigated include consistency, large discrepancies and length of answer. Implications for higher education, including the role of teachers and ethics, are also discussed. Three…

Descriptors: College Faculty, Artificial Intelligence, Comparative Testing, Scoring

Psychometric Properties of the Behavior Assessment System for Children Student Observation System (BASC-3 SOS) with Young Children in Special Education

Peer reviewed

Schmidt, Ellyn M.; Rothenberg, W. Andrew; Davidson, Bridget C.; Barnett, Miya; Jent, Jason; Cadenas, Heleny; Fernandez, Corina; Davis, Eileen – Journal of Behavioral Education, 2023

Measuring classroom behavior among young children is important to guide assessment and intervention decisions, yet there is limited literature on appropriate direct observation tools for this purpose. This article describes the psychometric properties of the Behavior Assessment System for Children, Student Observation System (BASC-3 SOS) with 135…

Descriptors: Young Children, Special Education, Child Behavior, Psychometrics

Interdisciplinary Thinking among Seventh-Grade Students in Lower-Secondary Science Education

Peer reviewed

Shasha Chen; Shaohui Chi; Zuhao Wang – Journal of Baltic Science Education, 2025

Interdisciplinary thinking is critical for equipping students to apply scientific knowledge and tackle societal challenges across various disciplines, which has been recognized as a key objective of twenty-first century science education. However, research on effective interdisciplinary assessment in secondary school science education is still…

Descriptors: Thinking Skills, Interdisciplinary Approach, Science Instruction, Grade 7

Adaptation, Content Validity and Reliability of the Autism Classification System of Functioning for Social Communication: From Toddlerhood to Adolescent-Aged Children with Autism

Peer reviewed

Di Rezze, Briano; Gentles, Stephen James; Hidecker, Mary Jo Cooley; Zwaigenbaum, Lonnie; Rosenbaum, Peter; Duku, Eric; Georgiades, Stelios; Roncadin, Caroline; Fang, Hanna; Tajik-Parvinchi, Diana; Viveiros, Helena – Journal of Autism and Developmental Disorders, 2022

The Autism Classification System of Functioning: Social Communication (ACSF) describes social communication functioning levels. First developed for preschoolers with ASD, this study tests an expanded age range (2-to-18 years). The ACSF rates the child's typical and best (i.e., capacity) performance. Qualitative methods tested parent and clinician…

Descriptors: Content Validity, Reliability, Autism Spectrum Disorders, Classification

The AI Teacher Test: Measuring the Pedagogical Ability of Blender and GPT-3 in Educational Dialogues

Peer reviewed

Tack, Anaïs; Piech, Chris – International Educational Data Mining Society, 2022

How can we test whether state-of-the-art generative models, such as Blender and GPT-3, are good AI teachers, capable of replying to a student in an educational dialogue? Designing an AI teacher test is challenging: although evaluation methods are much-needed, there is no off-the-shelf solution to measuring pedagogical ability. This paper reports…

Descriptors: Artificial Intelligence, Dialogs (Language), Bayesian Statistics, Decision Making

Cross-Validation and Application of a Scale Assessing School Band Performance

Peer reviewed

Rossin, Emily G.; Bergee, Martin J. – Journal of Research in Music Education, 2021

This is the sixth and culminating study in a series whose purpose has been to acquire a conceptual understanding of school band performance and to develop an assessment based on this understanding. With the present study, we cross-validated and applied a rating scale for school band performance. In the cross-validation phase, college students…

Descriptors: Music Education, Music Activities, Music, Performance

Improving Reliability in Assessing Integrative Learning Using Rubrics: Does Group Norming Help?

Peer reviewed

Lanah Stafford; Erin Cousins; Linda Bol; Megan Mize – Research & Practice in Assessment, 2023

Integrative learning is an important outcome for graduates of higher education. Therefore, it should be well-defined and assessed reliably. The American Association of Colleges & Universities has developed a rubric to define and assess integrative learning, but it has low reliability. This pilot study examines whether this rubric's reliability…

Descriptors: Scoring Rubrics, Reliability, Evaluation Methods, Faculty Development

Building the Plane in Flight: Establishing Post Hoc Inter-Rater Reliability Coefficients in an Educational Context. Sage Research Methods Cases Part 2

Albert M. Jimenez; Sally J. Zepeda – Sage Research Methods Cases, 2017

The work presented in this case study results from a study conducted in 2012-2014 examining a newly created teacher evaluation system to determine the inter-rater reliability of the classroom observation instrument. The teacher evaluation system was the result of a partnership between the school district and the university in the same city…

Descriptors: Case Studies, Interrater Reliability, Teacher Evaluation, Observation

Working with Sparse Data in Rated Language Tests: Generalizability Theory Applications

Peer reviewed

Lin, Chih-Kai – Language Testing, 2017

Sparse-rated data are common in operational performance-based language tests, as an inevitable result of assigning examinee responses to a fraction of available raters. The current study investigates the precision of two generalizability-theory methods (i.e., the rating method and the subdividing method) specifically designed to accommodate the…

Descriptors: Data Analysis, Language Tests, Generalizability Theory, Accuracy

A Statistical Estimate of the Validity and Reliability of a Rubric Developed by Connecticut's State Education Resource Center to Evaluate the Quality of Individualized Education Programs for Students with Disabilities

Mearman, Kimberly A. – ProQuest LLC, 2013

Because of the critical function of the IEP in the planning and implementation of effective instruction for students with disabilities, educators need a reference to determine the standards of a quality IEP and a process by which to compare an IEP to those standards. A rubric can support educators in examining the quality of IEPs. This study used…

Descriptors: Construct Validity, Reliability, Scoring Rubrics, Individualized Education Programs

Professional Practice, Student Surveys, and Value-Added: Multiple Measures of Teacher Effectiveness in the Pittsburgh Public Schools. REL 2014-024

Peer reviewed

Chaplin, Duncan; Gill, Brian; Thompkins, Allison; Miller, Hannah – Regional Educational Laboratory Mid-Atlantic, 2014

Responding to federal and state prompting, school districts across the country are implementing new teacher evaluation systems that aim to increase the rigor of evaluation ratings, better differentiate effective teaching, and support personnel and staff development initiatives that promote teacher effectiveness and ultimately improve student…

Descriptors: Teacher Effectiveness, Public Schools, Teacher Evaluation, Student Surveys

Analysis of State-Level Evaluator Training Policy

Benyon, Howard E., III. – ProQuest LLC, 2014

This policy analysis project focused on state-level education policy which lacks evaluator training as well as on requirements for research-based best practices. Due to federal mandates and funding as well as accountability to all stakeholders, states are adopting more rigorous evaluation systems. These high-stakes evaluation systems are putting…

Descriptors: Educational Policy, Policy Analysis, Evaluators, Professional Training

A Reliable and Valid Weighted Scoring Instrument for Use in Grading APA-Style Empirical Research Report

Peer reviewed

Greenberg, Kathleen Puglisi – Teaching of Psychology, 2012

The scoring instrument described in this article is based on a deconstruction of the seven sections of an American Psychological Association (APA)-style empirical research report into a set of learning outcomes divided into content-, expression-, and format-related categories. A double-weighting scheme used to score the report yields a final grade…

Descriptors: Scoring, Research Reports, Grading, Outcome Measures

Toward a Quantitative Basis for Assessment and Diagnosis of Apraxia of Speech

Peer reviewed

Haley, Katarina L.; Jacks, Adam; de Riesthal, Michael; Abou-Khalil, Rima; Roth, Heidi L. – Journal of Speech, Language, and Hearing Research, 2012

Purpose: We explored the reliability and validity of 2 quantitative approaches to document presence and severity of speech properties associated with apraxia of speech (AOS). Method: A motor speech evaluation was administered to 39 individuals with aphasia. Audio-recordings of the evaluation were presented to 3 experienced clinicians to determine…

Descriptors: Neurological Impairments, Speech Impairments, Speech Evaluation, Evaluation Methods

Intra-Rater and Inter-Rater Reliability of the Balance Error Scoring System in Pre-Adolescent School Children

Peer reviewed

Sheehan, Dwayne P.; Lafave, Mark R.; Katz, Larry – Measurement in Physical Education and Exercise Science, 2011

This study was designed to test the intra- and inter-rater reliability of the University of North Carolina's Balance Error Scoring System in 9- and 10-year-old children. Additionally, a modified version of the Balance Error Scoring System was tested to determine if it was more sensitive in this population ("raw scores"). Forty-six…

Descriptors: Elementary School Students, Interrater Reliability, Scoring, Raw Scores

Source

Journal of Autism and…	2
Journal of Speech, Language,…	2
ProQuest LLC	2
Assessment & Evaluation in…	1
British Educational Research…	1
Developmental Medicine &…	1
Early Childhood Research…	1
Educational Assessment	1
Educational Research	1
Evaluation and the Health…	1
International Educational…	1
Journal of Baltic Science…	1
Journal of Behavioral…	1
Journal of College Science…	1
Journal of Early Intervention	1
Journal of Experimental…	1
Journal of Research in Music…	1
Journal of the American…	1
Journal of the Association…	1
Language Testing	1
Measurement in Physical…	1
Online Submission	1
Regional Educational…	1
Research & Practice in…	1
Research in Developmental…	1

Author

Zwaigenbaum, Lonnie	2
Abbott, Maree J.	1
Abou-Khalil, Rima	1
Albert M. Jimenez	1
Andrade, Heidi	1
Baker, Eva L.	1
Barnett, Miya	1
Benyon, Howard E., III.	1
Bergee, Martin J.	1
Brian, Jessica	1
Bryson, Susan E.	1
Cadenas, Heleny	1
Chaplin, Duncan	1
Cloud-Silva, Connie	1
Cusick, Anne	1
Davidson, Bridget C.	1
Davis, Eileen	1
De Cock, P.	1
Deklerck, J.	1
Denton, Jon J.	1
Desloovere, K.	1
Di Rezze, Briano	1
Duku, Eric	1
Dwyer, J.	1