Publication Date
| In 2026 | 0 |
| Since 2025 | 60 |
| Since 2022 (last 5 years) | 286 |
| Since 2017 (last 10 years) | 782 |
| Since 2007 (last 20 years) | 2044 |
Descriptor
| Interrater Reliability | 3126 |
| Foreign Countries | 655 |
| Test Reliability | 504 |
| Evaluation Methods | 503 |
| Test Validity | 411 |
| Correlation | 401 |
| Scoring | 347 |
| Comparative Analysis | 327 |
| Scores | 324 |
| Validity | 310 |
| Student Evaluation | 308 |
Audience
| Researchers | 130 |
| Practitioners | 42 |
| Teachers | 22 |
| Administrators | 11 |
| Counselors | 3 |
| Policymakers | 2 |
Location
| Australia | 56 |
| Turkey | 53 |
| United Kingdom | 46 |
| Canada | 45 |
| Netherlands | 40 |
| China | 38 |
| California | 37 |
| United States | 30 |
| United Kingdom (England) | 25 |
| Taiwan | 23 |
| Germany | 22 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 3 |
| Meets WWC Standards with or without Reservations | 3 |
| Does not meet standards | 3 |
Konstantin Vinokic; Lukas Begrich; Mareike Kunter; Susanne Kuger – Frontline Learning Research, 2024
Thin slices ratings (i.e., ratings based on first impressions) have yielded intriguingly accurate results in various domains. Among others, researchers have applied the thin slices technique to assess instructional quality, showing that teacher-student interactions can be reliably inferred from just very short snippets of classroom instruction. The…
Descriptors: Teacher Effectiveness, Teacher Student Relationship, Foreign Countries, Classroom Observation Techniques
Primary School Students' Ratings of Teaching -- Do They Differentiate between Subjects and Teachers?
Svenja Rieser; Alexander Naumann – School Effectiveness and School Improvement, 2024
Our study aims to provide empirical evidence for and against the valid use of primary school students' ratings of three generic dimensions of teaching quality (classroom management, supportive climate, cognitive activation). We examine whether students discriminate between corresponding dimensions in different subjects, taking into account whether…
Descriptors: Foreign Countries, Elementary School Students, Elementary School Teachers, Student Evaluation of Teacher Performance
Walker, Grant M.; Basilakos, Alexandra; Fridriksson, Julius; Hickok, Gregory – Journal of Speech, Language, and Hearing Research, 2022
Purpose: Meaningful changes in picture naming responses may be obscured when measuring accuracy instead of quality. A statistic that incorporates information about the severity and nature of impairments may be more sensitive to the effects of treatment. Method: We analyzed data from repeated administrations of a naming test to 72 participants with…
Descriptors: Naming, Change, Aphasia, Severity (of Disability)
Tschida, Jessica E.; Yerys, Benjamin E. – Autism: The International Journal of Research and Practice, 2022
Executive function challenges are commonly reported in the home setting for children with an autism spectrum disorder diagnosis (hereafter, autism), but little is known about these challenges in the school setting. A total of 337 youth (autism, N = 241 and typically developing, N = 96) were assessed using Behavior Rating Inventory of Executive…
Descriptors: Executive Function, Students with Disabilities, Age Differences, Behavior Problems
Hollands, Fiona M.; Pan, Yilin; Kieffer, Michael J.; Holmes, Venita R.; Wang, Yixin; Escueta, Maya; Head, Laura; Muroga, Atsuko – Evidence & Policy: A Journal of Research, Debate and Practice, 2022
Background: Education decision makers are increasingly expected to use evidence to inform their actions. However, the majority of educational interventions have not yet been studied and it is challenging to produce high quality research evidence quickly enough to influence policy questions. Aims and objectives: We set out to gather evidence on the…
Descriptors: Elementary Schools, Urban Schools, Reading Instruction, Instructional Effectiveness
Gao, Ruiqin; Raygoza, Alyssa; Distefano, Christine; Greer, Fred; Dowdy, Erin – School Psychology International, 2022
The Pediatric Symptom Checklist-17 (PSC-17) is a popular screening instrument used by parents and clinicians to assess children's behavioral functioning. However, more schools are examining the potential of the PSC-17 as part of a Multi-Tier System of Support framework. To investigate the potential of the PSC-17 in the schools, a sample of 1,779…
Descriptors: Check Lists, Measures (Individuals), Screening Tests, Child Behavior
Matthews, Joshua – RELC Journal: A Journal of Language Teaching and Research, 2023
This article explores how the analysis of inter-rater discourse can be used to support collective reflective practice in second language (L2) assessment. To demonstrate, a focused case of the discourse between two experienced language teachers as they negotiate assessment decisions on L2 written texts is presented. Of particular interest was the…
Descriptors: Interrater Reliability, Discourse Analysis, Student Evaluation, Second Language Learning
Heather Raithel – ProQuest LLC, 2023
A mixed methods action research study was designed to answer three research questions based on inter-rater reliability (IRR) in compliance calls for transition at a state education agency, perceived confidence levels in making and discussing compliance calls, and perceived confidence in sharing transition resources. An innovation based on…
Descriptors: Public Agencies, Interrater Reliability, Compliance (Legal), Comparative Analysis
Wesolowski, Brian C.; Wind, Stefanie A. – Journal of Educational Measurement, 2019
Rater-mediated assessments are a common methodology for measuring persons, investigating rater behavior, and/or defining latent constructs. The purpose of this article is to provide a pedagogical framework for examining rater variability in the context of rater-mediated assessments using three distinct models. The first model is the observation…
Descriptors: Interrater Reliability, Models, Observation, Measurement
Lane, Suzanne – Journal of Educational Measurement, 2019
Rater-mediated assessments require the evaluation of the accuracy and consistency of the inferences made by the raters to ensure the validity of score interpretations and uses. Modeling rater response processes allows for a better understanding of how raters map their representations of the examinee performance to their representation of the…
Descriptors: Responses, Accuracy, Validity, Interrater Reliability
Sasithorn Limgomolvilas; Patsawut Sukserm – LEARN Journal: Language Education and Acquisition Research Network, 2025
The assessment of English speaking in EFL environments can be inherently subjective and influenced by various factors beyond linguistic ability, including choice of assessment criteria, and even the rubric type. In classroom assessment, the type of rubric recommended for English speaking tasks is the analytical rubric. Driven by three aims, this…
Descriptors: Oral Language, Speech Communication, English (Second Language), Second Language Learning
Georgios Zacharis; Stamatios Papadakis – Educational Process: International Journal, 2025
Background/purpose: Generative artificial intelligence (GenAI) is often promoted as a transformative tool for assessment, yet evidence of its validity compared to human raters remains limited. This study examined whether an AI-based rater could be used interchangeably with trained faculty in scoring complex coursework. Materials/methods:…
Descriptors: Artificial Intelligence, Technology Uses in Education, Computer Assisted Testing, Grading
Babcock, Ben; Risk, Nicole M.; Wyse, Adam E. – Educational Measurement: Issues and Practice, 2020
This study compared the statistical properties of four job analysis task survey response scale types: criticality, difficulty in learning, importance, and frequency. We used nine job analysis studies spanning two fields, medical imaging and allied health professionals, to compare the job analysis scales in terms of variability and interrater…
Descriptors: Job Analysis, Radiology, Allied Health Personnel, Surveys
Joseph, Gail; Soderberg, Janet S.; Stull, Sara; Cummings, Kevin; McCutchen, Deborah; Han, Rachel J. – Early Education and Development, 2020
Research Findings: This study explores the inter-rater reliability of WaKIDS, Washington State's kindergarten entry assessment (KEA). Specifically, we analyze (1) the extent to which teachers' assessments are in agreement with a master code, (2) how often inaccurate assessment decisions lead to misidentification of school readiness, and (3)…
Descriptors: Interrater Reliability, School Readiness, Kindergarten, Evaluation Problems
Goldhaber, Dan; Grout, Cyrus; Wolf, Malcom; Martinkova, Patricia – National Center for Analysis of Longitudinal Data in Education Research (CALDER), 2020
There is growing interest in using measures of teacher applicant quality to improve hiring decisions, but the statistical properties of such measures are poorly understood. We present evidence on structured ratings solicited from teacher applicants' references. We find that the reference ratings capture only one underlying dimension of applicant…
Descriptors: Job Applicants, Teacher Selection, Interrater Reliability, Decision Making

