Publication Date
In 2025 | 27 |
Since 2024 | 95 |
Since 2021 (last 5 years) | 356 |
Since 2016 (last 10 years) | 878 |
Since 2006 (last 20 years) | 2091 |
Descriptor
Interrater Reliability | 3093 |
Foreign Countries | 642 |
Evaluation Methods | 501 |
Test Reliability | 498 |
Test Validity | 406 |
Correlation | 401 |
Scoring | 336 |
Comparative Analysis | 327 |
Scores | 321 |
Validity | 309 |
Student Evaluation | 301 |
Audience
Researchers | 130 |
Practitioners | 42 |
Teachers | 22 |
Administrators | 11 |
Counselors | 3 |
Policymakers | 2 |
Location
Australia | 56 |
Turkey | 52 |
United Kingdom | 46 |
Canada | 45 |
Netherlands | 40 |
California | 37 |
China | 37 |
United States | 30 |
United Kingdom (England) | 24 |
Taiwan | 23 |
Japan | 22 |
What Works Clearinghouse Rating
Meets WWC Standards without Reservations | 3 |
Meets WWC Standards with or without Reservations | 3 |
Does not meet standards | 3 |
Heather Hirst; Jennifer Campbell; Samantha Chamberlin; Ibukun Olagunju; Frank Bird; James K. Luiselli – Journal of Intellectual Disabilities, 2024
Frailty is a health concern for many adults with intellectual disability and should be measured to detect at-risk conditions, monitor disease, plan treatment, and gauge mortality. This descriptive pilot study evaluated measurement consistency (inter-rater agreement) of the Intellectual Disability-Frailty Index Short Form among multiple assessors…
Descriptors: Adults, Intellectual Disability, Physical Health, Aging (Individuals)
Angus Kittelman; Sara Izzard; Kent McIntosh; Kelsey R. Morris; Timothy J. Lewis – Assessment for Effective Intervention, 2024
The purpose of this study was to evaluate the psychometric properties of the Self-Assessment Survey (SAS) 4.0, an updated measure assessing implementation fidelity of positive behavioral interventions and supports (PBIS). A total of 627 school personnel from 33 schools in six U.S. states completed the SAS 4.0 during the 2021-2022 school year. We…
Descriptors: Positive Behavior Supports, Teachers, Self Evaluation (Individuals), Test Reliability
Abbas, Mohsin; van Rosmalen, Peter; Kalz, Marco – IEEE Transactions on Learning Technologies, 2023
For predicting and improving the quality of essays, text analytic metrics (surface, syntactic, morphological, and semantic features) can be used to provide formative feedback to the students in higher education. In this study, the goal was to identify a sufficient number of features that exhibit a fair proxy of the scores given by the human raters…
Descriptors: Feedback (Response), Automation, Essays, Scoring
Mark White; Matt Ronfeldt – Educational Assessment, 2024
Standardized observation systems seek to reliably measure a specific conceptualization of teaching quality, managing rater error through mechanisms such as certification, calibration, validation, and double-scoring. These mechanisms both support high quality scoring and generate the empirical evidence used to support the scoring inference (i.e.,…
Descriptors: Interrater Reliability, Quality Control, Teacher Effectiveness, Error Patterns
Elayne P. Colón; Lori M. Dassa; Thomas M. Dana; Nathan P. Hanson – Action in Teacher Education, 2024
To meet accreditation expectations, teacher preparation programs must demonstrate their candidates are evaluated using summative assessment tools that yield sound, reliable, and valid data. These tools are primarily used by the clinical experience team -- university supervisors and mentor teachers. Institutional beliefs regarding best practices…
Descriptors: Student Teachers, Teacher Interns, Evaluation Methods, Interrater Reliability
Bonett, Douglas G. – Journal of Educational and Behavioral Statistics, 2022
The limitations of Cohen's κ are reviewed and an alternative G-index is recommended for assessing nominal-scale agreement. Maximum likelihood estimates, standard errors, and confidence intervals for a two-rater G-index are derived for one-group and two-group designs. A new G-index of agreement for multirater designs is proposed. Statistical…
Descriptors: Statistical Inference, Statistical Data, Interrater Reliability, Design
Weingarden, Merav; Heyd-Metzuyanim, Einat – Journal of Mathematics Teacher Education, 2023
In this study, we examine "what went wrong" in our professional development program for encouraging cognitively demanding instruction, focusing on the difficulties we encountered in using an observational tool for evaluating this type of instruction and reaching inter-rater reliability. We do so through the lens of a discursive theory of…
Descriptors: Mathematics Instruction, Interrater Reliability, Cognitive Processes, Difficulty Level
Bryce D. McLeod; Nicole Porter; Aaron Hogue; Emily M. Becker-Haimes; Amanda Jensen-Doss – Grantee Submission, 2023
Objective: The precise measurement of treatment fidelity (quantity and quality in the delivery of treatment strategies in an intervention) is essential for intervention development, evaluation, and implementation. Various informants are used in fidelity assessment (e.g., observers, practitioners [clinicians, teachers], clients), but these…
Descriptors: Measurement, Fidelity, Educational Research, Evidence Based Practice
Barbara Jane Cunningham; Peter Rosenbaum; Anastasia Nepotiuk; Nancy Thomas-Stonell – Communication Disorders Quarterly, 2024
This brief report presents interrater reliability data for the Focus on the Outcomes of Communication Under Six (FOCUS-34) between parents, and between parents and speech-language pathologists (SLPs). Reliability for all three raters combined was good to excellent across three assessments. Reliability for pairs of raters was variable but generally…
Descriptors: Interrater Reliability, Outcome Measures, Preschool Children, Parents
Reeta Neittaanmäki; Iasonas Lamprianou – Language Testing, 2024
This article focuses on rater severity and consistency and their relation to major changes in the rating system in a high-stakes testing context. The study is based on longitudinal data collected from 2009 to 2019 from the second language (L2) Finnish speaking subtest in the National Certificates of Language Proficiency in Finland. We investigated…
Descriptors: Foreign Countries, Interrater Reliability, Evaluators, Item Response Theory
Enninga, Annemieke; Waninge, Aly; Post, Wendy J.; van der Putten, Annette A. J. – Journal of Applied Research in Intellectual Disabilities, 2023
Background: Persons with profound intellectual and multiple disabilities (PIMD) are vulnerable when it comes to experiencing pain. Reliable assessment of pain-related behaviour in these persons is difficult. Aim: To determine how pain items can be reliably scored in adults with PIMD. Methods: We developed an instruction protocol for the…
Descriptors: Test Reliability, Pain, Behavior, Adults
Gilstrap, Donald L.; Whitver, Sara Maurice; Scalfani, Vincent F.; Bray, Nathaniel J. – Innovative Higher Education, 2023
This article explores how well bibliometrics and altmetrics reflect research impact in relation to Boyer's Model of the Scholarship. Indices used for both types of metrics are explored and discussed while including an analysis on primary methodological works performed on each in the literature to date. As confirmatory in nature, we chose as our…
Descriptors: Bibliometrics, Models, Scholarship, Research
Ichikowitz, Kerri; Bruce, Carolyn; Meitanis, Vanessa; Cheung, Kelly; Kim, Yekyung; Talbourdet, Esther; Newton, Caroline – International Journal of Language & Communication Disorders, 2023
Background: People with aphasia (PWA) can experience functional numeracy difficulties, that is, problems understanding or using numbers in everyday life, which can have numerous negative impacts on their daily lives. There is growing interest in designing functional numeracy interventions for PWA; however, there are limited suitable assessments…
Descriptors: Test Construction, Test Validity, Numeracy, Adults
Lamprianou, Iasonas – Sociological Methods & Research, 2023
This study investigates inter- and intracoder reliability, proposing a new approach based on social network analysis (SNA) and exponential random graph models (ERGM). During a recent exit poll, the responses of voters to two open-ended questions were recorded. A coding experiment was conducted where a group of coders coded a sample of text…
Descriptors: Interrater Reliability, Coding, Social Networks, Network Analysis
Evans, Tanya; Mejía-Ramos, Juan Pablo; Inglis, Matthew – Educational Studies in Mathematics, 2022
Offering explanations is a central part of teaching mathematics, and understanding those explanations is a vital activity for learners. Given this, it is natural to ask what makes a good mathematical explanation. This question has received surprisingly little attention in the mathematics education literature, perhaps because the field has no…
Descriptors: Mathematics, Professional Personnel, Undergraduate Students, Mathematics Activities