Lewis, Carly A.; Myers, Carl L. – Contemporary School Psychology, 2021
Behavior rating scales are frequently used to assess social-emotional behaviors of children. While broadband behavior rating scales often measure similarly named constructs, it is unclear how consistently different instruments measure those constructs. Head Start teachers completed the preschool versions of the Behavior Assessment System for…
Descriptors: Preschool Teachers, Interrater Reliability, Child Behavior, Behavior Rating Scales
Huscroft-D'Angelo, Jacqueline; Wery, Jessica; Martin, Jodie Diane; Pierce, Corey; Crawford, Lindy – Behavioral Disorders, 2021
"The Scales for Assessing Emotional Disturbance--Third Edition Rating Scale" (SAED-3 RS; Epstein et al.) is a standardized, norm-referenced measure designed to aid in the identification process by providing useful data to professionals determining eligibility of students with an emotional disturbance (ED). Three studies are reported to…
Descriptors: Measures (Individuals), Emotional Disturbances, Test Reliability, Interrater Reliability
Belur, Jyoti; Tompson, Lisa; Thornton, Amy; Simon, Miranda – Sociological Methods & Research, 2021
A methodologically sound systematic review is characterized by transparency, replicability, and a clear inclusion criterion. However, little attention has been paid to reporting the details of interrater reliability (IRR) when multiple coders are used to make decisions at various points in the screening and data extraction stages of a study. Prior…
Descriptors: Interrater Reliability, Decision Making, Accuracy, Coding
Todaro, Francesca; Pizzorni, Nicole; Scarponi, Letizia; Ronzoni, Clara; Huckabee, Maggie-Lee; Schindler, Antonio – International Journal of Language & Communication Disorders, 2021
Background: The Test of Masticating and Swallowing Solids (TOMASS) is an international standardized swallowing assessment tool. However, its psychometric characteristics have not been analysed in patients with dysphagia. Aims: To analyse TOMASS's (1) inter- and intra-rater reliability in a clinical population of patients with dysphagia, (2)…
Descriptors: Physical Disabilities, Test Reliability, Test Validity, Standardized Tests
Burkhardt, Amy; Lottridge, Susan; Woolf, Sherri – Educational Measurement: Issues and Practice, 2021
For some students, standardized tests serve as a conduit to disclose sensitive issues of harm or distress that may otherwise go unreported. By detecting this writing, known as "crisis papers," testing programs have a unique opportunity to assist in mitigating the risk of harm to these students. The use of machine learning to…
Descriptors: Scoring Rubrics, Identification, At Risk Students, Standardized Tests
Solano-Flores, Guillermo – Educational Measurement: Issues and Practice, 2021
This article proposes a Boolean approach to representing and analyzing interobserver agreement in dichotomous coding. Building on the notion that observations are samples of a universe of observations, it submits that coding can be viewed as a process in which observers sample pieces of evidence on constructs. It distinguishes between formal and…
Descriptors: Online Searching, Coding, Interrater Reliability, Evidence
Kapsner-Smith, Mara R.; Opuszynski, Amanda; Stepp, Cara E.; Eadie, Tanya L. – Journal of Speech, Language, and Hearing Research, 2021
Purpose: The reliability of auditory-perceptual judgments between listeners is a long-standing problem in the assessment of voice disorders. The purpose of this study was to determine whether a relatively novel experimental scaling method, called visual sort and rate (VSR), yielded stronger reliability than the more frequently used method of…
Descriptors: Voice Disorders, Interrater Reliability, Rating Scales, Severity (of Disability)
Palmer, Melanie; Tarver, Joanne; Carter Leno, Virginia; Paris Perez, Juan; Frayne, Margot; Slonims, Vicky; Pickles, Andrew; Scott, Stephen; Charman, Tony; Simonoff, Emily – Journal of Autism and Developmental Disorders, 2023
Emotional and behavioral problems (EBPs) frequently occur in young autistic children. Discrepancies between parents and other informants are common but can lead to uncertainty in formulation, diagnosis and care planning. This study aimed to explore which child and informant characteristics are associated with reported child EBPs across settings.…
Descriptors: Observation, Emotional Disturbances, Behavior Problems, Autism Spectrum Disorders
Constructing a Roadmap to Measure the Quality of Business Assessments Aimed at Curriculum Management
Silva, Thanuci; Santos, Regiane dos; Mallet, Débora – Journal of Education for Business, 2023
Assuring the quality of education is a concern of learning institutions. To do so, it is necessary to have assertive learning management, with consistent data on students' outcomes. This research provides associate deans and researchers with a roadmap for gathering evidence to improve the quality of open-ended assessments. Based on statistical…
Descriptors: Student Evaluation, Evaluation Methods, Business Education, Higher Education
Anna Kay Steadman – ProQuest LLC, 2023
The Performance Assessment and Evaluation System (PAES) is used by all major universities in the state of Utah to measure the effective teaching skills of preservice candidates as they progress through their teaching preparation program. The resulting ratings are used to make high-stakes decisions relating to course completion as well as…
Descriptors: Preservice Teachers, Student Evaluation, Teaching Skills, Elementary School Teachers
Ellie Renae Bowen – ProQuest LLC, 2023
The educative Teacher Performance Assessment (edTPA) has been adopted by many state legislatures and teacher preparation programs (TPP). These states require teacher candidates to pass the edTPA with a state-specific passing score to be recommended for licensure. In the 19 states where passing the edTPA has not been required as a condition of…
Descriptors: Interrater Reliability, Teacher Evaluation, Rating Scales, Performance Based Assessment
Nicole B. Wiggs; Linda A. Reddy; Ryan Kettler; Anh Hua; Christopher Dudek; Adam Lekwa; Briana Bronstein – Assessment for Effective Intervention, 2023
The Classroom Strategies Assessment System (CSAS) is a multi-rater, multi-method (direct observation and rating scale methodology) assessment of teachers' use of research-based instructional and behavior management strategies. The present study investigated the association between teacher self-report and school administrator ratings using the CSAS…
Descriptors: Measurement Techniques, Classroom Observation Techniques, Teacher Evaluation, Teaching Methods
Chao Han; Binghan Zheng; Mingqing Xie; Shirong Chen – Interpreter and Translator Trainer, 2024
Human raters' assessment of interpreting is a complex process. Previous researchers have mainly relied on verbal reports to examine this process. To advance our understanding, we conducted an empirical study, collecting raters' eye-movement and retrospection data in a computerised interpreting assessment in which three groups of raters (n = 35)…
Descriptors: Foreign Countries, College Students, College Graduates, Interrater Reliability
Alison Cook-Sather; Ruth L. Healey – Teaching & Learning Inquiry, 2024
Peer review is widely accepted as critical to legitimating scholarly publication, and yet, it runs the risk of reproducing inequities in publishing processes and products. Acknowledging at once the historical need to legitimize SoTL publications, the current danger of reproducing exclusive practices, and the aspirational goal to "practice…
Descriptors: Peer Evaluation, Academic Language, Writing (Composition), Interrater Reliability
Julia Brochey-Taylor; Joseph A. Taylor – Educational Research and Reviews, 2024
The purpose of this synthesis study was to assess the reliability and validity of the Draw-A-Scientist Test (DAST) and its variations across multiple studies, aiming to understand limitations and propose modifications for future application within and beyond the science domain. Given the existence of multiple DAST versions, this study quantified…
Descriptors: Cognitive Tests, Freehand Drawing, Personality Measures, Projective Measures