Publication Date
In 2025 | 27 |
Since 2024 | 95 |
Since 2021 (last 5 years) | 356 |
Since 2016 (last 10 years) | 878 |
Since 2006 (last 20 years) | 2091 |
Descriptor
Interrater Reliability | 3093 |
Foreign Countries | 642 |
Evaluation Methods | 501 |
Test Reliability | 498 |
Test Validity | 406 |
Correlation | 401 |
Scoring | 336 |
Comparative Analysis | 327 |
Scores | 321 |
Validity | 309 |
Student Evaluation | 301 |
More ▼ |
Source
Author
Publication Type
Education Level
Audience
Researchers | 130 |
Practitioners | 42 |
Teachers | 22 |
Administrators | 11 |
Counselors | 3 |
Policymakers | 2 |
Location
Australia | 56 |
Turkey | 52 |
United Kingdom | 46 |
Canada | 45 |
Netherlands | 40 |
California | 37 |
China | 37 |
United States | 30 |
United Kingdom (England) | 24 |
Taiwan | 23 |
Japan | 22 |
More ▼ |
Laws, Policies, & Programs
Assessments and Surveys
What Works Clearinghouse Rating
Meets WWC Standards without Reservations | 3 |
Meets WWC Standards with or without Reservations | 3 |
Does not meet standards | 3 |
Alexandra M. Pierce; Lisa M. H. Sanetti; Melissa A. Collier-Meek; Austin H. Johnson – Grantee Submission, 2024
Visual analysis is the primary methodology used to determine treatment effects from graphed single-case design data. Previous studies have demonstrated mixed findings related to interrater agreement between both expert and novice visual analysts, which represents a critical limitation of visual analysis and supports calls for also presenting…
Descriptors: Graphs, Interrater Reliability, Statistical Analysis, Expertise
Pearson, Terry – FORUM: for promoting 3-19 comprehensive education, 2023
Ofsted has frequently defended the judgements made during inspections by claiming that inspection ratings are reliable, as shown by the results from the collection of studies the inspectorate has conducted. I outline the inspectorate's view of reliability and problematise the studies that it has carried out, noting that these provide insufficient…
Descriptors: Inspection, Interrater Reliability, Decision Making, Value Judgment
Kristen Bottema-Beutel; Shannon Crowley LaPoint; So Yoon Kim; Sarah Mohiuddin; Qun Yu; Rachael McKinnon – Exceptional Children, 2024
In this secondary analysis of a previously conducted systematic review, we analyze social validity assessments in intervention research for transition-age autistic youth. Social validity is concerned with the acceptability of the intervention goals, the acceptability and feasibility of the intervention procedures, and the perceived importance of…
Descriptors: Autism Spectrum Disorders, Intervention, Validity, Psychometrics
Susan K. Johnsen – Gifted Child Today, 2025
The author provides information about reliability and areas that educators should examine in determining if an assessment is consistent and trustworthy for use, and how it should be interpreted in making decisions about students. Reliability areas that are discussed in the column include internal consistency, test-retest or stability, inter-scorer…
Descriptors: Test Reliability, Academically Gifted, Student Evaluation, Error of Measurement
Lottridge, Susan; Woolf, Sherri; Young, Mackenzie; Jafari, Amir; Ormerod, Chris – Journal of Computer Assisted Learning, 2023
Background: Deep learning methods, where models do not use explicit features and instead rely on implicit features estimated during model training, suffer from an explainability problem. In text classification, saliency maps that reflect the importance of words in prediction are one approach toward explainability. However, little is known about…
Descriptors: Documentation, Learning Strategies, Models, Prediction
Stefan K. Schauber; Anne O. Olsen; Erik L. Werner; Morten Magelssen – Advances in Health Sciences Education, 2024
Introduction: Research in various areas indicates that expert judgment can be highly inconsistent. However, expert judgment is indispensable in many contexts. In medical education, experts often function as examiners in rater-based assessments. Here, disagreement between examiners can have far-reaching consequences. The literature suggests that…
Descriptors: Medical Students, Performance Based Assessment, Expertise, Interrater Reliability
McCluskey, Sydne – ProQuest LLC, 2023
Rater comparison analysis is commonly necessary in the social sciences. Conventional approaches to the problem generally focus on calculation of agreement statistics, which provide useful but incomplete information about rater agreement. Importantly, one-number agreement statistics give no indication regarding the nature of disagreements, nor do…
Descriptors: Bayesian Statistics, Structural Equation Models, Interrater Reliability, Beliefs
Feldberg, Zachary R. – ProQuest LLC, 2023
Cognitive diagnostic models (CDMs) provide pedagogically relevant information in the form of a student profile of multiple binary categorizations of students into mastery or nonmastery statuses on latent traits called attributes. Federal educational accountability requires accountability measures to designate students into one of at least three…
Descriptors: Accountability, Standards, Cutting Scores, Models
Tavares, Walter; Kinnear, Benjamin; Schumacher, Daniel J.; Forte, Milena – Advances in Health Sciences Education, 2023
In this perspective, the authors critically examine "rater training" as it has been conceptualized and used in medical education. By "rater training," they mean the educational events intended to "improve" rater performance and contributions during assessment events. Historically, rater training programs have focused…
Descriptors: Medical Education, Interrater Reliability, Evaluation Methods, Training
Samuel D'Emanuele; Francesca Nardello; Fabrizio Garau; Diego Campaci; Federico Schena; Cantor Tarperi – Measurement in Physical Education and Exercise Science, 2025
The agreement between a wearable inertial sensor (GYKO, G) and the force platform (P) was assessed by evaluating "test-retest" and "inter-rater reliability." Thirty-eight subjects were enrolled; the selected indices of balance were investigated over foot positions and (un)stable conditions. Intraclass correlation coefficient…
Descriptors: Human Posture, Measurement Equipment, Interrater Reliability, Measurement Techniques
Yangmeng Xu; Stefanie A. Wind – Educational Measurement: Issues and Practice, 2025
Double-scoring constructed-response items is a common but costly practice in mixed-format assessments. This study explored the impacts of Targeted Double-Scoring (TDS) and random double-scoring procedures on the quality of psychometric outcomes, including student achievement estimates, person fit, and student classifications under various…
Descriptors: Academic Achievement, Psychometrics, Scoring, Evaluation Methods
Victoria Reyes; Elizabeth Bogumil; Levin Elias Welch – Sociological Methods & Research, 2024
Transparency is once again a central issue of debate across types of qualitative research. Work on how to conduct qualitative data analysis, on the other hand, walks us through the step-by-step process on how to code and understand the data we've collected. Although there are a few exceptions, less focus is on transparency regarding…
Descriptors: Qualitative Research, Data Analysis, Guides, Databases
Stefanie A. Wind; Yuan Ge – Measurement: Interdisciplinary Research and Perspectives, 2024
Mixed-format assessments made up of multiple-choice (MC) items and constructed response (CR) items that are scored using rater judgments include unique psychometric considerations. When these item types are combined to estimate examinee achievement, information about the psychometric quality of each component can depend on that of the other. For…
Descriptors: Interrater Reliability, Test Bias, Multiple Choice Tests, Responses
Monica L. Coleman; Moira Ragan; Tahani Dari – Measurement and Evaluation in Counseling and Development, 2024
Intercoder reliability can increase trustworthiness, accuracy, rigor, collaboration, and power sharing in qualitative research. Though not every qualitative design can utilize intercoder reliability, this article highlights how positivist qualitative research, community-based participatory research, and participatory evaluation all strengthen when…
Descriptors: Interrater Reliability, Qualitative Research, Counseling, Research
Elizabeth J. Preas; Mary E. Halbur; Regina A. Carroll – Analysis of Verbal Behavior, 2024
Procedural fidelity refers to the degree to which procedures for an assessment or intervention (i.e., independent variables) are implemented consistent with the prescribed protocols. Procedural fidelity is an important factor in demonstrating the internal validity of an experiment and clinical treatments. Previous reviews evaluating the inclusion…
Descriptors: Verbal Communication, Behavioral Science Research, Periodicals, Fidelity