Publication Date
In 2025 | 3 |
Since 2024 | 5 |
Since 2021 (last 5 years) | 36 |
Since 2016 (last 10 years) | 121 |
Since 2006 (last 20 years) | 343 |
Descriptor
Correlation | 401 |
Interrater Reliability | 401 |
Foreign Countries | 110 |
Scores | 87 |
Measures (Individuals) | 76 |
Statistical Analysis | 69 |
Validity | 66 |
Comparative Analysis | 65 |
Test Reliability | 62 |
Test Validity | 60 |
Evaluation Methods | 59 |
Author
Coniam, David | 4 |
Attali, Yigal | 3 |
Scahill, Lawrence | 3 |
Zhang, Mo | 3 |
Abrams, Lisa M. | 2 |
Aman, Michael G. | 2 |
Anna-Maria Fall | 2 |
Benton, Stephen L. | 2 |
Beula M. Magimairaj | 2 |
Bolton, Patrick | 2 |
Botting, Nicola | 2 |
Audience
Researchers | 10 |
Administrators | 1 |
Practitioners | 1 |
Teachers | 1 |
Location
Netherlands | 14 |
China | 11 |
California | 9 |
Canada | 9 |
Turkey | 8 |
United Kingdom | 8 |
Japan | 7 |
United States | 7 |
Florida | 6 |
Germany | 6 |
Hong Kong | 5 |
Laws, Policies, & Programs
No Child Left Behind Act 2001 | 2 |
Individuals with Disabilities… | 1 |
What Works Clearinghouse Rating
Meets WWC Standards without Reservations | 1 |
Meets WWC Standards with or without Reservations | 1 |
Suzanna Dooley; Tammy Hopper; Rachael Doyle; Orla Gilheaney; Margaret Walshe – International Journal of Language & Communication Disorders, 2025
Background: Individuals with dementia have communication limitations resulting from cognitive impairments that define the syndrome. Whereas there are numerous cognitive assessments for individuals with dementia, there are far fewer communication assessments. The Profiling Communication Ability in Dementia (P-CAD) was developed to address this gap.…
Descriptors: Communication Skills, Communication Problems, Dementia, Intellectual Disability
Stefanie A. Wind; Yangmeng Xu – Educational Assessment, 2024
We explored three approaches to resolving or re-scoring constructed-response items in mixed-format assessments: rater agreement, person fit, and targeted double scoring (TDS). We used a simulation study to consider how the three approaches impact the psychometric properties of student achievement estimates, with an emphasis on person fit. We found…
Descriptors: Interrater Reliability, Error of Measurement, Evaluation Methods, Examiners
Kilic, Abdullah Faruk; Uysal, Ibrahim – International Journal of Assessment Tools in Education, 2022
Most researchers examine the corrected item-total correlation when analyzing item discrimination in multidimensional structures under Classical Test Theory. This can lead them to underestimate item discrimination and, as a result, to remove items from the test. Researchers might investigate the corrected item-total correlation with the…
Descriptors: Item Analysis, Correlation, Item Response Theory, Test Items
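The corrected item-total correlation named in this abstract correlates each item with the total of the *remaining* items, so an item does not inflate its own criterion. The following is a minimal Python sketch. It assumes a single (unidimensional) total score and a hypothetical respondents-by-items score matrix. It does not reproduce the multidimensional alternative the study investigates.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


def corrected_item_total(scores):
    """Corrected item-total correlation for each item.

    scores: list of respondents, each a list of item scores
    (hypothetical unidimensional test with one total score).
    """
    n_items = len(scores[0])
    result = []
    for j in range(n_items):
        item = [row[j] for row in scores]
        # Total of the *other* items, so item j is excluded from its own criterion.
        rest = [sum(row) - row[j] for row in scores]
        result.append(pearson(item, rest))
    return result


if __name__ == "__main__":
    data = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 0, 1]]
    print([round(r, 3) for r in corrected_item_total(data)])
```

Consider the toy matrix above. The first two items correlate about 0.707 with the rest of the test. The third item correlates 0.0 with the rest and would be flagged as weakly discriminating.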
Yubin Xu; Lin Liu; Jianwen Xiong; Guangtian Zhu – Journal of Baltic Science Education, 2025
As the development and application of large language models (LLMs) in physics education progress, the well-known AI-based chatbot ChatGPT4 has presented numerous opportunities for educational assessment. Investigating the potential of AI tools in practical educational assessment carries profound significance. This study explored the comparative…
Descriptors: Physics, Artificial Intelligence, Computer Software, Accuracy
Zahn, Daniela; Canton, Ursula; Boyd, Victoria; Hamilton, Laura; Mamo, Josianne; McKay, Jane; Proudfoot, Linda; Telfer, Dickson; Williams, Kim; Wilson, Colin – Studies in Higher Education, 2021
Evaluating the impact of Academic Literacies teaching (Lea and Street 1998, "Student Writing in Higher Education: An Academic Literacies Approach," Studies in Higher Education 23 (2): 157-72, doi:10.1080/03075079812331380364) is difficult, as it involves gauging whether writers: (1) gain a better understanding of what…
Descriptors: Writing Evaluation, Evaluation Methods, Undergraduate Students, Foreign Countries
Kelvin Terrell Pompey – ProQuest LLC, 2021
Many methods are used to measure interrater reliability in studies where each target is rated by a different set of judges. The purpose of this study is to explore the use of hierarchical modeling for estimating interrater reliability using the intraclass correlation coefficient (ICC). This study provides a description of how the ICC can be…
Descriptors: Interrater Reliability, Evaluation Methods, Test Reliability, Correlation
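This abstract describes designs where each target is rated by a different set of judges. A common classical estimator for such designs is the one-way random-effects ICC(1), computed from one-way ANOVA mean squares as ICC(1) = (MSB - MSW) / (MSB + (k - 1) MSW). The following is a minimal Python sketch of that ANOVA estimator. It assumes a balanced design with the same number k of ratings per target. It is not the hierarchical-model approach the study explores.

```python
def icc1(ratings):
    """One-way random-effects ICC(1) for a balanced design.

    ratings: list of targets, each a list of k ratings. The judges may
    differ from target to target, which is what the one-way model assumes.
    """
    n = len(ratings)
    k = len(ratings[0])
    if n < 2 or k < 2 or any(len(r) != k for r in ratings):
        raise ValueError("need >= 2 targets, each with the same k >= 2 ratings")
    grand_mean = sum(sum(r) for r in ratings) / (n * k)
    target_means = [sum(r) / k for r in ratings]
    # Between-target and within-target sums of squares.
    ss_between = k * sum((m - grand_mean) ** 2 for m in target_means)
    ss_within = sum((x - m) ** 2 for r, m in zip(ratings, target_means) for x in r)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


if __name__ == "__main__":
    print(round(icc1([[1, 2], [3, 4], [5, 6]]), 3))
```

For the three targets above, MSB = 8 and MSW = 0.5, so ICC(1) = 7.5 / 8.5, or about 0.882.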
Wind, Stefanie A. – Measurement: Interdisciplinary Research and Perspectives, 2022
In many performance assessments, one or two raters from the complete rater pool score each performance. This results in a sparse rating design, in which there are limited observations of each rater relative to the complete sample of students. Although sparse rating designs can be constructed to facilitate estimation of student achievement, the…
Descriptors: Evaluators, Bias, Identification, Performance Based Assessment
Using Differential Item Functioning to Test for Interrater Reliability in Constructed Response Items
Walker, Cindy M.; Göçer Sahin, Sakine – Educational and Psychological Measurement, 2020
The purpose of this study was to investigate a new way of evaluating interrater reliability that can allow one to determine if two raters differ with respect to their rating on a polytomous rating scale or constructed response item. Specifically, differential item functioning (DIF) analyses were used to assess interrater reliability and compared…
Descriptors: Test Bias, Interrater Reliability, Responses, Correlation
Pin, Tamis W.; So, Vincent K. K.; Siu, Cynthia S. H.; Yip, Sheila S. N.; Cheung, Stella See-wing; Kan, Jenny Yim-mui – Journal of Autism and Developmental Disorders, 2021
This study examined the reliability and validity of the new Social Motor Function Classification System for Children with Autism Spectrum Disorders (SMFCS-ASD). Reliability was examined on 25 children with ASD (62.4 months, SD 7.8) across six physical therapists. The validity study involved 1001 children (57.0 months, SD 9.9) with ASD using the…
Descriptors: Autism, Pervasive Developmental Disorders, Children, Classification
Sasithorn Limgomolvilas; Patsawut Sukserm – LEARN Journal: Language Education and Acquisition Research Network, 2025
The assessment of English speaking in EFL environments can be inherently subjective and influenced by various factors beyond linguistic ability, including the choice of assessment criteria and even the rubric type. In classroom assessment, the type of rubric recommended for English speaking tasks is the analytical rubric. Driven by three aims, this…
Descriptors: Oral Language, Speech Communication, English (Second Language), Second Language Learning
Venkatraman, Yamini; Mahalingam, Shenbagavalli; Boominathan, Prakash – Journal of Speech, Language, and Hearing Research, 2022
Purpose: The Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V) is a standardized instrument for assessing voice quality. It has been translated and culturally adapted into several languages. This study aimed to develop and validate a Tamil version of CAPE-V through auditory perceptual evaluation of remotely…
Descriptors: Sentences, Dravidian Languages, Acoustics, Auditory Perception
Pruchnic, Jeff; Barton, Ellen; Primeau, Sarah; Trimble, Thomas; Varty, Nicole; Foster, Tanina – Composition Forum, 2021
Over the past two decades, reflective writing has occupied an increasingly prominent position in composition theory, pedagogy, and assessment as researchers have described the value of reflection and reflective writing in college students' development of higher-order writing skills, such as genre conventions (Yancey, "Reflection";…
Descriptors: Reflection, Correlation, Essays, Freshman Composition
Li, Hongxia; Zhao, ChengLing; Long, Taotao; Huang, Yan; Shu, Fengfang – British Journal of Educational Technology, 2021
As an innovative evaluation tool, peer assessment is essential in Massive Open Online Courses (MOOCs). In both formative and summative peer assessments in MOOCs, providing reliable feedback is crucial to enhancing learning outcomes. Peer assessment has been highlighted as a reliable tool in both traditional classrooms and small-scale online…
Descriptors: Peer Evaluation, Online Courses, Open Education, Feedback (Response)
Dankiw, Kylie A.; Baldock, Katherine L.; Kumar, Saravana; Tsiros, Margarita D. – Australasian Journal of Early Childhood, 2021
Identifying and describing children's play behaviours is an important component of evaluating child development. The Behaviour Mapping Schedule is a direct observational tool which aims to describe and quantify children's play behaviours but is yet to undergo reliability testing. This study aimed to determine the intra- and inter-rater reliability…
Descriptors: Interrater Reliability, Classification, Child Behavior, Play
Jiyeo Yun – English Teaching, 2023
Studies of automated scoring systems in writing assessment have also examined the relationship between human and machine scores to evaluate the reliability of automated essay scoring systems. This study investigated the magnitudes of indices for inter-rater agreement and discrepancy, especially between human and machine scoring, in writing assessment.…
Descriptors: Meta Analysis, Interrater Reliability, Essays, Scoring
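Human-machine agreement in automated essay scoring is often reported with three indices:

- exact agreement
- adjacent agreement (scores within one point)
- quadratic weighted kappa (QWK)

The following is a minimal Python sketch that computes all three for integer scores on a known scale. The function name and the inclusive `min_score`/`max_score` parameters are illustrative assumptions. The set of indices the study itself synthesizes may differ.

```python
def agreement_indices(human, machine, min_score, max_score):
    """Exact agreement, adjacent agreement, and quadratic weighted kappa.

    human, machine: equal-length lists of integer scores on the inclusive
    scale [min_score, max_score] (hypothetical parameters for illustration).
    """
    if len(human) != len(machine) or not human:
        raise ValueError("need two non-empty score lists of equal length")
    if max_score <= min_score:
        raise ValueError("scale needs at least two score points")
    if any(not min_score <= s <= max_score for s in list(human) + list(machine)):
        raise ValueError("score outside the stated scale")
    n = len(human)
    k = max_score - min_score + 1
    pairs = list(zip(human, machine))
    exact = sum(h == m for h, m in pairs) / n
    adjacent = sum(abs(h - m) <= 1 for h, m in pairs) / n

    # Observed k x k contingency table of (human, machine) score pairs.
    observed = [[0] * k for _ in range(k)]
    for h, m in pairs:
        observed[h - min_score][m - min_score] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Quadratic disagreement weights; expected counts come from the marginals.
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2
            num += w * observed[i][j]
            den += w * row_totals[i] * col_totals[j] / n
    if den == 0:
        raise ValueError("kappa undefined: both raters used the same single category")
    return {"exact": exact, "adjacent": adjacent, "qwk": 1.0 - num / den}


if __name__ == "__main__":
    result = agreement_indices([1, 2, 3, 4], [2, 2, 3, 3], 1, 4)
    print({key: round(value, 3) for key, value in result.items()})
```

Consider human scores `[1, 2, 3, 4]` and machine scores `[2, 2, 3, 3]` on a 1-4 scale. Exact agreement is 0.5 and adjacent agreement is 1.0. QWK is about 0.667, because the two one-point disagreements are only lightly penalized by the quadratic weights.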