Using Differential Item Functioning to Test for Interrater Reliability in Constructed Response Items
Walker, Cindy M.; Göçer Sahin, Sakine – Educational and Psychological Measurement, 2020
The purpose of this study was to investigate a new way of evaluating interrater reliability that can allow one to determine if two raters differ with respect to their rating on a polytomous rating scale or constructed response item. Specifically, differential item functioning (DIF) analyses were used to assess interrater reliability and compared…
Descriptors: Test Bias, Interrater Reliability, Responses, Correlation
Looney, Marilyn A. – Measurement in Physical Education and Exercise Science, 2018
The purpose of this article was two-fold: (1) to provide an overview of the commonly reported and under-reported absolute agreement indices in the kinesiology literature for continuous data; and (2) to present examples of these indices for hypothetical data along with recommendations for future use. It is recommended that three types of information be…
Descriptors: Interrater Reliability, Evaluation Methods, Kinetics, Indexes
Saluja, Ronak; Cheng, Sierra; delos Santos, Keemo Althea; Chan, Kelvin K. W. – Research Synthesis Methods, 2019
Objective: Various statistical methods have been developed to estimate hazard ratios (HRs) from published Kaplan-Meier (KM) curves for the purpose of performing meta-analyses. The objective of this study was to determine the reliability, accuracy, and precision of four commonly used methods by Guyot, Williamson, Parmar, and Hoyle and Henley.…
Descriptors: Meta Analysis, Reliability, Accuracy, Randomized Controlled Trials
Takeda, Kazuya; Tanabe, Shigeo; Koyama, Soichiro; Nagai, Tomoko; Sakurai, Hiroaki; Kanada, Yoshikiyo; Shomoto, Koji – Measurement in Physical Education and Exercise Science, 2018
The aim of this study was to clarify the intra- and inter-rater reliability of the rate of force development in hip abductor muscle force measurements using a hand-held dynamometer. Thirty healthy adults were separately assessed by two independent raters on two separate days. Rate of force development was calculated from the slope of the…
Descriptors: Interrater Reliability, Human Body, Measurement Equipment, Handheld Devices
Cohen, Yoav; Levi, Effi; Ben-Simon, Anat – Applied Measurement in Education, 2018
In the current study, two pools of 250 essays, all written as a response to the same prompt, were rated by two groups of raters (14 or 15 raters per group), thereby providing an approximation to the essay's true score. An automated essay scoring (AES) system was trained on the datasets and then scored the essays using a cross-validation scheme. By…
Descriptors: Test Validity, Automation, Scoring, Computer Assisted Testing
Lambie, Glenn W.; Mullen, Patrick R.; Swank, Jacqueline M.; Blount, Ashley – Measurement and Evaluation in Counseling and Development, 2018
Supervisors evaluated counselors-in-training at multiple points during their practicum experience using the Counseling Competencies Scale (CCS; N = 1,070). The CCS evaluations were randomly split to conduct exploratory factor analysis and confirmatory factor analysis, resulting in a 2-factor model (61.5% of the variance explained).
Descriptors: Counselor Training, Counseling, Measures (Individuals), Competence
Yun, Jiyeo – ProQuest LLC, 2017
Since researchers began investigating automatic scoring systems in writing assessments, they have examined relationships between human and machine scoring and have suggested evaluation criteria for inter-rater agreement. The main purpose of my study is to investigate the magnitudes of and relationships among indices for inter-rater agreement used…
Descriptors: Interrater Reliability, Essays, Scoring, Evaluators
Morris, Darrell; Pennell, Ashley M.; Perney, Jan; Trathen, Woodrow – Reading Psychology, 2018
This study compared reading rate to reading fluency (as measured by a rating scale). After listening to first graders read short passages, we assigned an overall fluency rating (low, average, or high) to each reading. We then used predictive discriminant analyses to determine which of five measures--accuracy, rate (objective); accuracy, phrasing,…
Descriptors: Reading Fluency, Prediction, Grade 1, Elementary School Students
van Kernebeek, Willem G.; de Schipper, Antoine W.; Savelsbergh, Geert J. P.; Toussaint, Huub M. – Measurement in Physical Education and Exercise Science, 2018
In the Netherlands, the 4-Skills Scan is an instrument for physical education teachers to assess gross motor skills of elementary school children. Little is known about its reliability. Therefore, in this study the test-retest and inter-rater reliability were determined. Respectively, 624 and 557 Dutch 6- to 12-year-old children were analyzed for…
Descriptors: Foreign Countries, Interrater Reliability, Pretests Posttests, Psychomotor Skills
Bruhn, Allison; Barron, Sheila; Fernando, Josephine; Balint-Langel, Kinga – Journal of Positive Behavior Interventions, 2018
Direct behavior ratings have been identified as a practical and feasible alternative to direct observation of behavior for monitoring behavioral progress. Despite the evidence of usability, there have been calls for further examination of direct behavior ratings using different behaviors and scales. To this end, we examined the ratings of…
Descriptors: Positive Behavior Supports, Behavior Rating Scales, Observation, Elementary School Students
Haegele, Justin; Zhu, Xihe; Davis, Summer – International Journal of Inclusive Education, 2018
The purpose of this study was to explore the barriers and facilitators to participation in physical education (PE) for students with disabilities (SWD) from the perspectives of in-service physical educators. A convenience sample of 168 physical educators (72% female, 94% Caucasian) from the United States completed a short questionnaire. After data…
Descriptors: Inclusion, Barriers, Disabilities, Physical Education
van Batenburg, Eline S. L.; Oostdam, Ron J.; van Gelderen, Amos J. S.; de Jong, Nivja H. – Language Testing, 2018
This article explores ways to assess interactional performance, and reports on the use of a test format that standardizes the interlocutor's linguistic and interactional contributions to the exchange. It describes the construction and administration of six scripted speech tasks (instruction, advice, and sales tasks) with pre-vocational learners (n…
Descriptors: Second Language Learning, Speech Tests, Interaction, Test Reliability
Johnston, Lucy; Schluter, Philip J. – Studies in Higher Education, 2017
With increasing competition for postgraduate research scholarships, awarding processes demand attention and scrutiny. We examine inter-rater reliability for two prestigious New Zealand scholarships, the Shirtcliffe Fellowship and the Gordon Watson Scholarship. For each scholarship, five assessors (three academic; two non-academic) independently…
Descriptors: Interrater Reliability, Scholarships, Academic Achievement, Program Proposals
Thawabieh, Ahmad M. – Journal of Curriculum and Teaching, 2017
This study aimed to compare students' self-assessment with teachers' assessment. The study sample consisted of 71 students at Tafila Technical University taking an Introduction to Psychology course. The researcher used two student self-assessment tools and two tests. The results indicated that students can assess themselves accurately if…
Descriptors: Comparative Analysis, Self Evaluation (Individuals), Student Evaluation, Psychology
Tanner, Nicholas; Eklund, Katie; Kilgus, Stephen P.; Johnson, Austin H. – School Psychology Review, 2018
Data derived from universal screening procedures are increasingly utilized by schools to identify and provide additional support to students at risk for behavioral and emotional concerns. As screening has the potential to be resource intensive, effort has been placed on the development of efficient screening procedures, including brief behavior…
Descriptors: Screening Tests, At Risk Students, Behavior Problems, Emotional Problems