Publication Date
In 2025 | 1 |
Since 2024 | 2 |
Since 2021 (last 5 years) | 3 |
Since 2016 (last 10 years) | 7 |
Since 2006 (last 20 years) | 9 |
Descriptor
Error of Measurement | 12 |
Interrater Reliability | 12 |
Reliability | 12 |
Correlation | 5 |
Scores | 4 |
Accuracy | 3 |
Evaluation Methods | 3 |
Generalizability Theory | 3 |
Performance Based Assessment | 3 |
Scoring | 3 |
Statistical Analysis | 3 |
Author
Almond, Patricia | 1 |
Anderson, Michele A. | 1 |
Applegate, E. Brooks | 1 |
Carol Eckerly | 1 |
Chan, Kelvin K. W. | 1 |
Cheng, Sierra | 1 |
Davidow, Jason H. | 1 |
De Cock, P. | 1 |
Deklerck, J. | 1 |
Desloovere, K. | 1 |
Edge, Robin L. | 1 |
Publication Type
Journal Articles | 10 |
Reports - Research | 10 |
Reports - Evaluative | 2 |
Speeches/Meeting Papers | 2 |
Tests/Questionnaires | 1 |
Education Level
Higher Education | 3 |
Postsecondary Education | 3 |
Audience
Researchers | 1 |
Location
China (Beijing) | 1 |
Jonas Flodén – British Educational Research Journal, 2025
This study compares the performance of the generative AI (GenAI) large language model (LLM) ChatGPT with that of human teachers in grading university exams. Aspects investigated include consistency, large discrepancies and length of answer. Implications for higher education, including the role of teachers and ethics, are also discussed. Three…
Descriptors: College Faculty, Artificial Intelligence, Comparative Testing, Scoring
John R. Donoghue; Carol Eckerly – Applied Measurement in Education, 2024
Trend scoring constructed response items (i.e., rescoring Time A responses at Time B) gives rise to two-way data that follow a product multinomial distribution rather than the multinomial distribution that is usually assumed. Recent work has shown that the difference in sampling model can have profound negative effects on statistics usually used to…
Descriptors: Scoring, Error of Measurement, Reliability, Scoring Rubrics
Davidow, Jason H.; Ye, Jun; Edge, Robin L. – International Journal of Language & Communication Disorders, 2023
Background: Speech-language pathologists often multitask in order to be efficient with their commonly large caseloads. In stuttering assessment, multitasking often involves collecting multiple measures simultaneously. Aims: The present study sought to determine reliability when collecting multiple measures simultaneously versus individually.…
Descriptors: Graduate Students, Measurement, Reliability, Group Activities
Saluja, Ronak; Cheng, Sierra; delos Santos, Keemo Althea; Chan, Kelvin K. W. – Research Synthesis Methods, 2019
Objective: Various statistical methods have been developed to estimate hazard ratios (HRs) from published Kaplan-Meier (KM) curves for the purpose of performing meta-analyses. The objective of this study was to determine the reliability, accuracy, and precision of four commonly used methods by Guyot, Williamson, Parmar, and Hoyle and Henley.…
Descriptors: Meta Analysis, Reliability, Accuracy, Randomized Controlled Trials
Lin, Chih-Kai – Language Testing, 2017
Sparse-rated data are common in operational performance-based language tests, as an inevitable result of assigning examinee responses to a fraction of available raters. The current study investigates the precision of two generalizability-theory methods (i.e., the rating method and the subdividing method) specifically designed to accommodate the…
Descriptors: Data Analysis, Language Tests, Generalizability Theory, Accuracy
Schultz, Sarah M.; Jacobs, Michelle M.; Gorgos, Kara S.; Wasylyk, Nicole T.; Hanrahan, Sean; Van Lunen, Bonnie L. – Athletic Training Education Journal, 2015
Context: Accuracy of locating various lumbopelvic landmarks for novice athletic trainers has not been examined. Objective: To examine reliability of novice athletic trainers for identification of the L4 spinous process and right and left posterior superior iliac spine (PSIS). Design: Cross-sectional reliability. Setting: Laboratory. Patients or…
Descriptors: Athletics, Allied Health Personnel, Entry Workers, Reliability
Mailend, Marja-Liisa; Plante, Elena; Anderson, Michele A.; Applegate, E. Brooks; Nelson, Nickola W. – International Journal of Language & Communication Disorders, 2016
Background: As new standardized tests become commercially available, it is critical that clinicians have access to the information about a test's psychometric properties, including aspects of reliability. Aims: The purpose of the three studies reported in this article was to investigate the reliability of a new test, the Test of Integrated…
Descriptors: Standardized Tests, Psychometrics, Reliability, Language Skills
Han, Chao – Language Assessment Quarterly, 2016
As a property of test scores, reliability/dependability constitutes an important psychometric consideration, and it underpins the validity of measurement results. A review of interpreter certification performance tests (ICPTs) reveals that (a) although reliability/dependability checking has been recognized as an important concern, its theoretical…
Descriptors: Foreign Countries, Scores, English, Chinese
Monbaliu, E.; Ortibus, E.; Roelens, F.; Desloovere, K.; Deklerck, J.; Prinzie, P.; De Cock, P.; Feys, H. – Developmental Medicine & Child Neurology, 2010
Aim: This study investigated the reliability and validity of the Barry-Albright Dystonia Scale (BADS), the Burke-Fahn-Marsden Movement Scale (BFMMS), and the Unified Dystonia Rating Scale (UDRS) in patients with bilateral dystonic cerebral palsy (CP). Method: Three raters independently scored videotapes of 10 patients (five males, five females;…
Descriptors: Content Validity, Cerebral Palsy, Validity, Interrater Reliability
Webber, Larry; And Others – 1986
Generalizability theory, which subsumes classical measurement theory as a special case, provides a general model for estimating the reliability of observational rating data by estimating the variance components of the measurement design. Research data from the "Heart Smart" health intervention program were analyzed as a heuristic tool.…
Descriptors: Behavior Rating Scales, Cardiovascular System, Error of Measurement, Generalizability Theory
Rothman, M. L.; And Others – 1982
A practical application of generalizability theory, demonstrating how the variance components contribute to understanding and interpreting the data collected to evaluate a program, is described. The evaluation concerned 120 learning modules developed for the Dental Auxiliary Education Project. The goals of the project were to design, implement,…
Descriptors: Correlation, Data Collection, Dental Schools, Educational Research

Hollenbeck, Keith; Tindal, Gerald; Almond, Patricia – Educational Assessment, 1999
Studied the amount of measurement error in a state's performance-based writing task as it relates to high-stakes decision reproducibility. Using 175 eighth-grade writing samples, the study finds moderate correlations between the two raters' scores, with significant differences for the rates for the handwritten, but not the typed, essays. (SLD)
Descriptors: Decision Making, Error of Measurement, Essay Tests, Grade 8