Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 0 |
Since 2016 (last 10 years) | 0 |
Since 2006 (last 20 years) | 1 |
Descriptor
Interrater Reliability | 6 |
Scores | 4 |
Scoring | 4 |
Scoring Rubrics | 3 |
Essays | 2 |
High School Students | 2 |
High Schools | 2 |
Writing Tests | 2 |
Academic Achievement | 1 |
Correlation | 1 |
Criteria | 1 |
Source
Applied Measurement in Education | 2
Language Assessment Quarterly | 2 |
American Journal of Evaluation | 1 |
Journal of Experimental Education | 1
Author
Johnson, Robert L. | 6 |
Gordon, Belita | 3 |
Penny, James | 2 |
Penny, Jim | 2 |
Fisher, Steve | 1 |
Fisher, Steven P. | 1 |
Hodge, Kari J. | 1 |
Kuhs, Therese | 1 |
McDaniel, Fred, II | 1 |
Morgan, Grant B. | 1 |
Shumate, Steven R. | 1 |
Publication Type
Journal Articles | 6 |
Reports - Research | 6 |
Location
Georgia | 1 |
Morgan, Grant B.; Zhu, Min; Johnson, Robert L.; Hodge, Kari J. – Language Assessment Quarterly, 2014
Common estimators of interrater reliability include Pearson product-moment correlation coefficients, Spearman rank-order correlations, and the generalizability coefficient. The purpose of this study was to examine the accuracy of estimators of interrater reliability when varying the true reliability, number of scale categories, and number of…
Descriptors: Interrater Reliability, Correlation, Generalization, Scoring
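The estimators named above are straightforward to compute for a single pair of raters. A minimal sketch, assuming SciPy is available and using made-up scores (the generalizability coefficient needs a variance-components model and is omitted):

```python
# Two common estimators of interrater reliability for a pair of raters
# scoring the same essays; the scores below are illustrative only.
from scipy.stats import pearsonr, spearmanr

rater_a = [4, 3, 5, 2, 6, 4, 3, 5, 4, 2]  # hypothetical 1-6 rubric scores
rater_b = [4, 3, 4, 2, 5, 4, 4, 5, 3, 2]

pearson_r, _ = pearsonr(rater_a, rater_b)      # product-moment correlation
spearman_rho, _ = spearmanr(rater_a, rater_b)  # rank-order correlation

print(f"Pearson r:    {pearson_r:.3f}")
print(f"Spearman rho: {spearman_rho:.3f}")
```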

Penny, Jim; Johnson, Robert L.; Gordon, Belita – Journal of Experimental Education, 2000
Used an analytic rubric to score 120 writing samples from Georgia's 11th-grade writing assessment. Raters augmented scores by adding a "+" or "-" to the score. Results indicate that this method of augmentation tends to improve most indices of interrater reliability, although the percentage of exact and adjacent agreement…
Descriptors: High School Students, High Schools, Interrater Reliability, Scoring Rubrics
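The snippet does not say how the "+" and "-" marks were mapped onto numbers; the sketch below assumes a one-third-point offset, which is only one plausible convention, and uses made-up ratings to show the conversion and the exact/adjacent agreement tallies the abstract refers to.

```python
# Converting augmented rubric scores ("4+", "3-") to numeric values.
# The one-third-point offset is an assumption for illustration only.
def augmented_to_numeric(score: str, offset: float = 1 / 3) -> float:
    if score.endswith("+"):
        return int(score[:-1]) + offset
    if score.endswith("-"):
        return int(score[:-1]) - offset
    return float(score)

rater_a = ["4+", "3", "5-", "2+", "4"]  # hypothetical augmented ratings
rater_b = ["4", "3+", "4+", "2", "4-"]

# Exact and adjacent agreement are tallied on the underlying integer scores.
base_a = [round(augmented_to_numeric(s)) for s in rater_a]
base_b = [round(augmented_to_numeric(s)) for s in rater_b]

exact = sum(a == b for a, b in zip(base_a, base_b)) / len(base_a)
adjacent = sum(abs(a - b) <= 1 for a, b in zip(base_a, base_b)) / len(base_a)
print(f"exact agreement: {exact:.2f}, adjacent agreement: {adjacent:.2f}")
```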

Johnson, Robert L.; Penny, James; Gordon, Belita – Applied Measurement in Education, 2000
Studied four forms of score resolution used by testing agencies and investigated the effect that each has on the interrater reliability associated with the resulting operational scores. Results, based on 120 essays from the Georgia High School Writing Test, show some forms of resolution to be associated with higher reliability and some associated…
Descriptors: Essay Tests, High School Students, High Schools, Interrater Reliability
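The snippet does not name the four resolution forms studied, so the sketch below only illustrates two generic rules that recur in this literature, with made-up ratings: averaging the two original ratings, and adjudication by a third rater.

```python
# Two illustrative resolution rules for discrepant ratings; these are not
# necessarily the four forms examined in the study.
def resolve_by_mean(r1: int, r2: int) -> float:
    """Report the mean of the two original ratings."""
    return (r1 + r2) / 2

def resolve_by_third_rater(r1: int, r2: int, r3: int) -> float:
    """Average the third rating with whichever original rating it is
    closer to; fall back to the plain mean when it is equidistant."""
    if abs(r1 - r3) < abs(r2 - r3):
        return (r1 + r3) / 2
    if abs(r2 - r3) < abs(r1 - r3):
        return (r2 + r3) / 2
    return (r1 + r2) / 2

print(resolve_by_mean(3, 5))            # 4.0
print(resolve_by_third_rater(3, 5, 5))  # 5.0
```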

Johnson, Robert L.; McDaniel, Fred, II; Willeke, Marjorie J. – American Journal of Evaluation, 2000
Studied the interrater reliability of a portfolio assessment used in a small-scale program evaluation. Investigated analytic, combined analytic, and holistic family literacy portfolios from an Even Start program. Results show that at least three raters are needed to obtain acceptable levels of reliability for holistic and individual analytic…
Descriptors: Family Literacy, Holistic Approach, Interrater Reliability, Portfolio Assessment
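One way to see why roughly three raters are needed is the Spearman-Brown prophecy formula for the reliability of a k-rater average; this is a classical stand-in for the generalizability analysis such a study would more likely use, and the single-rater reliability below is a made-up value.

```python
# Projected reliability of an average over k raters (Spearman-Brown);
# the single-rater reliability is hypothetical, not from the study.
def spearman_brown(single_rater_reliability: float, k: int) -> float:
    r = single_rater_reliability
    return k * r / (1 + (k - 1) * r)

r1 = 0.55  # hypothetical reliability of one rater's portfolio scores
for k in range(1, 6):
    print(f"{k} rater(s): {spearman_brown(r1, k):.2f}")
```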
Johnson, Robert L.; Penny, Jim; Fisher, Steve; Kuhs, Therese – Applied Measurement in Education, 2003
When raters assign different scores to a performance task, a method for resolving rating differences is required to report a single score to the examinee. Recent studies indicate that decisions about examinees, such as pass/fail decisions, differ across resolution methods. Previous studies also investigated the interrater reliability of…
Descriptors: Test Reliability, Test Validity, Scores, Interrater Reliability
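A toy example of why pass/fail decisions can differ across resolution methods: with a cut score that falls between two discrepant ratings, averaging them versus deferring to a third rater (hypothetical rules, as in the earlier sketch, with made-up values) classifies the same examinee differently.

```python
# The same discrepant ratings pass under one resolution rule and fail
# under another; cut score and ratings are made up for illustration.
CUT_SCORE = 4.0
r1, r2, r3 = 3, 5, 3  # two discrepant original ratings plus a third rating

mean_score = (r1 + r2) / 2   # resolution by averaging -> 4.0
third_score = (r1 + r3) / 2  # r1 is closer to r3, so keep r1 -> 3.0

print("mean resolution:       ", "pass" if mean_score >= CUT_SCORE else "fail")
print("third-rater resolution:", "pass" if third_score >= CUT_SCORE else "fail")
```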
Johnson, Robert L.; Penny, James; Gordon, Belita; Shumate, Steven R.; Fisher, Steven P. – Language Assessment Quarterly, 2005
Many studies have indicated that at least 2 raters should score writing assessments to improve interrater reliability. However, even for assessments that characteristically demonstrate high levels of rater agreement, 2 raters of the same essay can occasionally report different, or discrepant, scores. If a single score, typically referred to as an…
Descriptors: Interrater Reliability, Scores, Evaluation, Reliability