Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 0 |
Since 2016 (last 10 years) | 0 |
Since 2006 (last 20 years) | 1 |
Descriptor
Interrater Reliability | 6 |
Scores | 4 |
Scoring | 4 |
Scoring Rubrics | 3 |
Essays | 2 |
High School Students | 2 |
High Schools | 2 |
Writing Tests | 2 |
Academic Achievement | 1 |
Correlation | 1 |
Criteria | 1 |
Source
Applied Measurement in Education | 2
Language Assessment Quarterly | 2 |
American Journal of Evaluation | 1 |
Journal of Experimental Education | 1
Author
Johnson, Robert L. | 6 |
Gordon, Belita | 3 |
Penny, James | 2 |
Penny, Jim | 2 |
Fisher, Steve | 1 |
Fisher, Steven P. | 1 |
Hodge, Kari J. | 1 |
Kuhs, Therese | 1 |
McDaniel, Fred, II | 1 |
Morgan, Grant B. | 1 |
Shumate, Steven R. | 1 |
Publication Type
Journal Articles | 6 |
Reports - Research | 6 |
Location
Georgia | 1 |
Morgan, Grant B.; Zhu, Min; Johnson, Robert L.; Hodge, Kari J. – Language Assessment Quarterly, 2014
Common estimators of interrater reliability include Pearson product-moment correlation coefficients, Spearman rank-order correlations, and the generalizability coefficient. The purpose of this study was to examine the accuracy of estimators of interrater reliability when varying the true reliability, number of scale categories, and number of…
Descriptors: Interrater Reliability, Correlation, Generalization, Scoring
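The estimators named above are straightforward to compute for a single pair of raters. A minimal sketch, assuming SciPy is available and using made-up scores (the generalizability coefficient needs a variance-components model and is omitted):

```python
# Two common estimators of interrater reliability for a pair of raters
# scoring the same essays; the scores below are illustrative only.
from scipy.stats import pearsonr, spearmanr

rater_a = [4, 3, 5, 2, 6, 4, 3, 5, 4, 2]  # hypothetical 1-6 rubric scores
rater_b = [4, 3, 4, 2, 5, 4, 4, 5, 3, 2]

pearson_r, _ = pearsonr(rater_a, rater_b)      # product-moment correlation
spearman_rho, _ = spearmanr(rater_a, rater_b)  # rank-order correlation

print(f"Pearson r:    {pearson_r:.3f}")
print(f"Spearman rho: {spearman_rho:.3f}")
```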

Penny, Jim; Johnson, Robert L.; Gordon, Belita – Journal of Experimental Education, 2000
Used an analytic rubric to score 120 writing samples from Georgia's 11th-grade writing assessment. Raters augmented scores by adding a "+" or "-" to the score. Results indicate that this method of augmentation tends to improve most indices of interrater reliability, although the percentage of exact and adjacent agreement…
Descriptors: High School Students, High Schools, Interrater Reliability, Scoring Rubrics
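The snippet does not say how the "+" and "-" marks were mapped onto numbers; the sketch below assumes a one-third-point offset, which is only one plausible convention, and uses made-up ratings to show the conversion and the exact/adjacent agreement tallies the abstract refers to.

```python
# Converting augmented rubric scores ("4+", "3-") to numeric values.
# The one-third-point offset is an assumption for illustration only.
def augmented_to_numeric(score: str, offset: float = 1 / 3) -> float:
    if score.endswith("+"):
        return int(score[:-1]) + offset
    if score.endswith("-"):
        return int(score[:-1]) - offset
    return float(score)

rater_a = ["4+", "3", "5-", "2+", "4"]  # hypothetical augmented ratings
rater_b = ["4", "3+", "4+", "2", "4-"]

# Exact and adjacent agreement are tallied on the underlying integer scores.
base_a = [round(augmented_to_numeric(s)) for s in rater_a]
base_b = [round(augmented_to_numeric(s)) for s in rater_b]

exact = sum(a == b for a, b in zip(base_a, base_b)) / len(base_a)
adjacent = sum(abs(a - b) <= 1 for a, b in zip(base_a, base_b)) / len(base_a)
print(f"exact agreement: {exact:.2f}, adjacent agreement: {adjacent:.2f}")
```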

Johnson, Robert L.; Penny, James; Gordon, Belita – Applied Measurement in Education, 2000
Studied four forms of score resolution used by testing agencies and investigated the effect that each has on the interrater reliability associated with the resulting operational scores. Results, based on 120 essays from the Georgia High School Writing Test, show some forms of resolution to be associated with higher reliability and some associated…
Descriptors: Essay Tests, High School Students, High Schools, Interrater Reliability
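The snippet does not name the four resolution forms studied, so the sketch below only illustrates two generic rules that recur in this literature, with made-up ratings: averaging the two original ratings, and adjudication by a third rater.

```python
# Two illustrative resolution rules for discrepant ratings; these are not
# necessarily the four forms examined in the study.
def resolve_by_mean(r1: int, r2: int) -> float:
    """Report the mean of the two original ratings."""
    return (r1 + r2) / 2

def resolve_by_third_rater(r1: int, r2: int, r3: int) -> float:
    """Average the third rating with whichever original rating it is
    closer to; fall back to the plain mean when it is equidistant."""
    if abs(r1 - r3) < abs(r2 - r3):
        return (r1 + r3) / 2
    if abs(r2 - r3) < abs(r1 - r3):
        return (r2 + r3) / 2
    return (r1 + r2) / 2

print(resolve_by_mean(3, 5))            # 4.0
print(resolve_by_third_rater(3, 5, 5))  # 5.0
```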

Johnson, Robert L.; McDaniel, Fred, II; Willeke, Marjorie J. – American Journal of Evaluation, 2000
Studied the interrater reliability of a portfolio assessment used in a small-scale program evaluation. Investigated analytic, combined analytic, and holistic family literacy portfolios from an Even Start program. Results show that at least three raters are needed to obtain acceptable levels of reliability for holistic and individual analytic…
Descriptors: Family Literacy, Holistic Approach, Interrater Reliability, Portfolio Assessment
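One way to see why roughly three raters are needed is the Spearman-Brown prophecy formula for the reliability of a k-rater average; this is a classical stand-in for the generalizability analysis such a study would more likely use, and the single-rater reliability below is a made-up value.

```python
# Projected reliability of an average over k raters (Spearman-Brown);
# the single-rater reliability is hypothetical, not from the study.
def spearman_brown(single_rater_reliability: float, k: int) -> float:
    r = single_rater_reliability
    return k * r / (1 + (k - 1) * r)

r1 = 0.55  # hypothetical reliability of one rater's portfolio scores
for k in range(1, 6):
    print(f"{k} rater(s): {spearman_brown(r1, k):.2f}")
```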
Johnson, Robert L.; Penny, Jim; Fisher, Steve; Kuhs, Therese – Applied Measurement in Education, 2003
When raters assign different scores to a performance task, a method for resolving rating differences is required to report a single score to the examinee. Recent studies indicate that decisions about examinees, such as pass/fail decisions, differ across resolution methods. Previous studies also investigated the interrater reliability of…
Descriptors: Test Reliability, Test Validity, Scores, Interrater Reliability
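A toy example of why pass/fail decisions can differ across resolution methods: with a cut score that falls between two discrepant ratings, averaging them versus deferring to a third rater (hypothetical rules, as in the earlier sketch, with made-up values) classifies the same examinee differently.

```python
# The same discrepant ratings pass under one resolution rule and fail
# under another; cut score and ratings are made up for illustration.
CUT_SCORE = 4.0
r1, r2, r3 = 3, 5, 3  # two discrepant original ratings plus a third rating

mean_score = (r1 + r2) / 2   # resolution by averaging -> 4.0
third_score = (r1 + r3) / 2  # r1 is closer to r3, so keep r1 -> 3.0

print("mean resolution:       ", "pass" if mean_score >= CUT_SCORE else "fail")
print("third-rater resolution:", "pass" if third_score >= CUT_SCORE else "fail")
```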
Johnson, Robert L.; Penny, James; Gordon, Belita; Shumate, Steven R.; Fisher, Steven P. – Language Assessment Quarterly, 2005
Many studies have indicated that at least 2 raters should score writing assessments to improve interrater reliability. However, even for assessments that characteristically demonstrate high levels of rater agreement, 2 raters of the same essay can occasionally report different, or discrepant, scores. If a single score, typically referred to as an…
Descriptors: Interrater Reliability, Scores, Evaluation, Reliability