Publication Date
In 2025: 0
Since 2024: 1
Since 2021 (last 5 years): 1
Since 2016 (last 10 years): 7
Since 2006 (last 20 years): 11
Descriptor
Error of Measurement: 14
Interrater Reliability: 14
Statistical Analysis: 14
Correlation: 5
Accuracy: 4
Scoring Rubrics: 4
Comparative Analysis: 3
Foreign Countries: 3
Generalizability Theory: 3
Higher Education: 3
Item Response Theory: 3
Author
Alkahtani, Saif F.: 1
Bosker, Roel J.: 1
Bridgeman, Brent: 1
Chan, Kelvin K. W.: 1
Chen, Hsueh-Chih: 1
Chen, Po-Hsi: 1
Cheng, Sierra: 1
Conger, Anthony J.: 1
Cope, Ronald T.: 1
Cousineau, Denis: 1
Eckerly, Carol: 1
Publication Type
Journal Articles: 10
Reports - Research: 9
Reports - Evaluative: 4
Speeches/Meeting Papers: 2
Dissertations/Theses -…: 1
Education Level
Elementary Education: 2
Grade 1: 1
Higher Education: 1
Postsecondary Education: 1
Audience
Researchers: 1
Location
Japan: 1
Netherlands (Amsterdam): 1
Taiwan (Taipei): 1
Laws, Policies, & Programs
Assessments and Surveys
Advanced Placement…: 1
Donoghue, John R.; Eckerly, Carol – Applied Measurement in Education, 2024
Trend scoring of constructed-response items (i.e., rescoring Time A responses at Time B) gives rise to two-way data that follow a product multinomial distribution rather than the multinomial distribution that is usually assumed. Recent work has shown that the difference in sampling model can have profound negative effects on statistics usually used to…
Descriptors: Scoring, Error of Measurement, Reliability, Scoring Rubrics
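For readers unfamiliar with the distinction the abstract draws, a sketch in standard notation (ours, not the authors'): under the usual single-scoring assumption the whole table of rescore counts is one multinomial sample, whereas trend scoring fixes the Time A margins by design, so each row is an independent multinomial.

```latex
% Usual assumption: the full I x J table is one multinomial sample of size n
P(\{n_{ij}\}) \;=\; \frac{n!}{\prod_{i,j} n_{ij}!} \prod_{i,j} p_{ij}^{\,n_{ij}}
% Trend scoring: the Time A margins n_{i\cdot} are fixed, so the rows are
% independent multinomials (a product multinomial)
P(\{n_{ij}\}) \;=\; \prod_{i} \frac{n_{i\cdot}!}{\prod_{j} n_{ij}!} \prod_{j} p_{j\mid i}^{\,n_{ij}}
```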
De Raadt, Alexandra; Warrens, Matthijs J.; Bosker, Roel J.; Kiers, Henk A. L. – Educational and Psychological Measurement, 2019
Cohen's kappa coefficient is commonly used for assessing agreement between classifications of two raters on a nominal scale. Three variants of Cohen's kappa that can handle missing data are presented. Data are considered missing if one or both ratings of a unit are missing. We study how well the variants estimate the kappa value for complete data…
Descriptors: Interrater Reliability, Data, Statistical Analysis, Statistical Bias
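As a quick illustration of the statistic involved, here is a minimal sketch of Cohen's kappa with the simplest missing-data treatment (listwise deletion of units lacking either rating); the paper's three variants are more refined, and the function name and data below are hypothetical.

```python
def cohens_kappa(r1, r2):
    """Cohen's kappa for two nominal ratings; None marks a missing rating.
    Units with either rating missing are dropped (listwise deletion)."""
    pairs = [(a, b) for a, b in zip(r1, r2) if a is not None and b is not None]
    n = len(pairs)
    cats = sorted({c for pair in pairs for c in pair})
    # observed proportion of agreement
    p_o = sum(a == b for a, b in pairs) / n
    # chance agreement from each rater's marginal category proportions
    p_e = sum(
        (sum(a == c for a, _ in pairs) / n) * (sum(b == c for _, b in pairs) / n)
        for c in cats
    )
    return (p_o - p_e) / (1 - p_e)

r1 = ["x", "y", "x", None, "y", "x"]
r2 = ["x", "y", "y", "x", "y", "x"]
print(round(cohens_kappa(r1, r2), 3))  # kappa on the 5 complete pairs, ~0.615
```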
Conger, Anthony J. – Educational and Psychological Measurement, 2017
Drawing parallels to classical test theory, this article clarifies the difference between rater accuracy and reliability and demonstrates how category marginal frequencies affect rater agreement and Cohen's kappa. Category assignment paradigms are developed: comparing raters to a standard (index) versus comparing two raters to one another…
Descriptors: Interrater Reliability, Evaluators, Accuracy, Statistical Analysis
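A small numeric illustration (our numbers, not the article's) of the marginal-frequency point: two tables with identical 80% raw agreement carry very different kappas once the marginals are skewed, because chance agreement changes.

```python
def kappa_from_table(t):
    """Cohen's kappa from a square agreement table (rows = rater 1 categories)."""
    n = sum(sum(row) for row in t)
    p_o = sum(t[i][i] for i in range(len(t))) / n
    row = [sum(r) / n for r in t]
    col = [sum(t[i][j] for i in range(len(t))) / n for j in range(len(t))]
    p_e = sum(r * c for r, c in zip(row, col))
    return (p_o - p_e) / (1 - p_e)

balanced = [[40, 10], [10, 40]]  # p_o = .80, balanced marginals: kappa = .60
skewed   = [[75,  5], [15,  5]]  # p_o = .80, skewed marginals:   kappa ~ .23
print(kappa_from_table(balanced), kappa_from_table(skewed))
```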
Cousineau, Denis; Laurencelle, Louis – Educational and Psychological Measurement, 2017
Assessing global interrater agreement is difficult as most published indices are affected by the presence of mixtures of agreements and disagreements. A previously proposed method was shown to be specifically sensitive to global agreement, excluding mixtures, but also negatively biased. Here, we propose two alternatives in an attempt to find what…
Descriptors: Interrater Reliability, Evaluation Methods, Statistical Bias, Accuracy
Saluja, Ronak; Cheng, Sierra; delos Santos, Keemo Althea; Chan, Kelvin K. W. – Research Synthesis Methods, 2019
Objective: Various statistical methods have been developed to estimate hazard ratios (HRs) from published Kaplan-Meier (KM) curves for the purpose of performing meta-analyses. The objective of this study was to determine the reliability, accuracy, and precision of four commonly used methods by Guyot, Williamson, Parmar, and Hoyle and Henley…
Descriptors: Meta Analysis, Reliability, Accuracy, Randomized Controlled Trials
Takeda, Kazuya; Tanabe, Shigeo; Koyama, Soichiro; Nagai, Tomoko; Sakurai, Hiroaki; Kanada, Yoshikiyo; Shomoto, Koji – Measurement in Physical Education and Exercise Science, 2018
The aim of this study was to clarify the intra- and inter-rater reliability of the rate of force development in hip abductor muscle force measurements using a hand-held dynamometer. Thirty healthy adults were separately assessed by two independent raters on two separate days. Rate of force development was calculated from the slope of the…
Descriptors: Interrater Reliability, Human Body, Measurement Equipment, Handheld Devices
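As a sketch of the quantity being measured: rate of force development is the slope of the force-time curve over an early window. The 0-200 ms window, sampling rate, and toy data below are assumptions for illustration, not the study's protocol.

```python
import numpy as np

def rate_of_force_development(t, force, t0=0.0, t1=0.2):
    """RFD (N/s) as the least-squares slope of force vs. time over
    [t0, t1] seconds -- the window is an assumed analysis choice."""
    mask = (t >= t0) & (t <= t1)
    slope, _ = np.polyfit(t[mask], force[mask], 1)
    return slope

t = np.linspace(0, 0.5, 501)                # 1 kHz sampling over 0.5 s
force = np.clip(400 * t, 0, 120)            # toy ramp to a 120 N plateau
print(rate_of_force_development(t, force))  # ~400 N/s on the ramp
```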
van Kernebeek, Willem G.; de Schipper, Antoine W.; Savelsbergh, Geert J. P.; Toussaint, Huub M. – Measurement in Physical Education and Exercise Science, 2018
In the Netherlands, the 4-Skills Scan is an instrument with which physical education teachers assess the gross motor skills of elementary school children, yet little is known about its reliability. In this study, its test-retest and inter-rater reliability were therefore determined: 624 and 557 Dutch 6- to 12-year-old children, respectively, were analyzed for…
Descriptors: Foreign Countries, Interrater Reliability, Pretests Posttests, Psychomotor Skills
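Reliabilities of this kind are commonly estimated with intraclass correlation coefficients; here is a minimal sketch using pingouin (assumed tooling with hypothetical long-format data, not necessarily the authors' analysis).

```python
import pandas as pd
import pingouin as pg

# Long-format toy data: each child scored by two raters (hypothetical columns).
df = pd.DataFrame({
    "child": [1, 1, 2, 2, 3, 3, 4, 4],
    "rater": ["A", "B"] * 4,
    "score": [3.0, 3.5, 2.0, 2.5, 4.0, 4.0, 1.5, 2.0],
})
icc = pg.intraclass_corr(data=df, targets="child", raters="rater", ratings="score")
print(icc[["Type", "ICC"]])  # ICC2 is the usual two-way random-effects choice
```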
Prieto, Gerardo; Nieto, Eloísa – Psicologica: International Journal of Methodology and Experimental Psychology, 2014
This paper describes how a Many Faceted Rasch Measurement (MFRM) approach can be applied to performance assessment focusing on rater analysis. The article provides an introduction to MFRM, a description of MFRM analysis procedures, and an example to illustrate how to examine the effects of various sources of variability on test takers' performance…
Descriptors: Item Response Theory, Interrater Reliability, Rating Scales, Error of Measurement
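The core model such rater analyses estimate is Linacre's many-facet Rasch rating-scale form, given here in the standard notation rather than the article's:

```latex
\log\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) \;=\; B_n - D_i - C_j - F_k
```

where B_n is examinee ability, D_i item difficulty, C_j rater severity, and F_k the threshold for stepping into rating category k.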
Alkahtani, Saif F. – ProQuest LLC, 2012
The principal aim of the present study was to better guide the Quranic recitation appraisal practice by presenting an application of Generalizability theory and Many-facet Rasch Measurement Model for assessing the dependability and fit of two suggested rubrics. Recitations of 93 students were rated holistically and analytically by 3 independent…
Descriptors: Generalizability Theory, Item Response Theory, Verbal Tests, Islam
Hung, Su-Pin; Chen, Po-Hsi; Chen, Hsueh-Chih – Creativity Research Journal, 2012
Product assessment is widely applied in creative studies, typically as an important dependent measure. Within this context, this study had 2 purposes. First, the focus of this research was on methods for investigating possible rater effects, an issue that has not received a great deal of attention in past creativity studies. Second, the…
Descriptors: Item Response Theory, Creativity, Interrater Reliability, Undergraduate Students
Bridgeman, Brent; And Others – 1996
The various methods for computing the reliability of scores on Advanced Placement (AP) examinations are summarized. For the free response portion of the examinations, raters can contribute to score unreliability through both systematic severity errors (in which some raters consistently rate more severely than other raters) and through…
Descriptors: Advanced Placement, College Entrance Examinations, Error of Measurement, High School Students
Cope, Ronald T. – 1987
This study used generalizability theory and other statistical concepts to assess the application of the Angoff method to setting cutoff scores on two professional certification tests. A panel of ten judges gave pre- and post-feedback Angoff probability ratings of items of two forms of a professional certification test, and another panel of nine…
Descriptors: Certification, Correlation, Cutting Scores, Error of Measurement
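A minimal worked sketch of the Angoff step itself (ratings invented): each judge estimates the probability that a minimally competent candidate answers each item correctly, and the cut score is the sum over items of the mean rating.

```python
# Rows = judges, columns = items; entries are probability ratings (invented).
ratings = [
    [0.6, 0.8, 0.4, 0.7],
    [0.5, 0.9, 0.5, 0.6],
    [0.7, 0.7, 0.3, 0.8],
]
n_judges = len(ratings)
item_means = [sum(col) / n_judges for col in zip(*ratings)]
cut_score = sum(item_means)  # expected raw score of a borderline candidate
print(item_means, round(cut_score, 2))  # cut score of 2.5 out of 4 items
```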
Fawson, Parker C.; Ludlow, Brian C.; Reutzel, D. Ray; Sudweeks, Richard; Smith, John A. – Journal of Educational Research, 2006
The authors present results of a generalizability study of running record assessment. They conducted 2 decision studies to ascertain the number of raters and passages necessary to obtain a reliable estimate of a student's reading ability on the basis of a running record assessment. Ten teachers completed running record assessments of 10…
Descriptors: Reading Ability, Generalizability Theory, Reading Instruction, Error of Measurement
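The decision-study computation behind "how many raters and passages" plugs estimated variance components into the generalizability coefficient for a persons x raters x passages crossed design and scans the facet sample sizes; the components below are invented for illustration.

```python
# Invented variance components for a p x r x t (persons x raters x passages) design.
var = {"p": 0.50, "pr": 0.10, "pt": 0.20, "prt_e": 0.30}

def g_coefficient(n_raters, n_passages, v=var):
    """Generalizability (relative) coefficient for the crossed two-facet design:
    relative error shrinks as each interaction component is averaged over facets."""
    rel_error = (v["pr"] / n_raters + v["pt"] / n_passages
                 + v["prt_e"] / (n_raters * n_passages))
    return v["p"] / (v["p"] + rel_error)

for n_r in (1, 2, 3):
    for n_t in (1, 3, 5):
        print(f"raters={n_r} passages={n_t} g={g_coefficient(n_r, n_t):.2f}")
```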
Rothman, M. L.; And Others – 1982
A practical application of generalizability theory, demonstrating how the variance components contribute to understanding and interpreting the data collected to evaluate a program, is described. The evaluation concerned 120 learning modules developed for the Dental Auxiliary Education Project. The goals of the project were to design, implement,…
Descriptors: Correlation, Data Collection, Dental Schools, Educational Research