Publication Date
In 2025 | 1 |
Since 2024 | 1 |
Since 2021 (last 5 years) | 2 |
Since 2016 (last 10 years) | 2 |
Since 2006 (last 20 years) | 6 |
Descriptor
Interrater Reliability | 8 |
Statistical Data | 8 |
Evaluation Methods | 3 |
Rating Scales | 3 |
Statistical Analysis | 3 |
Data Collection | 2 |
Research Methodology | 2 |
Research Reports | 2 |
Scoring | 2 |
Advanced Placement Programs | 1 |
Athletics | 1 |
Source
Journal of Educational and… | 1 |
Journal of Experimental… | 1 |
Journal of Policy Analysis… | 1 |
Measurement in Physical… | 1 |
Modern Language Journal | 1 |
Oxford Review of Education | 1 |
Psychometrika | 1 |
Regional Educational… | 1 |
Author
Albert, Adelin | 1 |
Badham, Louise | 1 |
Ball, Samuel | 1 |
Bavier, Richard | 1 |
Boller, Kimberly | 1 |
Bonett, Douglas G. | 1 |
Cunningham, George B. | 1 |
Dixon, Marlene A. | 1 |
Ke, Chuanren | 1 |
Kisker, Ellen Eliason | 1 |
Marsh, Herbert W. | 1 |
Publication Type
Journal Articles | 7 |
Reports - Evaluative | 3 |
Reports - Research | 3 |
Reports - Descriptive | 2 |
Guides - Non-Classroom | 1 |
Education Level
Adult Education | 2 |
Early Childhood Education | 1 |
Elementary Education | 1 |
Grade 1 | 1 |
Grade 2 | 1 |
Grade 3 | 1 |
Primary Education | 1 |
Audience
Researchers | 1 |
Bonett, Douglas G. – Journal of Educational and Behavioral Statistics, 2022
The limitations of Cohen's κ are reviewed and an alternative G-index is recommended for assessing nominal-scale agreement. Maximum likelihood estimates, standard errors, and confidence intervals for a two-rater G-index are derived for one-group and two-group designs. A new G-index of agreement for multirater designs is proposed. Statistical…
Descriptors: Statistical Inference, Statistical Data, Interrater Reliability, Design
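The Bonett abstract contrasts Cohen's κ with a G-index for nominal-scale agreement. The sketch below computes both from a two-rater k x k contingency table so the difference in chance correction is visible. It assumes the G-index takes the common form G = (k*p_o - 1)/(k - 1), where p_o is the observed proportion of agreement; that is, chance agreement is fixed at 1/k instead of being estimated from the raters' marginals as in κ. The interval is a plain Wald approximation, not the maximum likelihood intervals derived in the paper, and the function names and example table are hypothetical.

```python
import math


def observed_agreement(table):
    """Proportion of items on the diagonal of a k x k two-rater table."""
    n = sum(sum(row) for row in table)
    return sum(table[i][i] for i in range(len(table))) / n


def cohen_kappa(table):
    """Cohen's kappa: chance agreement is estimated from the raters' marginals.

    Undefined (division by zero) when both raters put every item in one category.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    p_o = observed_agreement(table)
    row_marg = [sum(table[i]) / n for i in range(k)]                      # rater A
    col_marg = [sum(table[i][j] for i in range(k)) / n for j in range(k)]  # rater B
    p_e = sum(r * c for r, c in zip(row_marg, col_marg))
    return (p_o - p_e) / (1 - p_e)


def g_index(table, z=1.96):
    """G-index in its assumed form, with chance agreement fixed at 1/k.

    Returns (estimate, (lower, upper)). The interval rescales a simple Wald
    interval for p_o; it is an approximation, not the paper's method.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    p_o = observed_agreement(table)
    g = (k * p_o - 1) / (k - 1)
    se_p = math.sqrt(p_o * (1 - p_o) / n)  # binomial SE of observed agreement
    se_g = k / (k - 1) * se_p               # G is linear in p_o
    return g, (g - z * se_g, g + z * se_g)


if __name__ == "__main__":
    # Hypothetical data: 100 items, two raters, two categories, 90 agreements.
    table = [[90, 5], [5, 0]]
    print(round(cohen_kappa(table), 3))
    print(g_index(table))
```

With this hypothetical table, κ comes out slightly negative (about -0.053) while the G-index is 0.8: both raters rarely use the second category, so the marginal-based chance term in κ is large (0.905). This sensitivity to skewed category prevalence is one widely discussed behavior of κ.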
Badham, Louise – Oxford Review of Education, 2025
Different sources of assessment evidence are reviewed during International Baccalaureate (IB) grade awarding to convert marks into grades and ensure fair results for students. Qualitative and quantitative evidence are analysed to determine grade boundaries, with statistical evidence weighed against examiner judgement and teachers' feedback on…
Descriptors: Advanced Placement Programs, Grading, Interrater Reliability, Evaluative Thinking
Vanbelle, Sophie; Albert, Adelin – Psychometrika, 2009
We propose a coefficient of agreement to assess the degree of concordance between two independent groups of raters classifying items on a nominal scale. This coefficient, defined on a population-based model, extends the classical Cohen's kappa coefficient for quantifying agreement between two raters. Weighted and intraclass versions of the…
Descriptors: Interrater Reliability, Weighted Scores, Congruence (Psychology), Rating Scales
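The Vanbelle and Albert abstract presents its coefficient as an extension of the classical two-rater Cohen's kappa, with weighted versions. As background only, the sketch below computes the classical two-rater weighted kappa for ordered categories using disagreement weights, either linear |i - j| or quadratic (i - j)^2. It is not the authors' two-group coefficient, and the function name and example tables are hypothetical.

```python
def weighted_kappa(table, scheme="linear"):
    """Two-rater weighted kappa for a k x k table of ordered categories.

    kappa_w = 1 - sum(w_ij * o_ij) / sum(w_ij * e_ij), where o_ij are observed
    cell proportions, e_ij = row_i * col_j are proportions expected from the
    marginals, and w_ij are disagreement weights (0 on the diagonal).
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row = [sum(table[i]) / n for i in range(k)]                       # rater A marginals
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]  # rater B marginals

    if scheme == "linear":
        w = [[abs(i - j) for j in range(k)] for i in range(k)]
    elif scheme == "quadratic":
        w = [[(i - j) ** 2 for j in range(k)] for i in range(k)]
    else:
        raise ValueError("scheme must be 'linear' or 'quadratic'")

    observed = sum(w[i][j] * table[i][j] / n for i in range(k) for j in range(k))
    expected = sum(w[i][j] * row[i] * col[j] for i in range(k) for j in range(k))
    return 1 - observed / expected


if __name__ == "__main__":
    # Hypothetical 3-category ratings of 30 items by two raters.
    ratings = [[10, 2, 0],
               [1, 8, 1],
               [0, 2, 6]]
    print(weighted_kappa(ratings, "linear"))
    print(weighted_kappa(ratings, "quadratic"))
```

For two categories the linear and quadratic weights both reduce to 0/1 disagreement weights, so the result then coincides with unweighted Cohen's kappa; the weights matter only when near-misses on an ordered scale should count as partial agreement.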
Boller, Kimberly; Kisker, Ellen Eliason – Regional Educational Laboratory, 2014
This guide is designed to help researchers make sure that their research reports include enough information about study measures so that readers can assess the quality of the study's methods and results. The guide also provides examples of write-ups about measures and suggests resources for learning more about these topics. The guide assumes…
Descriptors: Research Reports, Research Methodology, Educational Research, Check Lists
Bavier, Richard – Journal of Policy Analysis and Management, 2008
A recent series of papers has renewed interest in the question of whether consumption data are superior to income data for poverty measurement. Although the Census Bureau has provided researchers with an experimental series of variables that can produce a comprehensive income measure, this resource has not been fully exploited in previous…
Descriptors: Poverty, Income, Money Management, Consumer Economics

Marsh, Herbert W.; Ball, Samuel – Journal of Experimental Education, 1989
Agreement between two independent reviews of each of 278 manuscripts was compared on an overall recommendation and on specific rating items. Agreement between reviewers on separate dimensions, the unweighted sum of the dimensions, and various weighted sums was no better than that for the overall recommendation itself. (SLD)
Descriptors: Evaluation Methods, Factor Analysis, Interrater Reliability, Manuscripts

Ke, Chuanren – Modern Language Journal, 1996
Investigated the relationship between Chinese character recognition and production by second-language learners. Subjects were 47 first-year Chinese language students in the United States. (15 references) (Author/CK)
Descriptors: Chinese, College Students, Data Collection, Ideography
Dixon, Marlene A.; Cunningham, George B. – Measurement in Physical Education and Exercise Science, 2006
Understanding that the behavior of people takes place within a context, over the past 20 years research in education and the sport sciences has witnessed an increasing development of multilevel frameworks that are both conceptually and methodologically sound. Despite these advances, the use of multilevel models and research designs in education…
Descriptors: Physical Activities, Statistical Data, Statistical Studies, Statistical Analysis