Gwet, Kilem L. – Educational and Psychological Measurement, 2021
Cohen's kappa coefficient was originally proposed for two raters only, and it was later extended to an arbitrarily large number of raters to become what is known as Fleiss' generalized kappa. Fleiss' generalized kappa and its large-sample variance are still widely used by researchers and were implemented in several software packages, including, among…
Descriptors: Sample Size, Statistical Analysis, Interrater Reliability, Computation
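For readers unfamiliar with the statistic named in this abstract, the following is a minimal sketch of the textbook point estimate of Fleiss' kappa. It does not reproduce the variance results the article discusses. The function name `fleiss_kappa` and the input layout (one row per subject, one column per category, each cell holding the number of raters who chose that category) are assumptions made for illustration.

```python
def fleiss_kappa(counts):
    """Textbook Fleiss' kappa point estimate (illustrative sketch).

    counts[i][j] is the number of raters who assigned subject i to
    category j. Every row must sum to the same number of raters n >= 2.
    The result is undefined (division by zero) when all ratings fall
    into a single category.
    """
    num_subjects = len(counts)
    n = sum(counts[0])  # raters per subject
    num_categories = len(counts[0])
    if any(sum(row) != n for row in counts):
        raise ValueError("every subject needs the same number of ratings")

    # Share of all ratings that went to each category.
    p_cat = [
        sum(row[j] for row in counts) / (num_subjects * n)
        for j in range(num_categories)
    ]
    # Observed agreement per subject: fraction of rater pairs that agree.
    p_subj = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]

    p_observed = sum(p_subj) / num_subjects
    p_expected = sum(p * p for p in p_cat)
    return (p_observed - p_expected) / (1 - p_expected)


# Three raters, two subjects, two categories, full agreement on each subject.
print(fleiss_kappa([[3, 0], [0, 3]]))
```

Established packages already provide this estimate; the sketch only shows what the quantity measures.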
Mustapha, Aida; Samsudin, Noor Azah; Arbaiy, Nurieze; Mohammed, Rozlini; Hamid, Isredza Rahmi – Turkish Online Journal of Educational Technology - TOJET, 2016
In programming, one problem can usually be solved using different logics and constructs while still producing the same output. Sometimes students get marked down inappropriately if their solutions do not follow the answer scheme. In addition, lab exercises and programming assignments are not necessarily graded by the instructors but most of the time…
Descriptors: Programming, Computer Science Education, Scoring Rubrics, Grading
Seltzer-Kelly, Deborah; Westwood, Sean J.; Pena-Guzman, David M. – Journal of Mixed Methods Research, 2012
This inquiry developed during the process of "quantitizing" qualitative data the authors had gathered for a mixed methods curriculum efficacy study. Rather than providing the intended rigor to their data coding process, their use of an intercoder reliability metric prompted their investigation of the multiplicity and messiness that, as they…
Descriptors: Mixed Methods Research, Curriculum Research, Interrater Reliability, Research Methodology
Boller, Kimberly; Kisker, Ellen Eliason – Regional Educational Laboratory, 2014
This guide is designed to help researchers make sure that their research reports include enough information about study measures so that readers can assess the quality of the study's methods and results. The guide also provides examples of write-ups about measures and suggests resources for learning more about these topics. The guide assumes…
Descriptors: Research Reports, Research Methodology, Educational Research, Check Lists
Nieminen, Timo A.; Choi, Serene Hyun-Jin – International Journal of Research & Method in Education, 2008
Quantitative behaviour analysis requires the classification of behaviour to produce the basic data. This can be challenging when the theoretical taxonomy does not match observational limitations, or if a theoretical taxonomy is unavailable. Binary keys allow qualitative observation to be used to modify a theoretical taxonomy to produce a practical…
Descriptors: Developmental Disabilities, Behavioral Science Research, Classification, Identification
Schuster, Christof – Educational and Psychological Measurement, 2004
This article presents a formula for weighted kappa in terms of rater means, rater variances, and the rater covariance that is particularly helpful in emphasizing that weighted kappa is an absolute agreement measure in the sense that it is sensitive to differences in raters' marginal distributions. Specifically, rater mean differences will decrease…
Descriptors: Computation, Rating Scales, Interrater Reliability, Statistical Analysis
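The kind of moment-based expression this abstract describes can be sketched as follows, assuming quadratic disagreement weights (the truncated abstract does not state the weighting scheme). Under that assumption, weighted kappa equals 2·cov / (var₁ + var₂ + (mean₁ − mean₂)²), with variances and covariance using divisor n. The function names below are hypothetical, and the code is an illustration rather than the article's own derivation.

```python
def weighted_kappa_quadratic(x, y):
    """Quadratic-weighted kappa from its definition (illustrative sketch).

    x[i] and y[i] are the numeric ratings two raters gave to subject i.
    """
    n = len(x)
    # Mean squared disagreement actually observed on the same subjects.
    observed = sum((a - b) ** 2 for a, b in zip(x, y)) / n
    # Mean squared disagreement expected if the two raters' marginal
    # distributions were paired independently.
    expected = sum((a - b) ** 2 for a in x for b in y) / (n * n)
    return 1 - observed / expected


def weighted_kappa_moments(x, y):
    """Same quantity written with means, variances, and covariance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return 2 * cov / (var_x + var_y + (mean_x - mean_y) ** 2)


rater_1 = [1, 2, 3, 4, 5]
rater_2 = [2, 3, 4, 5, 6]  # same ordering, shifted up by one point
print(weighted_kappa_quadratic(rater_1, rater_2))
print(weighted_kappa_moments(rater_1, rater_2))
```

In this toy data the two raters are perfectly correlated, yet the one-point shift between them gives a value of about 0.8 rather than 1, because the squared mean difference appears in the denominator.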
Strong, Gregory – Thought Currents in English Literature, 1995
This paper traces developments in educational psychology and measurement that led to the Test of English as a Foreign Language (TOEFL) and the Test of English for International Communication (TOEIC) and the application of educational measurement terms such as validity and reliability to testing. Use of a table of specifications for planning…
Descriptors: Cloze Procedure, Difficulty Level, English (Second Language), Foreign Countries