Publication Date
  In 2025: 0
  Since 2024: 3
  Since 2021 (last 5 years): 7
  Since 2016 (last 10 years): 105
  Since 2006 (last 20 years): 229
Descriptor
  Interrater Reliability: 268
  Statistical Analysis: 268
  Foreign Countries: 74
  Correlation: 69
  Comparative Analysis: 53
  Evaluation Methods: 38
  Measures (Individuals): 37
  Scores: 37
  Reliability: 34
  Second Language Learning: 34
  Teaching Methods: 33
Audience
  Researchers: 5
  Practitioners: 3
  Administrators: 1
  Teachers: 1
Location
  Netherlands: 10
  Iran: 6
  Turkey: 6
  China: 5
  Japan: 5
  Canada: 4
  Germany: 4
  Taiwan: 4
  United Kingdom: 4
  Cyprus: 3
  New Zealand: 3
Laws, Policies, & Programs
  Individuals with Disabilities…: 1
What Works Clearinghouse Rating
  Does not meet standards: 1
Alexandra M. Pierce; Lisa M. H. Sanetti; Melissa A. Collier-Meek; Austin H. Johnson – Grantee Submission, 2024
Visual analysis is the primary methodology used to determine treatment effects from graphed single-case design data. Previous studies have demonstrated mixed findings related to interrater agreement among both expert and novice visual analysts, which represents a critical limitation of visual analysis and supports calls for also presenting…
Descriptors: Graphs, Interrater Reliability, Statistical Analysis, Expertise
Elayne P. Colón; Lori M. Dassa; Thomas M. Dana; Nathan P. Hanson – Action in Teacher Education, 2024
To meet accreditation expectations, teacher preparation programs must demonstrate that their candidates are evaluated using summative assessment tools that yield sound, reliable, and valid data. These tools are primarily used by the clinical experience team: university supervisors and mentor teachers. Institutional beliefs regarding best practices…
Descriptors: Student Teachers, Teacher Interns, Evaluation Methods, Interrater Reliability
Holcomb, T. Scott; Lambert, Richard; Bottoms, Bryndle L. – Journal of Educational Supervision, 2022
In this study, various statistical indexes of agreement were calculated using empirical data from a group of evaluators (n = 45) of early childhood teachers. The group of evaluators rated ten fictitious teacher profiles using the North Carolina Teacher Evaluation Process (NCTEP) rubric. The exact and adjacent agreement percentages were calculated…
Descriptors: Interrater Reliability, Teacher Evaluation, Statistical Analysis, Early Childhood Teachers
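Exact and adjacent agreement are the simplest of the agreement indexes this entry mentions. As a minimal illustration (not drawn from the study's data), the Python sketch below computes both percentages for two hypothetical raters scoring ten profiles on a 1-5 rubric; the function name and the scores are assumptions made for the example.

```python
import numpy as np

def exact_and_adjacent_agreement(ratings_a, ratings_b):
    """Percent exact and adjacent (within one rubric level) agreement
    between two raters' integer rubric scores."""
    a, b = np.asarray(ratings_a), np.asarray(ratings_b)
    diff = np.abs(a - b)
    exact = np.mean(diff == 0) * 100      # identical scores
    adjacent = np.mean(diff <= 1) * 100   # identical or one level apart
    return exact, adjacent

# Hypothetical scores for ten teacher profiles on a 1-5 rubric
rater_1 = [3, 4, 2, 5, 3, 4, 1, 2, 4, 3]
rater_2 = [3, 3, 2, 4, 3, 5, 1, 2, 4, 2]
exact, adjacent = exact_and_adjacent_agreement(rater_1, rater_2)
print(f"Exact agreement: {exact:.0f}%, adjacent agreement: {adjacent:.0f}%")
```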
Gwet, Kilem L. – Educational and Psychological Measurement, 2021
Cohen's kappa coefficient was originally proposed for two raters only, and it was later extended to an arbitrarily large number of raters to become what is known as Fleiss' generalized kappa. Fleiss' generalized kappa and its large-sample variance are still widely used by researchers and have been implemented in several software packages, including, among…
Descriptors: Sample Size, Statistical Analysis, Interrater Reliability, Computation
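For readers unfamiliar with the statistic, the sketch below computes Fleiss' generalized kappa from a subjects-by-categories count table. It gives only the point estimate, not the large-sample variance that this article examines, and the example table is hypothetical.

```python
import numpy as np

def fleiss_kappa(counts):
    """Fleiss' generalized kappa.

    counts: N x k array; counts[i, j] = number of raters assigning
    subject i to category j. Assumes the same number of raters per subject.
    """
    counts = np.asarray(counts, dtype=float)
    n_subjects, _ = counts.shape
    n_raters = counts.sum(axis=1)[0]          # raters per subject
    # Observed agreement per subject, averaged over subjects
    p_i = (np.sum(counts ** 2, axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_bar = p_i.mean()
    # Chance agreement from the marginal category proportions
    p_j = counts.sum(axis=0) / (n_subjects * n_raters)
    p_e = np.sum(p_j ** 2)
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical: 5 subjects each rated by 4 raters into 3 categories
table = [[4, 0, 0],
         [2, 2, 0],
         [0, 3, 1],
         [1, 1, 2],
         [0, 0, 4]]
print(f"Fleiss' kappa = {fleiss_kappa(table):.3f}")
```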
John R. Donoghue; Carol Eckerly – Applied Measurement in Education, 2024
Trend scoring constructed response items (i.e. rescoring Time A responses at Time B) gives rise to two-way data that follow a product multinomial distribution rather than the multinomial distribution that is usually assumed. Recent work has shown that the difference in sampling model can have profound negative effects on statistics usually used to…
Descriptors: Scoring, Error of Measurement, Reliability, Scoring Rubrics
Fromm, Davida; Katta, Saketh; Paccione, Mason; Hecht, Sophia; Greenhouse, Joel; MacWhinney, Brian; Schnur, Tatiana T. – Journal of Speech, Language, and Hearing Research, 2021
Purpose: Analysis of connected speech in the field of adult neurogenic communication disorders is essential for research and clinical purposes, yet time and expertise are often cited as limiting factors. The purpose of this project was to create and evaluate an automated program to score and compute the measures from the Quantitative Production…
Descriptors: Speech, Automation, Statistical Analysis, Adults
Using Differential Item Functioning to Test for Interrater Reliability in Constructed Response Items
Walker, Cindy M.; Göçer Sahin, Sakine – Educational and Psychological Measurement, 2020
The purpose of this study was to investigate a new way of evaluating interrater reliability that can allow one to determine if two raters differ with respect to their rating on a polytomous rating scale or constructed response item. Specifically, differential item functioning (DIF) analyses were used to assess interrater reliability and compared…
Descriptors: Test Bias, Interrater Reliability, Responses, Correlation
Practices in Instrument Use and Development in "Chemistry Education Research and Practice" 2010-2021
Lazenby, Katherine; Tenney, Kristin; Marcroft, Tina A.; Komperda, Regis – Chemistry Education Research and Practice, 2023
Assessment instruments that generate quantitative data on attributes (cognitive, affective, behavioral, etc.) of participants are commonly used in the chemistry education community to draw conclusions in research studies or inform practice. Recently, articles and editorials have stressed the importance of providing evidence for the…
Descriptors: Chemistry, Periodicals, Journal Articles, Science Education
Wind, Stefanie A. – Language Testing, 2019
Differences in rater judgments that are systematically related to construct-irrelevant characteristics threaten the fairness of rater-mediated writing assessments. Accordingly, it is essential that researchers and practitioners examine the degree to which the psychometric quality of rater judgments is comparable across test-taker subgroups.…
Descriptors: Nonparametric Statistics, Interrater Reliability, Differences, Writing Tests
De Raadt, Alexandra; Warrens, Matthijs J.; Bosker, Roel J.; Kiers, Henk A. L. – Educational and Psychological Measurement, 2019
Cohen's kappa coefficient is commonly used for assessing agreement between classifications of two raters on a nominal scale. Three variants of Cohen's kappa that can handle missing data are presented. Data are considered missing if one or both ratings of a unit are missing. We study how well the variants estimate the kappa value for complete data…
Descriptors: Interrater Reliability, Data, Statistical Analysis, Statistical Bias
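For context, the sketch below shows only the baseline approach the variants are compared against: ordinary Cohen's kappa after listwise deletion, i.e., dropping any unit with a missing rating before building the agreement table. It is not an implementation of the authors' missing-data variants; the data and names are hypothetical.

```python
import numpy as np

def cohen_kappa_listwise(r1, r2, categories):
    """Cohen's kappa after listwise deletion: units missing either
    rating (None) are dropped before computing agreement."""
    pairs = [(a, b) for a, b in zip(r1, r2) if a is not None and b is not None]
    n = len(pairs)
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    table = np.zeros((k, k))
    for a, b in pairs:
        table[index[a], index[b]] += 1
    p_o = np.trace(table) / n                                     # observed agreement
    p_e = np.sum(table.sum(axis=1) * table.sum(axis=0)) / n ** 2  # chance agreement
    return (p_o - p_e) / (1 - p_e)

# Hypothetical nominal ratings with missing values (None)
rater_1 = ["A", "B", "A", None, "C", "B", "A", "C"]
rater_2 = ["A", "B", "B", "A",  "C", None, "A", "C"]
print(f"kappa = {cohen_kappa_listwise(rater_1, rater_2, ['A', 'B', 'C']):.3f}")
```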
Flake, Jessica Kay; Petway, Kevin Terrance, II – Educational Measurement: Issues and Practice, 2019
Numerous studies merely note divergence in students' and teachers' ratings of student noncognitive constructs. However, given the increased attention and use of these constructs in educational research and practice, an in-depth study focused on this issue was needed. Using a variety of quantitative methodologies, we thoroughly investigate…
Descriptors: Teachers, Students, Achievement Rating, Interrater Reliability
Benton, Tom; Leech, Tony; Hughes, Sarah – Cambridge Assessment, 2020
In the context of examinations, the phrase "maintaining standards" usually refers to any activity designed to ensure that it is no easier (or harder) to achieve a given grade in one year than in another. Specifically, it tends to mean activities associated with setting examination grade boundaries. Benton et al. (2020) describes a method…
Descriptors: Mathematics Tests, Equated Scores, Comparative Analysis, Difficulty Level
Wind, Stefanie A.; Patil, Yogendra J. – Educational and Psychological Measurement, 2018
Recent research has explored the use of models adapted from Mokken scale analysis as a nonparametric approach to evaluating rating quality in educational performance assessments. A potential limiting factor to the widespread use of these techniques is the requirement for complete data, as practical constraints in operational assessment systems…
Descriptors: Scaling, Data, Interrater Reliability, Writing Tests
Looney, Marilyn A. – Measurement in Physical Education and Exercise Science, 2018
The purpose of this article was two-fold: (1) to provide an overview of the commonly reported and under-reported absolute agreement indices for continuous data in the kinesiology literature; and (2) to present examples of these indices for hypothetical data, along with recommendations for future use. It is recommended that three types of information be…
Descriptors: Interrater Reliability, Evaluation Methods, Kinetics, Indexes
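One widely reported absolute agreement index for continuous data is the intraclass correlation under a two-way model with absolute agreement, ICC(A,1). The sketch below computes it from an ANOVA decomposition of a subjects-by-raters matrix; it is an assumed illustration with made-up timing data, not the specific set of indices this article recommends.

```python
import numpy as np

def icc_a1(scores):
    """ICC(A,1): two-way model, absolute agreement, single rater.
    scores: n-subjects x k-raters matrix of continuous measurements."""
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_total = np.sum((x - grand) ** 2)
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)   # between subjects
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)   # between raters
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

# Hypothetical: 6 participants timed (seconds) by 3 raters
times = [[10.2, 10.5, 10.3],
         [12.1, 12.0, 12.4],
         [ 9.8, 10.0,  9.9],
         [11.5, 11.7, 11.6],
         [10.9, 11.0, 11.2],
         [12.8, 12.6, 12.9]]
print(f"ICC(A,1) = {icc_a1(times):.3f}")
```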
Erdogan, Semra; Orekici Temel, Gülhan; Selvi, Hüseyin; Ersöz Kaya, Irem – Educational Sciences: Theory and Practice, 2017
Taking more than one measurement of the same variable also introduces the possibility of contamination from error sources, both singly and in combination through their interactions. Therefore, even when the internal consistency of scores obtained from measurement tools is examined on its own, it is also necessary to ensure interrater or intra-rater agreement…
Descriptors: Measurement, Interrater Reliability, Repetition, Statistical Analysis