Showing 1 to 15 of 268 results
Peer reviewed
John R. Donoghue; Carol Eckerly – Applied Measurement in Education, 2024
Trend scoring constructed response items (i.e., rescoring Time A responses at Time B) gives rise to two-way data that follow a product multinomial distribution rather than the multinomial distribution that is usually assumed. Recent work has shown that the difference in sampling model can have profound negative effects on statistics usually used to…
Descriptors: Scoring, Error of Measurement, Reliability, Scoring Rubrics
Peer reviewed
Alexandra M. Pierce; Lisa M. H. Sanetti; Melissa A. Collier-Meek; Austin H. Johnson – Grantee Submission, 2024
Visual analysis is the primary methodology used to determine treatment effects from graphed single-case design data. Previous studies have demonstrated mixed findings related to interrater agreement between both expert and novice visual analysts, which represents a critical limitation of visual analysis and supports calls for also presenting…
Descriptors: Graphs, Interrater Reliability, Statistical Analysis, Expertise
Peer reviewed
Lazenby, Katherine; Tenney, Kristin; Marcroft, Tina A.; Komperda, Regis – Chemistry Education Research and Practice, 2023
Assessment instruments that generate quantitative data on attributes (cognitive, affective, behavioral, etc.) of participants are commonly used in the chemistry education community to draw conclusions in research studies or inform practice. Recently, articles and editorials have stressed the importance of providing evidence for the…
Descriptors: Chemistry, Periodicals, Journal Articles, Science Education
Peer reviewed
Elayne P. Colón; Lori M. Dassa; Thomas M. Dana; Nathan P. Hanson – Action in Teacher Education, 2024
To meet accreditation expectations, teacher preparation programs must demonstrate their candidates are evaluated using summative assessment tools that yield sound, reliable, and valid data. These tools are primarily used by the clinical experience team -- university supervisors and mentor teachers. Institutional beliefs regarding best practices…
Descriptors: Student Teachers, Teacher Interns, Evaluation Methods, Interrater Reliability
Peer reviewed
Walker, Cindy M.; Göçer Sahin, Sakine – Educational and Psychological Measurement, 2020
The purpose of this study was to investigate a new way of evaluating interrater reliability that can allow one to determine if two raters differ with respect to their rating on a polytomous rating scale or constructed response item. Specifically, differential item functioning (DIF) analyses were used to assess interrater reliability and compared…
Descriptors: Test Bias, Interrater Reliability, Responses, Correlation
Peer reviewed
Holcomb, T. Scott; Lambert, Richard; Bottoms, Bryndle L. – Journal of Educational Supervision, 2022
In this study, various statistical indexes of agreement were calculated using empirical data from a group of evaluators (n = 45) of early childhood teachers. The group of evaluators rated ten fictitious teacher profiles using the North Carolina Teacher Evaluation Process (NCTEP) rubric. The exact and adjacent agreement percentages were calculated…
Descriptors: Interrater Reliability, Teacher Evaluation, Statistical Analysis, Early Childhood Teachers
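The exact and adjacent agreement percentages named above are straightforward to compute. Below is a minimal sketch for two raters scoring on an ordinal rubric; the ratings are hypothetical, not NCTEP data.

    # Exact and adjacent agreement between two raters on an ordinal
    # rubric (e.g., 1-5). Ratings below are hypothetical.
    ratings_a = [3, 4, 2, 5, 3, 1, 4]
    ratings_b = [3, 5, 2, 4, 3, 2, 1]

    n = len(ratings_a)
    exact = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    adjacent = sum(abs(a - b) <= 1 for a, b in zip(ratings_a, ratings_b)) / n

    print(f"Exact agreement:    {exact:.1%}")     # identical scores
    print(f"Adjacent agreement: {adjacent:.1%}")  # within one scale point

Since adjacent agreement counts any scores within one scale point as agreeing, it is always at least as high as exact agreement, which is why the two are typically reported together.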
Peer reviewed
Gwet, Kilem L. – Educational and Psychological Measurement, 2021
Cohen's kappa coefficient was originally proposed for two raters only, and it was later extended to an arbitrarily large number of raters to become what is known as Fleiss' generalized kappa. Fleiss' generalized kappa and its large-sample variance are still widely used by researchers and were implemented in several software packages, including, among…
Descriptors: Sample Size, Statistical Analysis, Interrater Reliability, Computation
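For reference, the point estimate of Fleiss' generalized kappa mentioned above can be computed from an N-subjects-by-k-categories count matrix; the sketch below shows only the estimate, not the large-sample variance the paper examines. The counts are hypothetical.

    import numpy as np

    # Fleiss' generalized kappa from an N x k matrix of counts, where
    # counts[i, j] = number of raters assigning subject i to category j.
    # Each subject is rated by the same number of raters n. Hypothetical data.
    counts = np.array([
        [0, 0, 3],
        [1, 2, 0],
        [0, 3, 0],
        [2, 1, 0],
    ])
    N, k = counts.shape
    n = counts.sum(axis=1)[0]  # raters per subject (assumed constant)

    P_i = (np.sum(counts**2, axis=1) - n) / (n * (n - 1))  # per-subject agreement
    P_bar = P_i.mean()                                     # mean observed agreement
    p_j = counts.sum(axis=0) / (N * n)                     # category proportions
    P_e = np.sum(p_j**2)                                   # chance agreement

    kappa = (P_bar - P_e) / (1 - P_e)
    print(f"Fleiss' kappa = {kappa:.3f}")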
Peer reviewed
Fromm, Davida; Katta, Saketh; Paccione, Mason; Hecht, Sophia; Greenhouse, Joel; MacWhinney, Brian; Schnur, Tatiana T. – Journal of Speech, Language, and Hearing Research, 2021
Purpose: Analysis of connected speech in the field of adult neurogenic communication disorders is essential for research and clinical purposes, yet time and expertise are often cited as limiting factors. The purpose of this project was to create and evaluate an automated program to score and compute the measures from the Quantitative Production…
Descriptors: Speech, Automation, Statistical Analysis, Adults
Peer reviewed
Mantzicopoulos, Panayota; French, Brian F.; Patrick, Helen – Early Education and Development, 2018
Research Findings: We evaluated the score stability of the Mathematical Quality of Instruction (MQI), an observational measure of mathematics instruction. Three raters each scored, independently, 100 video-recorded lessons taught by 20 kindergarten teachers in the spring. Using generalizability theory analyses, we decomposed the MQI's score…
Descriptors: Kindergarten, Mathematics Instruction, Educational Quality, Classroom Observation Techniques
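A generalizability-theory decomposition like the one described can be sketched for the simplest case, a one-facet persons-by-raters design with one score per cell; the MQI study's actual design is richer than this. Scores below are hypothetical.

    import numpy as np

    # One-facet G-study sketch: persons (lessons) fully crossed with raters,
    # one score per cell. Hypothetical scores, not MQI data.
    X = np.array([
        [4.0, 3.5, 3.0],
        [2.0, 2.5, 1.0],
        [3.0, 3.5, 2.0],
        [5.0, 4.0, 3.5],
        [1.5, 2.0, 0.5],
    ])
    n_p, n_r = X.shape
    grand = X.mean()
    pm = X.mean(axis=1)  # person means
    rm = X.mean(axis=0)  # rater means

    ms_p = n_r * np.sum((pm - grand)**2) / (n_p - 1)
    ms_r = n_p * np.sum((rm - grand)**2) / (n_r - 1)
    resid = X - pm[:, None] - rm[None, :] + grand
    ms_res = np.sum(resid**2) / ((n_p - 1) * (n_r - 1))

    var_res = ms_res                # residual (p x r interaction + error)
    var_p = (ms_p - ms_res) / n_r   # person (true-score) variance
    var_r = (ms_r - ms_res) / n_p   # rater variance
    # (negative variance estimates are conventionally truncated at zero)

    # Relative G coefficient for a design averaging over n_r raters.
    g_rel = var_p / (var_p + var_res / n_r)
    print(f"var_p={var_p:.3f} var_r={var_r:.3f} var_res={var_res:.3f} G={g_rel:.3f}")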
Peer reviewed
Saluja, Ronak; Cheng, Sierra; delos Santos, Keemo Althea; Chan, Kelvin K. W. – Research Synthesis Methods, 2019
Objective: Various statistical methods have been developed to estimate hazard ratios (HRs) from published Kaplan-Meier (KM) curves for the purpose of performing meta-analyses. The objective of this study was to determine the reliability, accuracy, and precision of four commonly used methods by Guyot, Williamson, Parmar, and Hoyle and Henley…
Descriptors: Meta Analysis, Reliability, Accuracy, Randomized Controlled Trials
Peer reviewed
Huckabee, Maggie-Lee; McIntosh, Theresa; Fuller, Laura; Curry, Morgan; Thomas, Paige; Walshe, Margaret; McCague, Ellen; Battel, Irene; Nogueira, Dalia; Frank, Ulrike; van den Engel-Hoek, Lenie; Sella-Weiss, Oshrat – International Journal of Language & Communication Disorders, 2018
Background: Clinical swallowing assessment is largely limited to qualitative assessment of behavioural observations. There are limited quantitative data that can be compared with a healthy population for identification of impairment. The Test of Masticating and Swallowing Solids (TOMASS) was developed as a quantitative assessment of solid bolus…
Descriptors: Medical Evaluation, Clinical Diagnosis, Motor Reactions, Reliability
Peer reviewed
Clayson, Dennis E. – Assessment & Evaluation in Higher Education, 2018
The student evaluation of teaching process is generally thought to produce reliable results. Consistency is found in class and instructor averages, while considerable inconsistency exists in individual student responses. This paper reviews these issues along with a detailed examination of common measures of reliability that…
Descriptors: Student Evaluation of Teacher Performance, Reliability, Validity, Evaluation Criteria
Peer reviewed
Wind, Stefanie A. – Language Testing, 2019
Differences in rater judgments that are systematically related to construct-irrelevant characteristics threaten the fairness of rater-mediated writing assessments. Accordingly, it is essential that researchers and practitioners examine the degree to which the psychometric quality of rater judgments is comparable across test-taker subgroups.…
Descriptors: Nonparametric Statistics, Interrater Reliability, Differences, Writing Tests
Peer reviewed
De Raadt, Alexandra; Warrens, Matthijs J.; Bosker, Roel J.; Kiers, Henk A. L. – Educational and Psychological Measurement, 2019
Cohen's kappa coefficient is commonly used for assessing agreement between classifications of two raters on a nominal scale. Three variants of Cohen's kappa that can handle missing data are presented. Data are considered missing if one or both ratings of a unit are missing. We study how well the variants estimate the kappa value for complete data…
Descriptors: Interrater Reliability, Data, Statistical Analysis, Statistical Bias
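The abstract above does not spell out the three variants. As a baseline for comparison, the sketch below computes Cohen's kappa after listwise deletion, i.e., on complete pairs only, with None marking a missing rating; the paper's variants differ in how they use the incomplete units. The data are hypothetical.

    from collections import Counter

    # Cohen's kappa on complete pairs only (listwise deletion): units with a
    # missing rating (None) from either rater are dropped. Hypothetical data.
    rater1 = ["a", "b", "a", None, "c", "b", "a"]
    rater2 = ["a", "b", "c", "b",  "c", None, "a"]

    pairs = [(x, y) for x, y in zip(rater1, rater2)
             if x is not None and y is not None]
    n = len(pairs)

    p_o = sum(x == y for x, y in pairs) / n      # observed agreement
    f1 = Counter(x for x, _ in pairs)            # marginal counts, rater 1
    f2 = Counter(y for _, y in pairs)            # marginal counts, rater 2
    p_e = sum(f1[c] * f2[c] for c in f1) / n**2  # chance agreement

    kappa = (p_o - p_e) / (1 - p_e)
    print(f"Cohen's kappa (complete pairs) = {kappa:.3f}")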
Peer reviewed
Lambie, Glenn W.; Mullen, Patrick R.; Swank, Jacqueline M.; Blount, Ashley – Measurement and Evaluation in Counseling and Development, 2018
Supervisors evaluated counselors-in-training at multiple points during their practicum experience using the Counseling Competencies Scale (CCS; N = 1,070). The CCS evaluations were randomly split to conduct exploratory factor analysis and confirmatory factor analysis, resulting in a 2-factor model (61.5% of the variance explained).
Descriptors: Counselor Training, Counseling, Measures (Individuals), Competence