ERIC - Search Results

Showing all 11 results

Visualizing Agreement: Bland-Altman Plots as a Supplement to Inter-Rater Reliability Indices

Peer reviewed

Brogan L. Barr; Virginia V. W. McIntosh; Eileen F. Britt; Jennifer Jordan; Janet D. Carter – Measurement: Interdisciplinary Research and Perspectives, 2024

Even when raters demonstrate agreement in the use of a measure, limited score variability or violation of often-ignored statistical assumptions can result in lower reliability estimates than intuitively expected. This article uses data drawn from two randomized controlled trials of schema therapy and cognitive behavioral therapy for the treatment…

Descriptors: Evaluators, Interrater Reliability, Reliability, Measurement Techniques

Investigating Human Essay Rating Quality in a Large-Scale Assessment Using Many-Facet Rasch Measurement

Peer reviewed

Zhang, Xiuyuan – AERA Online Paper Repository, 2019

The main purpose of the study is to evaluate the qualities of human essay ratings for a large-scale assessment using Rasch measurement theory. Specifically, Many-Facet Rasch Measurement (MFRM) was utilized to examine the rating scale category structure and provide important information about interpretations of ratings in the large-scale…

Descriptors: Essays, Evaluators, Writing Evaluation, Reliability

Reliability and Construct Validity of the TBI-QOL Communication Short Form as a Parent-Proxy Report Instrument for Children with Traumatic Brain Injury

Peer reviewed

Cohen, Matthew L.; Tulsky, David S.; Boulton, Aaron J.; Kisala, Pamela A.; Bertisch, Hilary; Yeates, Keith Owen; Zonfrillo, Mark R.; Durbin, Dennis R.; Jaffe, Kenneth M.; Temkin, Nancy; Wang, Jin; Rivara, Frederick P. – Journal of Speech, Language, and Hearing Research, 2019

Purpose: The purpose of this study was to evaluate the internal consistency and construct validity of the Traumatic Brain Injury Quality of Life Communication Item Bank (TBI-QOL COM) short form as a parent-proxy report measure. The TBI-QOL COM is a patient-reported outcome measure of functional communication originally developed as a self-report…

Descriptors: Brain, Head Injuries, Quality of Life, Pediatrics

Addressing Educational Reform: Exploring PE Metrics as a System to Measure Student Achievement in Physical Education

Peer reviewed

Hushman, Glenn; Hushman, Carolyn; Carbonneau, Kira – Physical Educator, 2015

The current educational reform movement in the United States is focused on measuring the effectiveness of teachers. One component of teacher effectiveness is student achievement. The effectiveness of using PE Metrics as a measure of student achievement in a physical activity setting with a low socioeconomic, culturally diverse population was…

Descriptors: Educational Change, Physical Education, Teacher Effectiveness, Physical Activities

Redesigning Teacher Evaluations: Lessons from a Pilot Implementation. Stated Briefly. REL 2016-101

Peer reviewed

Riordan, Julie; Shakman, Karen; Chang, Quincy; Lacireno-Paquet, Natalie; Bocala, Candice – Regional Educational Laboratory Northeast & Islands, 2015

This "Stated Briefly" report is a companion piece that summarizes the results of another report of the same name. REL Northeast and Islands, in collaboration with the Northeast Educator Effectiveness Research Alliance and the New Hampshire Department of Education, conducted a study of the implementation of new teacher evaluation systems…

Descriptors: Teacher Evaluation, Evaluation Methods, Standards, School Districts

Redesigning Teacher Evaluation: Lessons from a Pilot Implementation. REL 2015-030

Peer reviewed

Riordan, Julie; Lacireno-Paquet, Natalie; Shakman, Karen; Bocala, Candice; Chang, Quincy – Regional Educational Laboratory Northeast & Islands, 2015

REL Northeast and Islands, in collaboration with the Northeast Educator Effectiveness Research Alliance and the New Hampshire Department of Education, conducted a study of the implementation of new teacher evaluation systems in New Hampshire's School Improvement Grant (SIG) schools. While the basic system features are similar across district…

Descriptors: Teacher Evaluation, Evaluation Methods, Standards, School Districts

Generalizability Theory and Many-Facet Rasch Measurement.

Linacre, John M. – 1993

Generalizability theory (G-theory) and many-facet Rasch measurement (Rasch) manage the variability inherent when raters rate examinees on test items. The purpose of G-theory is to estimate test reliability in a raw score metric. Unadjusted examinee raw scores are reported as measures. A variance component is estimated for the examinee…

Descriptors: Comparative Analysis, Equations (Mathematics), Estimation (Mathematics), Evaluators

Reliability of Professionally Scored Data: NAEP-Related Issues.

Kaplan, Bruce A.; Johnson, Eugene G. – 1992

Across the field of educational assessment the case has been made for alternatives to the multiple-choice item type. Most of the alternative types of items require a subjective evaluation by a rater. The reliability of this subjective rating is a key component of these types of alternative items. In this paper, measures of reliability are…

Descriptors: Educational Assessment, Elementary Secondary Education, Estimation (Mathematics), Evaluators

Educational Achievement Standards: NAGB's Approach Yields Misleading Interpretations. Report to Congressional Requesters.

General Accounting Office, Washington, DC. Program Evaluation and Methodology Div. – 1993

In September 1991, the National Assessment Governing Board (NAGB) announced standards for basic, proficient, and advanced achievement in mathematics and reported that few American students had reached these standards. Expert reviewers noted technical problems with the NAGB approach and questioned its results. In this report, the NAGB…

Descriptors: Academic Achievement, Academic Standards, Educational Policy, Elementary Secondary Education

Teacher Evaluation and Assessment Center. Report for 1985-1986.

University of South Florida, Tampa. – 1986

The Teacher Evaluation and Assessment Center (TEAC) was established by the Department of Education at the University of South Florida in 1984 to serve the state in the certification of trainers and observers of the Florida Performance Measurement System (FPMS) and to score and report performance evaluations for special programs. This report…

Descriptors: Beginning Teachers, Certification, Classroom Observation Techniques, Elementary Secondary Education

Sampling Variability of Performance Assessments. Report on the Status of Generalizability Performance: Generalizability and Transfer of Performance Assessments. Project 2.4: Design Theory and Psychometrics for Complex Performance Assessment in Science.

Shavelson, Richard J.; And Others – 1993

In this paper, performance assessments are cast within a sampling framework. A performance assessment score is viewed as a sample of student performance drawn from a complex universe defined by a combination of all possible tasks, occasions, raters, and measurement methods. Using generalizability theory, the authors present evidence bearing on the…

Descriptors: Academic Achievement, Educational Assessment, Error of Measurement, Evaluators
