Showing 1 to 15 of 31 results
Peer reviewed
Direct link
Wind, Stefanie A. – Measurement: Interdisciplinary Research and Perspectives, 2022
In many performance assessments, one or two raters from the complete rater pool score each performance, resulting in a sparse rating design in which there are few observations of each rater relative to the complete sample of students. Although sparse rating designs can be constructed to facilitate estimation of student achievement, the…
Descriptors: Evaluators, Bias, Identification, Performance Based Assessment
Peer reviewed
Direct link
Tanaka, Mitsuko; Ross, Steven J. – Assessment in Education: Principles, Policy & Practice, 2023
Raters differ from one another in the severity and leniency with which they rate performances. This study examined the factors affecting rater severity in peer assessments of oral presentations in English as a Foreign Language (EFL), focusing on peer raters' self-construal and presentation abilities. Japanese university students enrolled in EFL classes…
Descriptors: Evaluators, Interrater Reliability, Item Response Theory, Peer Evaluation
Peer reviewed
Direct link
Wind, Stefanie A.; Walker, A. Adrienne – Educational Measurement: Issues and Practice, 2021
Many large-scale performance assessments include score resolution procedures for resolving discrepancies in rater judgments. The goal of score resolution is conceptually similar to that of person fit analysis: to identify students for whom observed scores may not accurately reflect their achievement. Previously, researchers have observed that…
Descriptors: Goodness of Fit, Performance Based Assessment, Evaluators, Decision Making
Peer reviewed
PDF on ERIC: Download full text
Lyness, Scott A.; Peterson, Kent; Yates, Kenneth – Education Sciences, 2021
The Performance Assessment for California Teachers (PACT) is a high stakes summative assessment that was designed to measure pre-service teacher readiness. We examined the inter-rater reliability (IRR) of trained PACT evaluators who rated 19 candidates. As measured by Cohen's weighted kappa, the overall IRR estimate was 0.17 (poor strength of…
Descriptors: High Stakes Tests, Performance Based Assessment, Teacher Effectiveness, Academic Language
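For context, a minimal sketch of computing a weighted kappa of the kind reported above; this is not the authors' code, the rating vectors are hypothetical, and quadratic weights are just one common choice (the abstract does not state the study's weighting scheme). It assumes scikit-learn is available.

```python
# Hedged sketch: inter-rater reliability via Cohen's weighted kappa.
# Rating vectors are hypothetical, not data from the PACT study.
from sklearn.metrics import cohen_kappa_score

rater_a = [2, 3, 1, 4, 2, 3, 3, 1]  # rubric scores from rater A
rater_b = [2, 2, 1, 3, 3, 3, 4, 2]  # rubric scores from rater B

# Quadratic weights penalize distant disagreements more than adjacent ones.
kappa = cohen_kappa_score(rater_a, rater_b, weights="quadratic")
print(f"weighted kappa = {kappa:.2f}")  # values near 0 indicate poor agreement
```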
Peer reviewed
Direct link
Lin, Chih-Kai – Language Testing, 2017
Sparse-rated data are common in operational performance-based language tests, as an inevitable result of assigning examinee responses to a fraction of available raters. The current study investigates the precision of two generalizability-theory methods (i.e., the rating method and the subdividing method) specifically designed to accommodate the…
Descriptors: Data Analysis, Language Tests, Generalizability Theory, Accuracy
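As background only (the study's rating and subdividing methods handle sparse designs; this sketch shows the fully crossed persons-by-raters baseline they extend), a minimal generalizability-theory computation on a hypothetical score matrix:

```python
# Hedged sketch: one-facet crossed p x r G-study on hypothetical data.
import numpy as np

X = np.array([[3, 4, 3],
              [2, 2, 3],
              [5, 4, 4],
              [1, 2, 1]], dtype=float)  # rows = persons, cols = raters
n_p, n_r = X.shape
grand = X.mean()
p_means, r_means = X.mean(axis=1), X.mean(axis=0)

ms_p = n_r * ((p_means - grand) ** 2).sum() / (n_p - 1)
ms_r = n_p * ((r_means - grand) ** 2).sum() / (n_r - 1)
resid = X - p_means[:, None] - r_means[None, :] + grand
ms_pr = (resid ** 2).sum() / ((n_p - 1) * (n_r - 1))

# Variance components from expected mean squares (negatives truncated to 0).
var_pr = ms_pr
var_p = max((ms_p - ms_pr) / n_r, 0.0)
var_r = max((ms_r - ms_pr) / n_p, 0.0)

g = var_p / (var_p + var_pr / n_r)              # relative (norm-referenced)
phi = var_p / (var_p + (var_r + var_pr) / n_r)  # absolute (criterion-referenced)
print(f"G = {g:.2f}, Phi = {phi:.2f}")
```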
Peer reviewed
PDF on ERIC: Download full text
Joe, Jilliam; Kitchen, Christopher; Chen, Lei; Feng, Gary – ETS Research Report Series, 2015
The purpose of this paper is to summarize the evaluation of human-scoring quality for an assessment of public speaking skills. Videotaped performances given by 17 speakers on 4 tasks were scored by expert and nonexpert raters who had extensive experience scoring performance-based and constructed-response assessments. The Public Speaking Competence…
Descriptors: Public Speaking, Communication Skills, Scoring, Scoring Rubrics
Peer reviewed
Direct link
Pufpaff, Lisa A.; Clarke, Laura; Jones, Ruth E. – Mid-Western Educational Researcher, 2015
This paper addresses the effects of rater training on the rubric-based scoring of three preservice teacher candidate performance assessments. This project sought to evaluate the consistency of ratings assigned to student learning outcome measures being used for program accreditation and to explore the need for rater training in order to increase…
Descriptors: Evaluators, Interrater Reliability, Preservice Teachers, Scoring Rubrics
Peer reviewed
Direct link
Raymond, Mark R.; Harik, Polina; Clauser, Brian E. – Applied Psychological Measurement, 2011
Prior research indicates that the overall reliability of performance ratings can be improved by using ordinary least squares (OLS) regression to adjust for rater effects. The present investigation extends previous work by evaluating the impact of OLS adjustment on standard errors of measurement ("SEM") at specific score levels. In…
Descriptors: Performance Based Assessment, Licensing Examinations (Professions), Least Squares Statistics, Item Response Theory
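The abstract's OLS adjustment models examinee and rater effects jointly; as a hedged illustration of the underlying idea only, the one-way sketch below estimates each rater's severity or leniency as the mean deviation from the grand mean (what OLS with rater indicators alone reduces to) and subtracts it from the observed scores. All data are hypothetical.

```python
# Hedged sketch: removing rater severity/leniency effects from ratings.
import numpy as np

raters = np.array([0, 0, 1, 1, 2, 2])              # rater id for each rating
scores = np.array([3.0, 4.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical observed scores

grand_mean = scores.mean()
# Effect per rater: that rater's mean score minus the grand mean.
# (Rater ids run 0..k-1, so the effects array can be indexed by id directly.)
effects = np.array([scores[raters == r].mean() - grand_mean
                    for r in np.unique(raters)])
adjusted = scores - effects[raters]  # severity-adjusted scores
print(adjusted)
```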
Peer reviewed
Direct link
Feldman, Moshe; Lazzara, Elizabeth H.; Vanderbilt, Allison A.; DiazGranados, Deborah – Journal of Continuing Education in the Health Professions, 2012
Competency-based assessment and an emphasis on obtaining higher-level outcomes that reflect physicians' ability to demonstrate their skills have created a need for more advanced assessment practices. Simulation-based assessments provide medical education planners with tools to better evaluate the 6 Accreditation Council for Graduate Medical…
Descriptors: Performance Based Assessment, Physicians, Accuracy, High Stakes Tests
Peer reviewed
Direct link
Nicastro, Gerilee; Moreton, Kyle M. – Assessment Update, 2008
Western Governors University (WGU) is an online competency-based university in which students demonstrate content competence through a series of assessments. Assessments most often are performance-based or objective assessments that are developed in accordance with specific content objectives. Objective assessments generally assess lower-level…
Descriptors: Evaluators, Performance Based Assessment, Interrater Reliability, Educational Objectives
Miller-Whitehead, Marie – 2001
A hypothetical case study provides examples of the inter-rater reliability issues involved in complex performance assessment, focusing on the Baldrige model. A hypothetical team of five evaluators was asked to rate a Baldrige model performance assessment along the seven defined criteria or performance dimensions that comprise the Baldrige model…
Descriptors: Case Studies, Criteria, Evaluators, Interrater Reliability
Peer reviewed
Lunz, Mary E.; And Others – Educational and Psychological Measurement, 1994
In a study involving eight judges, analysis with the FACETS model provides evidence that judges grade differently, whether or not their scores correlate well. This outcome suggests that adjustments for differences among judges should be made before student measures are estimated, so that decisions are reproducible. (SLD)
Descriptors: Correlation, Decision Making, Evaluation Methods, Evaluators
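For context, FACETS implements the many-facet Rasch model; a common adjacent-categories form (notation assumed here, not taken from the abstract) is

$$\ln\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = \theta_n - \delta_i - \lambda_j - \tau_k$$

where $\theta_n$ is examinee ability, $\delta_i$ task difficulty, $\lambda_j$ judge severity, and $\tau_k$ the difficulty of the step from category $k-1$ to $k$. The judge-severity term $\lambda_j$ is what licenses the adjustments the abstract recommends.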
Peer reviewed
Hambleton, Ronald K.; Plake, Barbara S. – Applied Measurement in Education, 1995
Several extensions to the Angoff method of standard setting are described that can accommodate characteristics of performance-based assessment. A study involving 12 panelists supported the effectiveness of the new approach but suggested that panelists preferred an approach that was at least partially conjunctive. (SLD)
Descriptors: Educational Assessment, Evaluation Methods, Evaluators, Interrater Reliability
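As background on the base procedure these extensions modify (a hedged sketch with hypothetical ratings, not data from the study): in the classic Angoff method each panelist estimates, for every item, the probability that a minimally competent examinee answers it correctly; panelist sums are then averaged to give the recommended cut score.

```python
# Hedged sketch: classic Angoff cut-score computation on hypothetical data.
import numpy as np

ratings = np.array([
    [0.6, 0.7, 0.5, 0.8],  # panelist 1's probability estimates per item
    [0.5, 0.8, 0.4, 0.9],  # panelist 2
    [0.7, 0.6, 0.5, 0.7],  # panelist 3
])
# Sum within each panelist, then average across panelists.
cut = ratings.sum(axis=1).mean()
print(f"recommended cut score = {cut:.2f} of {ratings.shape[1]} points")
```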
Ridge, Kirk – 2001
This study investigated whether raters in two different training groups would demonstrate halo error when each rater scored all five responses to five different mathematics performance-based items from each student. One group of 20 raters was trained by an experienced scoring director with item-specific scoring rubrics and the opportunity to…
Descriptors: Evaluators, Feedback, Interrater Reliability, Junior High School Students
Peer reviewed
Jaeger, Richard M. – Applied Measurement in Education, 1995
A performance-standard setting procedure termed judgmental policy capturing (JPC) and its application are described. A study involving 12 panelists demonstrated the feasibility of the JPC method for setting performance standards for classroom teachers seeking certification from the National Board for Professional Teaching Standards. (SLD)
Descriptors: Decision Making, Educational Assessment, Evaluation Methods, Evaluators
Previous Page | Next Page »
Pages: 1  |  2  |  3