ERIC - Search Results

Publication Date

In 2025	0
Since 2024	0
Since 2021 (last 5 years)	5
Since 2016 (last 10 years)	16
Since 2006 (last 20 years)	16

Descriptor

Evaluators	16
Goodness of Fit	7
Performance Based Assessment	7
Scoring	6
Evaluation Methods	5
Interrater Reliability	5
Item Response Theory	5
Decision Making	4
Simulation	4
Academic Achievement	3
Bias	3
Comparative Analysis	3
Data Collection	3
Models	3
Multiple Choice Tests	3
Psychometrics	3
Student Evaluation	3
Writing Evaluation	3
Accuracy	2
Computation	2
Correlation	2
Design	2
Item Analysis	2
Language Tests	2
Measurement	2
More ▼

Source

Educational Measurement:…	3
Journal of Educational…	3
Educational Assessment	2
Educational and Psychological…	2
Language Testing	2
Measurement:…	2
International Journal of…	1
Language Assessment Quarterly	1

Author

Wind, Stefanie A.	16
Engelhard, George, Jr.	2
Guo, Wenjing	2
Walker, A. Adrienne	2
Foltz, Peter	1
Ge, Yuan	1
Jones, Eli	1
Peabody, Michael R.	1
Peterson, Meghan E.	1
Rosenstein, Mark	1
Schumacker, Randall E.	1
Sebok-Syer, Stefanie S.	1
Wesolowski, Brian	1
Wolfe, Edward W.	1
More ▼

Publication Type

Journal Articles	16
Reports - Research	16
Information Analyses	1

Education Level

High Schools	1
Higher Education	1
Postsecondary Education	1
Secondary Education	1

Audience

Location

Laws, Policies, & Programs

Assessments and Surveys

National Assessment of…

What Works Clearinghouse Rating

Showing 1 to 15 of 16 results Save | Export

Detecting Rater Biases in Sparse Rater-Mediated Assessment Networks

Peer reviewed

Direct link

Wind, Stefanie A.; Ge, Yuan – Educational and Psychological Measurement, 2021

Practical constraints in rater-mediated assessments limit the availability of complete data. Instead, most scoring procedures include one or two ratings for each performance, with overlapping performances across raters or linking sets of multiple-choice items to facilitate model estimation. These incomplete scoring designs present challenges for…

Descriptors: Evaluators, Scoring, Data Collection, Design

Rater Connections and the Detection of Bias in Performance Assessment

Peer reviewed

Direct link

Wind, Stefanie A. – Measurement: Interdisciplinary Research and Perspectives, 2022

In many performance assessments, one or two raters from the complete rater pool scores each performance, resulting in a sparse rating design, where there are limited observations of each rater relative to the complete sample of students. Although sparse rating designs can be constructed to facilitate estimation of student achievement, the…

Descriptors: Evaluators, Bias, Identification, Performance Based Assessment

Exploring the Impact of Rater Effects on Person Fit in Rater-Mediated Assessments

Peer reviewed

Direct link

Wind, Stefanie A. – Educational Measurement: Issues and Practice, 2020

Researchers have documented the impact of rater effects, or raters' tendencies to give different ratings than would be expected given examinee achievement levels, in performance assessments. However, the degree to which rater effects influence person fit, or the reasonableness of test-takers' achievement estimates given their response patterns,…

Descriptors: Performance Based Assessment, Evaluators, Achievement, Influences

Beyond Agreement: Exploring Rater Effects in Large-Scale Mixed Format Assessments

Peer reviewed

Direct link

Wind, Stefanie A.; Guo, Wenjing – Educational Assessment, 2021

Scoring procedures for the constructed-response (CR) items in large-scale mixed-format educational assessments often involve checks for rater agreement or rater reliability. Although these analyses are important, researchers have documented rater effects that persist despite rater training and that are not always detected in rater agreement and…

Descriptors: Scoring, Responses, Test Items, Test Format

Examining Differential Rater Functioning Using a Between-Subgroup Outfit Approach

Peer reviewed

Direct link

Wind, Stefanie A.; Sebok-Syer, Stefanie S. – Journal of Educational Measurement, 2019

When practitioners use modern measurement models to evaluate rating quality, they commonly examine rater fit statistics that summarize how well each rater's ratings fit the expectations of the measurement model. Essentially, this approach involves examining the unexpected ratings that each misfitting rater assigned (i.e., carrying out analyses of…

Descriptors: Measurement, Models, Evaluators, Simulation

Monotonicity as a Nonparametric Approach to Evaluating Rater Fit in Performance Assessments

Peer reviewed

Direct link

Wind, Stefanie A. – Measurement: Interdisciplinary Research and Perspectives, 2020

Rater fit analyses provide insight into the degree to which rater judgments correspond to expected properties, as defined within a measurement framework. Parametric models such as the Rasch model provide a useful framework for evaluating rating quality; however, these models are not appropriate for all assessment contexts. The purpose of this…

Descriptors: Evaluators, Goodness of Fit, Simulation, Psychometrics

Exploring the Combined Effects of Rater Misfit and Differential Rater Functioning in Performance Assessments

Peer reviewed

Direct link

Wind, Stefanie A.; Guo, Wenjing – Educational and Psychological Measurement, 2019

Rater effects, or raters' tendencies to assign ratings to performances that are different from the ratings that the performances warranted, are well documented in rater-mediated assessments across a variety of disciplines. In many real-data studies of rater effects, researchers have reported that raters exhibit more than one effect, such as a…

Descriptors: Evaluators, Bias, Scoring, Data Collection

A Model-Data-Fit-Informed Approach to Score Resolution in Performance Assessments

Peer reviewed

Direct link

Wind, Stefanie A.; Walker, A. Adrienne – Educational Measurement: Issues and Practice, 2021

Many large-scale performance assessments include score resolution procedures for resolving discrepancies in rater judgments. The goal of score resolution is conceptually similar to person fit analyses: To identify students for whom observed scores may not accurately reflect their achievement. Previously, researchers have observed that…

Descriptors: Goodness of Fit, Performance Based Assessment, Evaluators, Decision Making

Exploring the Impacts of Different Score Resolution Procedures on Person Fit and Estimated Achievement in Rater-Mediated Assessments

Peer reviewed

Direct link

Wind, Stefanie A.; Walker, A. Adrienne – Language Assessment Quarterly, 2020

Scoring procedures for many rater-mediated performance assessments include score resolution procedures in which a third rater adjudicates discrepancies between two raters' ratings of the same performance. There are numerous approaches for calculating resolved scores that involve different combinations of the original and third ratings. Using data…

Descriptors: Scoring, Evaluators, Goodness of Fit, Content Area Writing

The Effects of Incomplete Rating Designs in Combination with Rater Effects

Peer reviewed

Direct link

Wind, Stefanie A.; Jones, Eli – Journal of Educational Measurement, 2019

Researchers have explored a variety of topics related to identifying and distinguishing among specific types of rater effects, as well as the implications of different types of incomplete data collection designs for rater-mediated assessments. In this study, we used simulated data to examine the sensitivity of latent trait model indicators of…

Descriptors: Rating Scales, Models, Evaluators, Data Collection

A Sequential Approach to Detecting Differential Rater Functioning in Sparse Rater-Mediated Assessment Networks

Peer reviewed

Direct link

Wind, Stefanie A. – Language Testing, 2023

Researchers frequently evaluate rater judgments in performance assessments for evidence of differential rater functioning (DRF), which occurs when rater severity is systematically related to construct-irrelevant student characteristics after controlling for student achievement levels. However, researchers have observed that methods for detecting…

Descriptors: Evaluators, Decision Making, Student Characteristics, Performance Based Assessment

Exploring the Influence of Judge Proficiency on Standard-Setting Judgments

Peer reviewed

Direct link

Peabody, Michael R.; Wind, Stefanie A. – Journal of Educational Measurement, 2019

Setting performance standards is a judgmental process involving human opinions and values as well as technical and empirical considerations. Although all cut score decisions are by nature somewhat arbitrary, they should not be capricious. Judges selected for standard-setting panels should have the proper qualifications to make the judgments asked…

Descriptors: Standard Setting, Decision Making, Performance Based Assessment, Evaluators

A Systematic Review of Methods for Evaluating Rating Quality in Language Assessment

Peer reviewed

Direct link

Wind, Stefanie A.; Peterson, Meghan E. – Language Testing, 2018

The use of assessments that require rater judgment (i.e., rater-mediated assessments) has become increasingly popular in high-stakes language assessments worldwide. Using a systematic literature review, the purpose of this study is to identify and explore the dominant methods for evaluating rating quality within the context of research on…

Descriptors: Language Tests, Evaluators, Evaluation Methods, Interrater Reliability

The Influence of Rater Effects in Training Sets on the Psychometric Quality of Automated Scoring for Writing Assessments

Peer reviewed

Direct link

Wind, Stefanie A.; Wolfe, Edward W.; Engelhard, George, Jr.; Foltz, Peter; Rosenstein, Mark – International Journal of Testing, 2018

Automated essay scoring engines (AESEs) are becoming increasingly popular as an efficient method for performance assessments in writing, including many language assessments that are used worldwide. Before they can be used operationally, AESEs must be "trained" using machine-learning techniques that incorporate human ratings. However, the…

Descriptors: Computer Assisted Testing, Essay Tests, Writing Evaluation, Scoring

Detecting Measurement Disturbances in Rater-Mediated Assessments

Peer reviewed

Direct link

Wind, Stefanie A.; Schumacker, Randall E. – Educational Measurement: Issues and Practice, 2017

The term measurement disturbance has been used to describe systematic conditions that affect a measurement process, resulting in a compromised interpretation of person or item estimates. Measurement disturbances have been discussed in relation to systematic response patterns associated with items and persons, such as start-up, plodding, boredom,…

Descriptors: Measurement, Testing Problems, Writing Tests, Performance Based Assessment

Previous Page | Next Page »

Pages: 1 | 2