ERIC - Search Results

Showing all 11 results

Identifying Response Styles Using Person Fit Analysis and Response-Styles Models

Peer reviewed

Wind, Stefanie A.; Ge, Yuan – Measurement: Interdisciplinary Research and Perspectives, 2023

In selected-response assessments such as attitude surveys with Likert-type rating scales, examinees often select from rating scale categories to reflect their locations on a construct. Researchers have observed that some examinees exhibit "response styles," which are systematic patterns of responses in which examinees are more likely to…

Descriptors: Goodness of Fit, Responses, Likert Scales, Models

A Case Study of a Multi-Faceted Approach to Evaluating Teacher Candidate Ratings

Peer reviewed

Jones, Eli; Wind, Stefanie A.; Burcham, Jan; Hart, Anna; Dailey, Thomas – Teacher Educator, 2023

While much research has explored the quality of traditional teacher evaluations, little is known about the quality of ratings in preservice teacher evaluations. This paper presents a case study of Many-facet Rasch measurement (MFR; Linacre, 1989) to explore potential rater effects influencing the quality of supervisor ratings in educator…

Descriptors: Student Teachers, Student Evaluation, Student Teacher Supervisors, Observation

Exploring the Impact of Rater Effects on Person Fit in Rater-Mediated Assessments

Peer reviewed

Wind, Stefanie A. – Educational Measurement: Issues and Practice, 2020

Researchers have documented the impact of rater effects, or raters' tendencies to give different ratings than would be expected given examinee achievement levels, in performance assessments. However, the degree to which rater effects influence person fit, or the reasonableness of test-takers' achievement estimates given their response patterns,…

Descriptors: Performance Based Assessment, Evaluators, Achievement, Influences

Beyond Agreement: Exploring Rater Effects in Large-Scale Mixed Format Assessments

Peer reviewed

Wind, Stefanie A.; Guo, Wenjing – Educational Assessment, 2021

Scoring procedures for the constructed-response (CR) items in large-scale mixed-format educational assessments often involve checks for rater agreement or rater reliability. Although these analyses are important, researchers have documented rater effects that persist despite rater training and that are not always detected in rater agreement and…

Descriptors: Scoring, Responses, Test Items, Test Format

Pedagogical Considerations for Examining Rater Variability in Rater-Mediated Assessments: A Three-Model Framework

Peer reviewed

Wesolowski, Brian C.; Wind, Stefanie A. – Journal of Educational Measurement, 2019

Rater-mediated assessments are a common methodology for measuring persons, investigating rater behavior, and/or defining latent constructs. The purpose of this article is to provide a pedagogical framework for examining rater variability in the context of rater-mediated assessments using three distinct models. The first model is the observation…

Descriptors: Interrater Reliability, Models, Observation, Measurement

A Model-Data-Fit-Informed Approach to Score Resolution in Performance Assessments

Peer reviewed

Wind, Stefanie A.; Walker, A. Adrienne – Educational Measurement: Issues and Practice, 2021

Many large-scale performance assessments include score resolution procedures for resolving discrepancies in rater judgments. The goal of score resolution is conceptually similar to person fit analyses: To identify students for whom observed scores may not accurately reflect their achievement. Previously, researchers have observed that…

Descriptors: Goodness of Fit, Performance Based Assessment, Evaluators, Decision Making

The Effects of Incomplete Rating Designs in Combination with Rater Effects

Peer reviewed

Wind, Stefanie A.; Jones, Eli – Journal of Educational Measurement, 2019

Researchers have explored a variety of topics related to identifying and distinguishing among specific types of rater effects, as well as the implications of different types of incomplete data collection designs for rater-mediated assessments. In this study, we used simulated data to examine the sensitivity of latent trait model indicators of…

Descriptors: Rating Scales, Models, Evaluators, Data Collection

Exploring the Influence of Range Restrictions on Connectivity in Sparse Assessment Networks: An Illustration and Exploration within the Context of Classroom Observations

Peer reviewed

Wind, Stefanie A.; Jones, Eli – Journal of Educational Measurement, 2018

Range restrictions, or raters' tendency to limit their ratings to a subset of available rating scale categories, are well documented in large-scale teacher evaluation systems based on principal observations. When these restrictions occur, the ratings observed during operational teacher evaluations are limited to a subset of the available…

Descriptors: Measurement, Classroom Environment, Observation, Rating Scales

A Systematic Review of Methods for Evaluating Rating Quality in Language Assessment

Peer reviewed

Wind, Stefanie A.; Peterson, Meghan E. – Language Testing, 2018

The use of assessments that require rater judgment (i.e., rater-mediated assessments) has become increasingly popular in high-stakes language assessments worldwide. Using a systematic literature review, the purpose of this study is to identify and explore the dominant methods for evaluating rating quality within the context of research on…

Descriptors: Language Tests, Evaluators, Evaluation Methods, Interrater Reliability

An Instructional Module on Mokken Scale Analysis

Peer reviewed

Wind, Stefanie A. – Educational Measurement: Issues and Practice, 2017

Mokken scale analysis (MSA) is a probabilistic-nonparametric approach to item response theory (IRT) that can be used to evaluate fundamental measurement properties with less strict assumptions than parametric IRT models. This instructional module provides an introduction to MSA as a probabilistic-nonparametric framework in which to explore…

Descriptors: Probability, Nonparametric Statistics, Item Response Theory, Scaling

Differential Item and Person Functioning in Large-Scale Writing Assessments within the Context of the SAT®. Research Report 2013-6

Engelhard, George, Jr.; Wind, Stefanie A.; Kobrin, Jennifer L.; Chajewski, Michael – College Board, 2013

The purpose of this study is to illustrate the use of explanatory models based on Rasch measurement theory to detect systematic relationships between student and item characteristics and achievement differences using differential item functioning (DIF), differential group functioning (DGF), and differential person functioning (DPF) techniques. The…

Descriptors: Test Bias, Evaluation Methods, Measurement Techniques, Writing Evaluation
