Wind, Stefanie A.; Ge, Yuan – Measurement: Interdisciplinary Research and Perspectives, 2023
In selected-response assessments such as attitude surveys with Likert-type rating scales, examinees often select from rating scale categories to reflect their locations on a construct. Researchers have observed that some examinees exhibit "response styles," which are systematic patterns of responses in which examinees are more likely to…
Descriptors: Goodness of Fit, Responses, Likert Scales, Models
Jones, Eli; Wind, Stefanie A.; Burcham, Jan; Hart, Anna; Dailey, Thomas – Teacher Educator, 2023
While much research has explored the quality of traditional teacher evaluations, little is known about the quality of ratings in preservice teacher evaluations. This paper presents a case study of Many-Facet Rasch measurement (MFR; Linacre, 1989) to explore potential rater effects influencing the quality of supervisor ratings in educator…
Descriptors: Student Teachers, Student Evaluation, Student Teacher Supervisors, Observation
Wind, Stefanie A. – Educational Measurement: Issues and Practice, 2020
Researchers have documented the impact of rater effects, or raters' tendencies to give different ratings than would be expected given examinee achievement levels, in performance assessments. However, the degree to which rater effects influence person fit, or the reasonableness of test-takers' achievement estimates given their response patterns,…
Descriptors: Performance Based Assessment, Evaluators, Achievement, Influences
Wind, Stefanie A.; Guo, Wenjing – Educational Assessment, 2021
Scoring procedures for the constructed-response (CR) items in large-scale mixed-format educational assessments often involve checks for rater agreement or rater reliability. Although these analyses are important, researchers have documented rater effects that persist despite rater training and that are not always detected in rater agreement and…
Descriptors: Scoring, Responses, Test Items, Test Format
Wesolowski, Brian C.; Wind, Stefanie A. – Journal of Educational Measurement, 2019
Rater-mediated assessments are a common methodology for measuring persons, investigating rater behavior, and/or defining latent constructs. The purpose of this article is to provide a pedagogical framework for examining rater variability in the context of rater-mediated assessments using three distinct models. The first model is the observation…
Descriptors: Interrater Reliability, Models, Observation, Measurement
Wind, Stefanie A.; Walker, A. Adrienne – Educational Measurement: Issues and Practice, 2021
Many large-scale performance assessments include score resolution procedures for resolving discrepancies in rater judgments. The goal of score resolution is conceptually similar to person fit analyses: To identify students for whom observed scores may not accurately reflect their achievement. Previously, researchers have observed that…
Descriptors: Goodness of Fit, Performance Based Assessment, Evaluators, Decision Making
Wind, Stefanie A.; Jones, Eli – Journal of Educational Measurement, 2019
Researchers have explored a variety of topics related to identifying and distinguishing among specific types of rater effects, as well as the implications of different types of incomplete data collection designs for rater-mediated assessments. In this study, we used simulated data to examine the sensitivity of latent trait model indicators of…
Descriptors: Rating Scales, Models, Evaluators, Data Collection
Wind, Stefanie A.; Jones, Eli – Journal of Educational Measurement, 2018
Range restrictions, or raters' tendency to limit their ratings to a subset of available rating scale categories, are well documented in large-scale teacher evaluation systems based on principal observations. When these restrictions occur, the ratings observed during operational teacher evaluations are limited to a subset of the available…
Descriptors: Measurement, Classroom Environment, Observation, Rating Scales
Wind, Stefanie A.; Peterson, Meghan E. – Language Testing, 2018
The use of assessments that require rater judgment (i.e., rater-mediated assessments) has become increasingly popular in high-stakes language assessments worldwide. Using a systematic literature review, the purpose of this study is to identify and explore the dominant methods for evaluating rating quality within the context of research on…
Descriptors: Language Tests, Evaluators, Evaluation Methods, Interrater Reliability
Wind, Stefanie A. – Educational Measurement: Issues and Practice, 2017
Mokken scale analysis (MSA) is a probabilistic-nonparametric approach to item response theory (IRT) that can be used to evaluate fundamental measurement properties with less strict assumptions than parametric IRT models. This instructional module provides an introduction to MSA as a probabilistic-nonparametric framework in which to explore…
Descriptors: Probability, Nonparametric Statistics, Item Response Theory, Scaling
Engelhard, George, Jr.; Wind, Stefanie A.; Kobrin, Jennifer L.; Chajewski, Michael – College Board, 2013
The purpose of this study is to illustrate the use of explanatory models based on Rasch measurement theory to detect systematic relationships between student and item characteristics and achievement differences using differential item functioning (DIF), differential group functioning (DGF), and differential person functioning (DPF) techniques. The…
Descriptors: Test Bias, Evaluation Methods, Measurement Techniques, Writing Evaluation