ERIC - Search Results

Publication Date

In 2025	0
Since 2024	0
Since 2021 (last 5 years)	3
Since 2016 (last 10 years)	6
Since 2006 (last 20 years)	7

Descriptor

Comparative Analysis	8
Evaluators	8
Goodness of Fit	8
Scoring	4
Statistical Analysis	3
Classification	2
Correlation	2
Evaluation Methods	2
Interrater Reliability	2
Mathematical Models	2
Multiple Choice Tests	2
Performance Based Assessment	2
Science Tests	2
Simulation	2
Academic Achievement	1
Bias	1
Bilingualism	1
Biology	1
Content Area Writing	1
Cues	1
Data Analysis	1
Decision Making	1
Difficulty Level	1
Educational Research	1
English (Second Language)	1

Source

American Journal of Evaluation	1
Applied Measurement in…	1
Educational Measurement:…	1
Educational and Psychological…	1
Language Assessment Quarterly	1
Measurement:…	1
ProQuest LLC	1

Author

Wind, Stefanie A.	3
Walker, A. Adrienne	2
Christie, Christina A.	1
Ferrara, Steve	1
Franke, Todd Michael	1
Ho, Timothy	1
Lamprianou, Iasonas	1
Linacre, John M.	1
Steedle, Jeffrey T.	1
Susan Rowe	1

Publication Type

Journal Articles	6
Reports - Research	5
Dissertations/Theses -…	1
Reports - Descriptive	1
Reports - Evaluative	1
Speeches/Meeting Papers	1

Education Level

High Schools	1
Secondary Education	1

Location

Massachusetts

Showing all 8 results

Rater Connections and the Detection of Bias in Performance Assessment

Peer reviewed

Wind, Stefanie A. – Measurement: Interdisciplinary Research and Perspectives, 2022

In many performance assessments, one or two raters from the complete rater pool score each performance, resulting in a sparse rating design, where there are limited observations of each rater relative to the complete sample of students. Although sparse rating designs can be constructed to facilitate estimation of student achievement, the…

Descriptors: Evaluators, Bias, Identification, Performance Based Assessment

A Model-Data-Fit-Informed Approach to Score Resolution in Performance Assessments

Peer reviewed

Wind, Stefanie A.; Walker, A. Adrienne – Educational Measurement: Issues and Practice, 2021

Many large-scale performance assessments include score resolution procedures for resolving discrepancies in rater judgments. The goal of score resolution is conceptually similar to person fit analyses: To identify students for whom observed scores may not accurately reflect their achievement. Previously, researchers have observed that…

Descriptors: Goodness of Fit, Performance Based Assessment, Evaluators, Decision Making

Exploring the Impacts of Different Score Resolution Procedures on Person Fit and Estimated Achievement in Rater-Mediated Assessments

Peer reviewed

Wind, Stefanie A.; Walker, A. Adrienne – Language Assessment Quarterly, 2020

Scoring procedures for many rater-mediated performance assessments include score resolution procedures in which a third rater adjudicates discrepancies between two raters' ratings of the same performance. There are numerous approaches for calculating resolved scores that involve different combinations of the original and third ratings. Using data…

Descriptors: Scoring, Evaluators, Goodness of Fit, Content Area Writing
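The abstract above mentions that resolved scores can combine the original and third ratings in different ways, without naming the approaches. As a rough illustration only, the sketch below computes three plausible combinations for a single performance. The function name `resolve_scores`, the dictionary keys, and the choice of these three rules are assumptions made for this example; they are not taken from the article.

```python
def resolve_scores(r1, r2, r3):
    """Return three illustrative resolved scores for one performance.

    r1, r2: the two original ratings; r3: the third (adjudicating) rating.
    The three rules below are hypothetical examples, not the article's list.
    """
    # Rule 1: average all three ratings.
    mean_of_all_three = (r1 + r2 + r3) / 3

    # Rule 2: discard the original ratings and keep only the third rating.
    third_rating_only = float(r3)

    # Rule 3: average the third rating with whichever original rating is
    # closer to it (ties go to the first rating).
    closest = r1 if abs(r1 - r3) <= abs(r2 - r3) else r2
    third_plus_closest = (closest + r3) / 2

    return {
        "mean_of_all_three": mean_of_all_three,
        "third_rating_only": third_rating_only,
        "third_plus_closest": third_plus_closest,
    }


if __name__ == "__main__":
    # Two raters disagree (2 vs. 5); a third rater awards 4.
    print(resolve_scores(2, 5, 4))
```

Even in this small case the three rules give different resolved scores, which is why the choice of procedure could plausibly affect person fit and estimated achievement.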

Examining the Effects of Linguistic Complexity on Emergent Bilinguals' Academic Content Performance

Susan Rowe – ProQuest LLC, 2023

This dissertation explored whether unnecessary linguistic complexity (LC) in mathematics and biology assessment items changes the direction and significance of differential item functioning (DIF) between the subgroups of emergent bilinguals (EBs) and English proficient students (EPs). Due to inconsistencies in measuring LC in items, Study One adapted a…

Descriptors: Difficulty Level, English for Academic Purposes, Second Language Learning, Second Language Instruction

Investigation of Rater Effects Using Social Network Analysis and Exponential Random Graph Models

Peer reviewed

Lamprianou, Iasonas – Educational and Psychological Measurement, 2018

It is common practice for assessment programs to organize qualifying sessions during which the raters (often known as "markers" or "judges") demonstrate their consistency before operational rating commences. Because of the high-stakes nature of many rating activities, the research community tends to continuously explore new…

Descriptors: Social Networks, Network Analysis, Comparative Analysis, Innovation

Evaluating Comparative Judgment as an Approach to Essay Scoring

Peer reviewed

Steedle, Jeffrey T.; Ferrara, Steve – Applied Measurement in Education, 2016

As an alternative to rubric scoring, comparative judgment generates essay scores by aggregating decisions about the relative quality of the essays. Comparative judgment eliminates certain scorer biases and potentially reduces training requirements, thereby allowing a large number of judges, including teachers, to participate in essay evaluation.…

Descriptors: Essays, Scoring, Comparative Analysis, Evaluators
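To make the idea of aggregating relative-quality decisions concrete, the sketch below fits a Bradley-Terry model to a list of pairwise judgments. Bradley-Terry is one common model for paired comparisons; whether the article uses it is an assumption here, and the function name `bradley_terry` and the toy data are invented for illustration.

```python
def bradley_terry(n_items, comparisons, iterations=200):
    """Estimate relative strengths from (winner, loser) index pairs.

    Uses the standard iterative update for the Bradley-Terry model:
    strength_i <- wins_i / sum over i's pairs of count / (s_i + s_j).
    Every item should win at least once for the estimates to be positive.
    """
    wins = [0] * n_items
    pair_counts = {}
    for winner, loser in comparisons:
        wins[winner] += 1
        key = (min(winner, loser), max(winner, loser))
        pair_counts[key] = pair_counts.get(key, 0) + 1

    strengths = [1.0] * n_items
    for _ in range(iterations):
        updated = []
        for i in range(n_items):
            denom = 0.0
            for (a, b), count in pair_counts.items():
                if i in (a, b):
                    denom += count / (strengths[a] + strengths[b])
            updated.append(wins[i] / denom if denom > 0 else strengths[i])
        # Rescale so the strengths average to 1 (the scale is arbitrary).
        total = sum(updated)
        strengths = [s * n_items / total for s in updated]
    return strengths


if __name__ == "__main__":
    # Toy judgments on three essays: essay 0 usually beats 1 and 2,
    # and essay 1 usually beats 2.
    judgments = (
        [(0, 1)] * 3 + [(1, 0)]
        + [(1, 2)] * 3 + [(2, 1)]
        + [(0, 2)] * 3 + [(2, 0)]
    )
    print(bradley_terry(3, judgments))
```

The estimated strengths place the essays on a common scale using only "which is better" decisions, with no rubric scores required.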

The Chi-Square Test: Often Used and More Often Misinterpreted

Peer reviewed

Franke, Todd Michael; Ho, Timothy; Christie, Christina A. – American Journal of Evaluation, 2012

The examination of cross-classified category data is common in evaluation and research, with Karl Pearson's family of chi-square tests representing one of the most utilized statistical analyses for answering questions about the association or difference between categorical variables. Unfortunately, these tests are also among the more commonly…

Descriptors: Evaluators, Statistical Analysis, Research Methodology, Evaluation Research
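As a small worked example of the test discussed in this entry, the sketch below computes Pearson's chi-square statistic of independence for a cross-classified table of counts using only the standard library. The function name `pearson_chi_square` and the example counts are made up for illustration and do not come from the article.

```python
def pearson_chi_square(table):
    """Return (statistic, degrees_of_freedom) for a table of observed counts.

    table: list of rows, each a list of non-negative counts of equal length.
    Expected counts assume independence of the row and column variables.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: (row total * column total) / N.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    degrees_of_freedom = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, degrees_of_freedom


if __name__ == "__main__":
    # Hypothetical 2x2 table: group (rows) by outcome category (columns).
    stat, df = pearson_chi_square([[10, 20], [20, 10]])
    print(stat, df)
```

The statistic indicates only whether the variables appear associated. On its own it says nothing about the strength or direction of the association.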

Rank Ordering or Judge-Awarded Ratings?

Linacre, John M. – 1990

Rank ordering examinees is an easier task for judges than is awarding numerical ratings. A measurement model for rankings based on Rasch's objectivity axioms provides linear, sample-independent and judge-independent measures. Estimates of examinee measures are obtained from the data set of rankings, along with standard errors and fit statistics.…

Descriptors: Comparative Analysis, Error of Measurement, Essay Tests, Evaluators