Wind, Stefanie A. – Measurement: Interdisciplinary Research and Perspectives, 2022
In many performance assessments, one or two raters from the complete rater pool score each performance, resulting in a sparse rating design, where there are limited observations of each rater relative to the complete sample of students. Although sparse rating designs can be constructed to facilitate estimation of student achievement, the…
Descriptors: Evaluators, Bias, Identification, Performance Based Assessment
Wind, Stefanie A.; Walker, A. Adrienne – Educational Measurement: Issues and Practice, 2021
Many large-scale performance assessments include score resolution procedures for resolving discrepancies in rater judgments. The goal of score resolution is conceptually similar to person fit analyses: To identify students for whom observed scores may not accurately reflect their achievement. Previously, researchers have observed that…
Descriptors: Goodness of Fit, Performance Based Assessment, Evaluators, Decision Making
Wind, Stefanie A.; Walker, A. Adrienne – Language Assessment Quarterly, 2020
Scoring procedures for many rater-mediated performance assessments include score resolution procedures in which a third rater adjudicates discrepancies between two raters' ratings of the same performance. There are numerous approaches for calculating resolved scores that involve different combinations of the original and third ratings. Using data…
Descriptors: Scoring, Evaluators, Goodness of Fit, Content Area Writing
Rowe, Susan – ProQuest LLC, 2023
This dissertation explored whether unnecessary linguistic complexity (LC) in mathematics and biology assessment items changes the direction and significance of differential item functioning (DIF) between two subgroups: emergent bilinguals (EBs) and English proficient students (EPs). Due to inconsistencies in measuring LC in items, Study One adapted a…
Descriptors: Difficulty Level, English for Academic Purposes, Second Language Learning, Second Language Instruction
Lamprianou, Iasonas – Educational and Psychological Measurement, 2018
It is common practice for assessment programs to organize qualifying sessions during which the raters (often known as "markers" or "judges") demonstrate their consistency before operational rating commences. Because of the high-stakes nature of many rating activities, the research community tends to continuously explore new…
Descriptors: Social Networks, Network Analysis, Comparative Analysis, Innovation
Steedle, Jeffrey T.; Ferrara, Steve – Applied Measurement in Education, 2016
As an alternative to rubric scoring, comparative judgment generates essay scores by aggregating decisions about the relative quality of the essays. Comparative judgment eliminates certain scorer biases and potentially reduces training requirements, thereby allowing a large number of judges, including teachers, to participate in essay evaluation.…
Descriptors: Essays, Scoring, Comparative Analysis, Evaluators
Franke, Todd Michael; Ho, Timothy; Christie, Christina A. – American Journal of Evaluation, 2012
The examination of cross-classified category data is common in evaluation and research, with Karl Pearson's family of chi-square tests representing one of the most utilized statistical analyses for answering questions about the association or difference between categorical variables. Unfortunately, these tests are also among the more commonly…
Descriptors: Evaluators, Statistical Analysis, Research Methodology, Evaluation Research
Linacre, John M. – 1990
Rank ordering examinees is an easier task for judges than is awarding numerical ratings. A measurement model for rankings based on Rasch's objectivity axioms provides linear, sample-independent and judge-independent measures. Estimates of examinee measures are obtained from the data set of rankings, along with standard errors and fit statistics.…
Descriptors: Comparative Analysis, Error of Measurement, Essay Tests, Evaluators