Publication Date
In 2025: 0
Since 2024: 1
Since 2021 (last 5 years): 1
Since 2016 (last 10 years): 6
Since 2006 (last 20 years): 7
Descriptor
Statistical Analysis: 8
Interrater Reliability: 5
Scoring: 4
Correlation: 3
Evaluators: 3
Reliability: 3
Automation: 2
Comparative Analysis: 2
Error of Measurement: 2
Essay Tests: 2
Essays: 2
Source
Applied Measurement in Education: 8
Author
Ben-Simon, Anat: 1
Bovaird, James A.: 1
Boyer, Michelle: 1
Cohen, Allan: 1
Cohen, Yoav: 1
Donoghue, John R.: 1
Eckerly, Carol: 1
Ferrara, Steve: 1
Hambleton, Ronald K.: 1
Hawley, Leslie R.: 1
Kieftenbeld, Vincent: 1
Publication Type
Journal Articles: 8
Reports - Research: 6
Reports - Evaluative: 2
Education Level
Elementary Education: 2
Early Childhood Education: 1
Elementary Secondary Education: 1
Grade 1: 1
Grade 10: 1
Grade 2: 1
Grade 3: 1
Grade 5: 1
Grade 7: 1
Grade 8: 1
High Schools: 1
Assessments and Surveys
Stanford Achievement Tests: 1
Donoghue, John R.; Eckerly, Carol – Applied Measurement in Education, 2024
Trend scoring of constructed-response items (i.e., rescoring Time A responses at Time B) gives rise to two-way data that follow a product multinomial distribution rather than the multinomial distribution that is usually assumed. Recent work has shown that this difference in sampling model can have profound negative effects on statistics usually used to…
Descriptors: Scoring, Error of Measurement, Reliability, Scoring Rubrics
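The sampling distinction in this abstract is easy to see in simulation. Below is a minimal sketch (not the authors' code; the category counts and rescore probabilities are made up) contrasting a product multinomial, where each Time A score category is rescored independently with its count fixed, with a single multinomial over all cells of the two-way table.

```python
# Hypothetical illustration: product multinomial vs. ordinary multinomial
# sampling for a Time A x Time B trend-scoring table.
import numpy as np

rng = np.random.default_rng(0)
K = 4                                          # score categories (assumed)
row_totals = np.array([120, 300, 240, 90])     # fixed Time A counts (made up)
P = np.full((K, K), 0.1) + np.eye(K) * 0.6     # row-conditional rescore probs
P /= P.sum(axis=1, keepdims=True)

# Product multinomial: each Time A row is an independent multinomial draw.
table_pm = np.stack([rng.multinomial(n, P[i]) for i, n in enumerate(row_totals)])

# Ordinary multinomial: one draw over all K*K cells with joint probabilities.
joint = (row_totals / row_totals.sum())[:, None] * P
table_mn = rng.multinomial(row_totals.sum(), joint.ravel()).reshape(K, K)

# Under the product multinomial the row margins are fixed by design;
# under the multinomial they vary from sample to sample.
print("product-multinomial row margins:", table_pm.sum(axis=1))
print("multinomial row margins:       ", table_mn.sum(axis=1))
```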
Kieftenbeld, Vincent; Boyer, Michelle – Applied Measurement in Education, 2017
Automated scoring systems are typically evaluated by comparing the performance of a single automated rater item-by-item to human raters. This presents a challenge when the performance of multiple raters needs to be compared across multiple items. Rankings could depend on specifics of the ranking procedure; observed differences could be due to…
Descriptors: Automation, Scoring, Comparative Analysis, Test Items
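A toy illustration of the comparison problem this abstract raises: with several automated raters scored against humans on several items, two defensible ranking procedures (pooled mean agreement vs. mean per-item rank) need not order the raters the same way. Everything below is simulated and hypothetical, not the paper's method.

```python
# Hypothetical simulation: rank 3 automated raters against human scores
# on 5 items using two different ranking procedures.
import numpy as np

rng = np.random.default_rng(1)
n_raters, n_items, n_resp = 3, 5, 200
human = rng.integers(0, 4, size=(n_items, n_resp))        # human scores, 0-3

# Each automated rater copies the human score with rater- and item-specific
# hit rates, so agreement varies across items.
hit_rate = 0.6 + 0.3 * rng.random((n_raters, n_items))
hit = rng.random((n_raters, n_items, n_resp)) < hit_rate[..., None]
auto = np.where(hit, human, rng.integers(0, 4, size=(n_raters, n_items, n_resp)))

agree = (auto == human).mean(axis=2)                      # raters x items

ranks = agree.argsort(axis=0).argsort(axis=0)             # per-item rater ranks
order_pooled = np.argsort(-agree.mean(axis=1))            # procedure 1
order_ranked = np.argsort(-ranks.mean(axis=1))            # procedure 2
print("order by pooled mean agreement:", order_pooled)
print("order by mean per-item rank:   ", order_ranked)
```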
Raczynski, Kevin; Cohen, Allan – Applied Measurement in Education, 2018
The literature on Automated Essay Scoring (AES) systems has provided useful validation frameworks for any assessment that includes AES scoring. Furthermore, evidence for the scoring fidelity of AES systems is accumulating. Yet questions remain when appraising the scoring performance of AES systems. These questions include: (a) which essays are…
Descriptors: Essay Tests, Test Scoring Machines, Test Validity, Evaluators
Cohen, Yoav; Levi, Effi; Ben-Simon, Anat – Applied Measurement in Education, 2018
In the current study, two pools of 250 essays, all written in response to the same prompt, were rated by two groups of raters (14 or 15 raters per group), thereby providing an approximation to each essay's true score. An automated essay scoring (AES) system was trained on the datasets and then scored the essays using a cross-validation scheme. By…
Descriptors: Test Validity, Automation, Scoring, Computer Assisted Testing
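A minimal sketch of the design as described in the abstract, with hypothetical features and a stand-in ridge model rather than the study's actual AES system: many raters per essay are averaged to approximate the true score, and the automated model is evaluated out-of-sample via cross-validation against that criterion.

```python
# Hypothetical stand-in for the study's design: multi-rater criterion plus
# cross-validated machine scores. Features and model are invented.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(2)
n_essays, n_raters, n_feats = 250, 15, 20
true_score = rng.normal(0, 1, n_essays)
ratings = true_score[:, None] + rng.normal(0, 0.7, (n_essays, n_raters))
criterion = ratings.mean(axis=1)          # approximation to the true score

# Stand-in essay features correlated with quality (purely illustrative).
X = true_score[:, None] * rng.normal(1, 0.2, (1, n_feats)) \
    + rng.normal(0, 1.0, (n_essays, n_feats))

aes_scores = cross_val_predict(Ridge(alpha=1.0), X, criterion, cv=5)
print("AES vs. multi-rater criterion r =",
      np.corrcoef(aes_scores, criterion)[0, 1].round(3))
```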
Steedle, Jeffrey T.; Ferrara, Steve – Applied Measurement in Education, 2016
As an alternative to rubric scoring, comparative judgment generates essay scores by aggregating decisions about the relative quality of the essays. Comparative judgment eliminates certain scorer biases and potentially reduces training requirements, thereby allowing a large number of judges, including teachers, to participate in essay evaluation.…
Descriptors: Essays, Scoring, Comparative Analysis, Evaluators
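One common way to turn comparative judgments into essay scores, sketched below on simulated data, is a Bradley-Terry model fitted with Hunter's MM updates. The abstract does not specify the paper's exact aggregation procedure, so this is an assumed stand-in.

```python
# Hypothetical comparative-judgment pipeline: simulate pairwise decisions,
# then recover essay scores with Bradley-Terry (Hunter's MM algorithm).
import numpy as np

rng = np.random.default_rng(3)
n_essays, n_pairs = 20, 600
quality = rng.normal(0, 1, n_essays)               # latent essay quality

i = rng.integers(0, n_essays, n_pairs)
j = rng.integers(0, n_essays, n_pairs)
i, j = i[i != j], j[i != j]
p_win = 1 / (1 + np.exp(-(quality[i] - quality[j])))
i_wins = rng.random(i.size) < p_win                # judge picks the better essay

wins = np.zeros(n_essays)
np.add.at(wins, i[i_wins], 1)
np.add.at(wins, j[~i_wins], 1)
wins = np.maximum(wins, 0.5)                       # continuity guard for zero-win essays
n_meet = np.zeros((n_essays, n_essays))
np.add.at(n_meet, (i, j), 1)
np.add.at(n_meet, (j, i), 1)

theta = np.ones(n_essays)                          # MM updates for BT strengths
for _ in range(200):
    denom = n_meet / (theta[:, None] + theta[None, :])
    np.fill_diagonal(denom, 0)
    theta = wins / denom.sum(axis=1)
    theta /= theta.mean()

print("correlation of estimated scores with true quality:",
      np.corrcoef(np.log(theta), quality)[0, 1].round(3))
```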
Hawley, Leslie R.; Bovaird, James A.; Wu, ChaoRong – Applied Measurement in Education, 2017
Value-added assessment methods have been criticized by researchers and policy makers for a number of reasons. One issue is the sensitivity of model results across different outcome measures. This study examined the utility of incorporating multivariate latent variable approaches within a traditional value-added framework. We evaluated the…
Descriptors: Value Added Models, Reliability, Multivariate Analysis, Scaling
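The sensitivity issue the abstract describes can be illustrated with a deliberately simple value-added model (ordinary least squares on simulated data, not the study's latent-variable approach): two parallel outcome measures of the same construct can yield different school rankings.

```python
# Hypothetical illustration: value-added as school-mean residuals from a
# pretest regression, computed for two parallel outcome measures.
import numpy as np

rng = np.random.default_rng(4)
n_schools, n_per = 30, 50
school = np.repeat(np.arange(n_schools), n_per)
effect = rng.normal(0, 0.3, n_schools)             # true school effects
pretest = rng.normal(0, 1, n_schools * n_per)

def value_added(outcome):
    b = np.polyfit(pretest, outcome, 1)            # regress outcome on pretest
    resid = outcome - np.polyval(b, pretest)
    return np.array([resid[school == s].mean() for s in range(n_schools)])

# Two outcome measures of the same construct, each with its own error.
y1 = 0.5 * pretest + effect[school] + rng.normal(0, 0.8, school.size)
y2 = 0.5 * pretest + effect[school] + rng.normal(0, 0.8, school.size)

va1, va2 = value_added(y1), value_added(y2)
print("agreement between the two school rankings r =",
      np.corrcoef(va1.argsort().argsort(), va2.argsort().argsort())[0, 1].round(2))
```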
Taylor, Melinda Ann; Pastor, Dena A. – Applied Measurement in Education, 2013
Although federal regulations require testing students with severe cognitive disabilities, there is little guidance regarding how technical quality should be established. It is known that challenges exist with documenting the reliability of scores for alternate assessments. Typical measures of reliability do little to model multiple sources…
Descriptors: Generalizability Theory, Alternative Assessment, Test Reliability, Scores
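For readers unfamiliar with generalizability theory, here is a minimal sketch of the idea on simulated persons-by-raters data (a fully crossed design is assumed): score variance is partitioned into person, rater, and residual components, from which a generalizability coefficient follows.

```python
# Hypothetical crossed persons-by-raters G-study: estimate variance
# components from ANOVA mean squares and form a G coefficient.
import numpy as np

rng = np.random.default_rng(5)
n_p, n_r = 100, 4
person = rng.normal(0, 1.0, (n_p, 1))              # universe scores
rater = rng.normal(0, 0.3, (1, n_r))               # rater severity
X = person + rater + rng.normal(0, 0.6, (n_p, n_r))

grand = X.mean()
ss_p = n_r * ((X.mean(axis=1) - grand) ** 2).sum()
ss_r = n_p * ((X.mean(axis=0) - grand) ** 2).sum()
ss_e = ((X - grand) ** 2).sum() - ss_p - ss_r
ms_p, ms_r = ss_p / (n_p - 1), ss_r / (n_r - 1)
ms_e = ss_e / ((n_p - 1) * (n_r - 1))

var_e = ms_e                                       # expected-mean-square solutions
var_p = (ms_p - ms_e) / n_r
var_r = (ms_r - ms_e) / n_p
g_coef = var_p / (var_p + var_e / n_r)             # relative G, mean of n_r raters
print(f"var(person)={var_p:.2f} var(rater)={var_r:.2f} "
      f"var(resid)={var_e:.2f} G={g_coef:.2f}")
```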

Hambleton, Ronald K.; Plake, Barbara S. – Applied Measurement in Education, 1995
Several extensions to the Angoff method of standard setting are described that can accommodate characteristics of performance-based assessment. A study involving 12 panelists supported the effectiveness of the new approach but suggested that panelists preferred an approach that was at least partially conjunctive. (SLD)
Descriptors: Educational Assessment, Evaluation Methods, Evaluators, Interrater Reliability
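For context, here is the classic Angoff logic plus a partially conjunctive variant like the one the panelists reportedly preferred, sketched on hypothetical ratings: each panelist estimates the probability that a minimally competent examinee succeeds on each item, the cut score is the panel mean of each panelist's summed estimates, and the conjunctive rule adds per-section floors.

```python
# Hypothetical Angoff standard setting with a partially conjunctive rule.
# All ratings, sections, and the slack parameter are invented.
import numpy as np

rng = np.random.default_rng(6)
n_panelists, n_items = 12, 40
section = np.repeat(np.arange(4), n_items // 4)    # four sections (assumed)
ratings = np.clip(rng.normal(0.6, 0.15, (n_panelists, n_items)), 0, 1)

overall_cut = ratings.sum(axis=1).mean()           # classic Angoff cut score
section_cuts = np.array([ratings[:, section == s].sum(axis=1).mean()
                         for s in range(4)])

def passes(item_scores, slack=2.0):
    """Compensatory overall rule plus a conjunctive per-section floor."""
    by_section = np.array([item_scores[section == s].sum() for s in range(4)])
    return item_scores.sum() >= overall_cut and (by_section >= section_cuts - slack).all()

examinee = np.clip(rng.normal(0.65, 0.3, n_items), 0, 1)   # simulated item scores
print("overall cut:", overall_cut.round(1), "section cuts:", section_cuts.round(1))
print("passes under partially conjunctive rule:", passes(examinee))
```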