NotesFAQContact Us
Collection
Advanced
Search Tips
Showing all 8 results Save | Export
Peer reviewed Peer reviewed
Direct linkDirect link
John R. Donoghue; Carol Eckerly – Applied Measurement in Education, 2024
Trend scoring constructed response items (i.e. rescoring Time A responses at Time B) gives rise to two-way data that follow a product multinomial distribution rather than the multinomial distribution that is usually assumed. Recent work has shown that the difference in sampling model can have profound negative effects on statistics usually used to…
Descriptors: Scoring, Error of Measurement, Reliability, Scoring Rubrics
Peer reviewed Peer reviewed
Direct linkDirect link
Kieftenbeld, Vincent; Boyer, Michelle – Applied Measurement in Education, 2017
Automated scoring systems are typically evaluated by comparing the performance of a single automated rater item-by-item to human raters. This presents a challenge when the performance of multiple raters needs to be compared across multiple items. Rankings could depend on specifics of the ranking procedure; observed differences could be due to…
Descriptors: Automation, Scoring, Comparative Analysis, Test Items
Peer reviewed Peer reviewed
Direct linkDirect link
Raczynski, Kevin; Cohen, Allan – Applied Measurement in Education, 2018
The literature on Automated Essay Scoring (AES) systems has provided useful validation frameworks for any assessment that includes AES scoring. Furthermore, evidence for the scoring fidelity of AES systems is accumulating. Yet questions remain when appraising the scoring performance of AES systems. These questions include: (a) which essays are…
Descriptors: Essay Tests, Test Scoring Machines, Test Validity, Evaluators
Peer reviewed Peer reviewed
Direct linkDirect link
Cohen, Yoav; Levi, Effi; Ben-Simon, Anat – Applied Measurement in Education, 2018
In the current study, two pools of 250 essays, all written as a response to the same prompt, were rated by two groups of raters (14 or 15 raters per group), thereby providing an approximation to the essay's true score. An automated essay scoring (AES) system was trained on the datasets and then scored the essays using a cross-validation scheme. By…
Descriptors: Test Validity, Automation, Scoring, Computer Assisted Testing
Peer reviewed Peer reviewed
Direct linkDirect link
Steedle, Jeffrey T.; Ferrara, Steve – Applied Measurement in Education, 2016
As an alternative to rubric scoring, comparative judgment generates essay scores by aggregating decisions about the relative quality of the essays. Comparative judgment eliminates certain scorer biases and potentially reduces training requirements, thereby allowing a large number of judges, including teachers, to participate in essay evaluation.…
Descriptors: Essays, Scoring, Comparative Analysis, Evaluators
Peer reviewed Peer reviewed
Direct linkDirect link
Hawley, Leslie R.; Bovaird, James A.; Wu, ChaoRong – Applied Measurement in Education, 2017
Value-added assessment methods have been criticized by researchers and policy makers for a number of reasons. One issue includes the sensitivity of model results across different outcome measures. This study examined the utility of incorporating multivariate latent variable approaches within a traditional value-added framework. We evaluated the…
Descriptors: Value Added Models, Reliability, Multivariate Analysis, Scaling
Peer reviewed Peer reviewed
Direct linkDirect link
Taylor, Melinda Ann; Pastor, Dena A. – Applied Measurement in Education, 2013
Although federal regulations require testing students with severe cognitive disabilities, there is little guidance regarding how technical quality should be established. It is known that challenges exist with documentation of the reliability of scores for alternate assessments. Typical measures of reliability do little in modeling multiple sources…
Descriptors: Generalizability Theory, Alternative Assessment, Test Reliability, Scores
Peer reviewed Peer reviewed
Hambleton, Ronald K.; Plake, Barbara S. – Applied Measurement in Education, 1995
Several extensions to the Angoff method of standard setting are described that can accommodate characteristics of performance-based assessment. A study involving 12 panelists supported the effectiveness of the new approach but suggested that panelists preferred an approach that was at least partially conjunctive. (SLD)
Descriptors: Educational Assessment, Evaluation Methods, Evaluators, Interrater Reliability