Wu, Tong; Kim, Stella Y.; Westine, Carl; Boyer, Michelle – Journal of Educational Measurement, 2025
While significant attention has been given to test equating to ensure score comparability, limited research has explored equating methods for rater-mediated assessments, where human raters inherently introduce error. If not properly addressed, these errors can undermine score interchangeability and test validity. This study proposes an equating…
Descriptors: Item Response Theory, Evaluators, Error of Measurement, Test Validity
Edwards, Kelly; Soland, James – Educational Assessment, 2024
Classroom observational protocols, in which raters observe and score the quality of teachers' instructional practices, are often used to evaluate teachers for consequential purposes despite evidence that scores from such protocols are frequently driven by factors, such as rater and temporal effects, that have little to do with teacher quality. In…
Descriptors: Classroom Observation Techniques, Teacher Evaluation, Accuracy, Scores
Song, Yoon Ah; Lee, Won-Chan – Applied Measurement in Education, 2022
This article examines the performance of item response theory (IRT) models when double ratings, rather than single ratings, are used as item scores in the presence of rater effects. Study 1 examined the influence of the number of ratings on the accuracy of proficiency estimation in the generalized partial credit model (GPCM). Study 2 compared the accuracy of…
Descriptors: Item Response Theory, Item Analysis, Scores, Accuracy
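The intuition behind comparing double and single ratings is that averaging independent rater errors shrinks the error variance of the observed score. A minimal simulation sketch in Python (the true score, rater-error SD, and rating scale are all hypothetical; this is not the authors' code):

```python
import random

random.seed(1)

TRUE_SCORE = 3.0        # hypothetical true item score for one examinee
RATER_SD = 1.0          # assumed SD of independent rater error
N_REPLICATIONS = 10000

def observed(n_ratings):
    """Mean of n independent ratings of the same response."""
    return sum(random.gauss(TRUE_SCORE, RATER_SD)
               for _ in range(n_ratings)) / n_ratings

for n in (1, 2):
    errs = [observed(n) - TRUE_SCORE for _ in range(N_REPLICATIONS)]
    var = sum(e * e for e in errs) / N_REPLICATIONS
    print(f"{n} rating(s): error variance ~ {var:.3f}")
```

With independent rater errors, two ratings roughly halve the error variance, which is why double ratings can improve proficiency estimation when rater effects are present.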
Clauser, Brian E.; Kane, Michael; Clauser, Jerome C. – Journal of Educational Measurement, 2020
An Angoff standard setting study generally yields judgments on a number of items by a number of judges (who may or may not be nested in panels). Variability associated with judges (and possibly panels) contributes error to the resulting cut score. The variability associated with items plays a more complicated role. To the extent that the mean item…
Descriptors: Cutting Scores, Generalization, Decision Making, Standard Setting
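The judge-related error described in the abstract can be illustrated with a toy computation: treat the Angoff cut score as the grand mean of a judges-by-items matrix of judgments, and estimate the standard error contributed by judge variability from the spread of judge means. The judgments below are invented for illustration and are not from the study:

```python
import statistics

# Hypothetical Angoff judgments: rows = judges, columns = items;
# each value is the judged probability that a minimally competent
# examinee answers the item correctly.
judgments = [
    [0.60, 0.45, 0.70, 0.55],   # judge 1
    [0.65, 0.50, 0.80, 0.60],   # judge 2
    [0.55, 0.40, 0.65, 0.50],   # judge 3
]

n_judges = len(judgments)
judge_means = [statistics.mean(row) for row in judgments]

# Cut score = grand mean of judgments (proportion-correct metric).
cut_score = statistics.mean(judge_means)

# Error attributable to judges: SD of judge means over sqrt(n_judges).
se_judges = statistics.stdev(judge_means) / n_judges ** 0.5

print(f"cut score = {cut_score:.3f}, SE (judges) = {se_judges:.3f}")
```

Adding judges shrinks this standard error by a factor of 1/√n, whereas the item contribution behaves differently, as the abstract notes.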
Liu, Sha; Kunnan, Antony John – CALICO Journal, 2016
This study investigated the application of "WriteToLearn" on Chinese undergraduate English majors' essays in terms of its scoring ability and the accuracy of its error feedback. Participants were 163 second-year English majors from a university located in Sichuan province who wrote 326 essays from two writing prompts. Each paper was…
Descriptors: Foreign Countries, Undergraduate Students, English (Second Language), Second Language Learning
Dekle, Dawn J.; Leung, Denis H. Y.; Zhu, Min – Psychological Methods, 2008
Across many areas of psychology, concordance is commonly used to measure the (intragroup) agreement of a group of judges in ranking a number of items. Sometimes, however, the judges come from multiple groups, and in those situations the interest lies in measuring the concordance between groups, under the assumption that there is some within-group…
Descriptors: Item Response Theory, Statistical Analysis, Psychological Studies, Evaluators
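Within-group concordance of the kind this abstract refers to is classically measured with Kendall's coefficient of concordance, W = 12S / (m²(n³ − n)), where S is the sum of squared deviations of item rank totals from their mean. A self-contained sketch (the rankings are made up, and tied ranks are not handled):

```python
def kendalls_w(rankings):
    """Kendall's coefficient of concordance for m judges ranking n items.

    rankings: list of m lists, each a permutation of the ranks 1..n.
    Returns W in [0, 1]; 1 means perfect agreement among judges.
    """
    m = len(rankings)
    n = len(rankings[0])
    # Total rank each item received across judges.
    totals = [sum(r[i] for r in rankings) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Three judges ranking four items (1 = best).
ranks = [
    [1, 2, 3, 4],
    [1, 3, 2, 4],
    [2, 1, 3, 4],
]
print(kendalls_w(ranks))   # high but imperfect agreement
```

Extending this single-group statistic to agreement *between* groups of judges is the problem the article takes up.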

Wilson, Mark; Hoskens, Machteld – Journal of Educational and Behavioral Statistics, 2001
Introduces the Rater Bundle Model, an item response model for repeated ratings of student work. Applies the model to real and simulated data to illustrate the approach, which was motivated by the observation that when repeated ratings occur, the assumption of conditional independence is violated, and current item response models can then…
Descriptors: Error of Measurement, Evaluators, Item Response Theory, Models
Linacre, John M. – 1990
Rank ordering examinees is an easier task for judges than is awarding numerical ratings. A measurement model for rankings based on Rasch's objectivity axioms provides linear, sample-independent and judge-independent measures. Estimates of examinee measures are obtained from the data set of rankings, along with standard errors and fit statistics.…
Descriptors: Comparative Analysis, Error of Measurement, Essay Tests, Evaluators
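For a single pair of examinees, a Rasch-style model for rankings reduces to a logistic comparison of their measures on the logit scale. A minimal illustration of that pairwise form (this is not Linacre's estimation procedure, just the core comparison it builds on):

```python
import math

def p_ranked_higher(theta_a, theta_b):
    """Probability that examinee A is ranked above examinee B,
    as a logistic function of the difference in their measures
    (logits). Depends only on theta_a - theta_b, which is what
    makes the measures sample- and judge-independent in form."""
    return 1 / (1 + math.exp(-(theta_a - theta_b)))

print(p_ranked_higher(1.0, 0.0))   # one-logit advantage for A
```

Because the probability depends only on the difference in measures, estimates derived from a full set of rankings inherit the linear, judge-independent properties the abstract describes.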