Wu, Tong; Kim, Stella Y.; Westine, Carl; Boyer, Michelle – Journal of Educational Measurement, 2025
While significant attention has been given to test equating to ensure score comparability, limited research has explored equating methods for rater-mediated assessments, where human raters inherently introduce error. If not properly addressed, these errors can undermine score interchangeability and test validity. This study proposes an equating…
Descriptors: Item Response Theory, Evaluators, Error of Measurement, Test Validity
Edwards, Kelly; Soland, James – Educational Assessment, 2024
Classroom observational protocols, in which raters observe and score the quality of teachers' instructional practices, are often used to evaluate teachers for consequential purposes despite evidence that scores from such protocols are frequently driven by factors, such as rater and temporal effects, that have little to do with teacher quality. In…
Descriptors: Classroom Observation Techniques, Teacher Evaluation, Accuracy, Scores
Song, Yoon Ah; Lee, Won-Chan – Applied Measurement in Education, 2022
This article examines the performance of item response theory (IRT) models when double ratings, rather than single ratings, are used as item scores in the presence of rater effects. Study 1 examined the influence of the number of ratings on the accuracy of proficiency estimation in the generalized partial credit model (GPCM). Study 2 compared the accuracy of…
Descriptors: Item Response Theory, Item Analysis, Scores, Accuracy
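The intuition behind comparing double and single ratings is that averaging independent rater errors shrinks the error variance of the observed score. A minimal simulation sketch in Python (the true score, rater-error SD, and rating scale are all hypothetical; this is not the authors' code):

```python
import random

random.seed(1)

TRUE_SCORE = 3.0        # hypothetical true item score for one examinee
RATER_SD = 1.0          # assumed SD of independent rater error
N_REPLICATIONS = 10000

def observed(n_ratings):
    """Mean of n independent ratings of the same response."""
    return sum(random.gauss(TRUE_SCORE, RATER_SD)
               for _ in range(n_ratings)) / n_ratings

for n in (1, 2):
    errs = [observed(n) - TRUE_SCORE for _ in range(N_REPLICATIONS)]
    var = sum(e * e for e in errs) / N_REPLICATIONS
    print(f"{n} rating(s): error variance ~ {var:.3f}")
```

With independent rater errors, two ratings roughly halve the error variance, which is why double ratings can improve proficiency estimation when rater effects are present.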
Clauser, Brian E.; Kane, Michael; Clauser, Jerome C. – Journal of Educational Measurement, 2020
An Angoff standard setting study generally yields judgments on a number of items by a number of judges (who may or may not be nested in panels). Variability associated with judges (and possibly panels) contributes error to the resulting cut score. The variability associated with items plays a more complicated role. To the extent that the mean item…
Descriptors: Cutting Scores, Generalization, Decision Making, Standard Setting
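The judge-related error described in the abstract can be illustrated with a toy computation: treat the Angoff cut score as the grand mean of a judges-by-items matrix of judgments, and estimate the standard error contributed by judge variability from the spread of judge means. The judgments below are invented for illustration and are not from the study:

```python
import statistics

# Hypothetical Angoff judgments: rows = judges, columns = items;
# each value is the judged probability that a minimally competent
# examinee answers the item correctly.
judgments = [
    [0.60, 0.45, 0.70, 0.55],   # judge 1
    [0.65, 0.50, 0.80, 0.60],   # judge 2
    [0.55, 0.40, 0.65, 0.50],   # judge 3
]

n_judges = len(judgments)
judge_means = [statistics.mean(row) for row in judgments]

# Cut score = grand mean of judgments (proportion-correct metric).
cut_score = statistics.mean(judge_means)

# Error attributable to judges: SD of judge means over sqrt(n_judges).
se_judges = statistics.stdev(judge_means) / n_judges ** 0.5

print(f"cut score = {cut_score:.3f}, SE (judges) = {se_judges:.3f}")
```

Adding judges shrinks this standard error by a factor of 1/√n, whereas the item contribution behaves differently, as the abstract notes.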
Liu, Sha; Kunnan, Antony John – CALICO Journal, 2016
This study investigated the application of "WriteToLearn" on Chinese undergraduate English majors' essays in terms of its scoring ability and the accuracy of its error feedback. Participants were 163 second-year English majors from a university located in Sichuan province who wrote 326 essays from two writing prompts. Each paper was…
Descriptors: Foreign Countries, Undergraduate Students, English (Second Language), Second Language Learning
Dekle, Dawn J.; Leung, Denis H. Y.; Zhu, Min – Psychological Methods, 2008
Across many areas of psychology, concordance is commonly used to measure the (intragroup) agreement of a group of judges in ranking a number of items. Sometimes, however, the judges come from multiple groups, and in those situations the interest lies in measuring the concordance between groups, under the assumption that there is some within-group…
Descriptors: Item Response Theory, Statistical Analysis, Psychological Studies, Evaluators
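Within-group concordance of the kind this abstract refers to is classically measured with Kendall's coefficient of concordance, W = 12S / (m²(n³ − n)), where S is the sum of squared deviations of item rank totals from their mean. A self-contained sketch (the rankings are made up, and tied ranks are not handled):

```python
def kendalls_w(rankings):
    """Kendall's coefficient of concordance for m judges ranking n items.

    rankings: list of m lists, each a permutation of the ranks 1..n.
    Returns W in [0, 1]; 1 means perfect agreement among judges.
    """
    m = len(rankings)
    n = len(rankings[0])
    # Total rank each item received across judges.
    totals = [sum(r[i] for r in rankings) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Three judges ranking four items (1 = best).
ranks = [
    [1, 2, 3, 4],
    [1, 3, 2, 4],
    [2, 1, 3, 4],
]
print(kendalls_w(ranks))   # high but imperfect agreement
```

Extending this single-group statistic to agreement *between* groups of judges is the problem the article takes up.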

Wilson, Mark; Hoskens, Machteld – Journal of Educational and Behavioral Statistics, 2001
Introduces the Rater Bundle Model, an item response model for repeated ratings of student work. Applies the model to real and simulated data to illustrate the approach, which was motivated by the observation that when repeated ratings occur, the assumption of conditional independence is violated, and current item response models can then…
Descriptors: Error of Measurement, Evaluators, Item Response Theory, Models
Linacre, John M. – 1990
Rank ordering examinees is an easier task for judges than is awarding numerical ratings. A measurement model for rankings based on Rasch's objectivity axioms provides linear, sample-independent and judge-independent measures. Estimates of examinee measures are obtained from the data set of rankings, along with standard errors and fit statistics.…
Descriptors: Comparative Analysis, Error of Measurement, Essay Tests, Evaluators
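For a single pair of examinees, a Rasch-style model for rankings reduces to a logistic comparison of their measures on the logit scale. A minimal illustration of that pairwise form (this is not Linacre's estimation procedure, just the core comparison it builds on):

```python
import math

def p_ranked_higher(theta_a, theta_b):
    """Probability that examinee A is ranked above examinee B,
    as a logistic function of the difference in their measures
    (logits). Depends only on theta_a - theta_b, which is what
    makes the measures sample- and judge-independent in form."""
    return 1 / (1 + math.exp(-(theta_a - theta_b)))

print(p_ranked_higher(1.0, 0.0))   # one-logit advantage for A
```

Because the probability depends only on the difference in measures, estimates derived from a full set of rankings inherit the linear, judge-independent properties the abstract describes.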