Wind, Stefanie A.; Ge, Yuan – Educational and Psychological Measurement, 2021
Practical constraints in rater-mediated assessments limit the availability of complete data. Instead, most scoring procedures include one or two ratings for each performance, with overlapping performances across raters or linking sets of multiple-choice items to facilitate model estimation. These incomplete scoring designs present challenges for…
Descriptors: Evaluators, Scoring, Data Collection, Design
Wang, Jue; Engelhard, George, Jr. – Educational and Psychological Measurement, 2019
The purpose of this study is to explore the use of unfolding models for evaluating the quality of ratings obtained in rater-mediated assessments. Two different judgmental processes can be used to conceptualize ratings: impersonal judgments and personal preferences. Impersonal judgments are typically expected in rater-mediated assessments, and…
Descriptors: Evaluative Thinking, Preferences, Evaluators, Models
Wind, Stefanie A.; Guo, Wenjing – Educational and Psychological Measurement, 2019
Rater effects, or raters' tendencies to assign ratings that differ from those the performances warrant, are well documented in rater-mediated assessments across a variety of disciplines. In many real-data studies of rater effects, researchers have reported that raters exhibit more than one effect, such as a…
Descriptors: Evaluators, Bias, Scoring, Data Collection
LaVoie, Noelle; Parker, James; Legree, Peter J.; Ardison, Sharon; Kilcullen, Robert N. – Educational and Psychological Measurement, 2020
Automated scoring based on Latent Semantic Analysis (LSA) has been successfully used to score essays and constrained short answer responses. Scoring tests that capture open-ended, short answer responses poses some challenges for machine learning approaches. We used LSA techniques to score short answer responses to the Consequences Test, a measure…
Descriptors: Semantics, Evaluators, Essays, Scoring
von Davier, Matthias; Tyack, Lillian; Khorramdel, Lale – Educational and Psychological Measurement, 2023
Automated scoring of free drawings or images as responses has yet to be used in large-scale assessments of student achievement. In this study, we propose artificial neural networks to classify these types of graphical responses from a TIMSS 2019 item. We are comparing classification accuracy of convolutional and feed-forward approaches. Our…
Descriptors: Scoring, Networks, Artificial Intelligence, Elementary Secondary Education
Conger, Anthony J. – Educational and Psychological Measurement, 2017
Drawing parallels to classical test theory, this article clarifies the difference between rater accuracy and reliability and demonstrates how category marginal frequencies affect rater agreement and Cohen's kappa. Category assignment paradigms are developed: comparing raters to a standard (index) versus comparing two raters to one another…
Descriptors: Interrater Reliability, Evaluators, Accuracy, Statistical Analysis
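Conger's point about marginal frequencies can be illustrated with a short computation. This is a generic sketch of Cohen's kappa (not code from the article); the two example tables are hypothetical and chosen only to show that identical observed agreement can yield very different kappas once the category marginals differ:

```python
def cohens_kappa(table):
    """Cohen's kappa for a square agreement table (rows: rater A, columns: rater B)."""
    n = sum(sum(row) for row in table)
    k = len(table)
    p_o = sum(table[i][i] for i in range(k)) / n                       # observed agreement
    row = [sum(table[i]) / n for i in range(k)]                        # rater A marginals
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]   # rater B marginals
    p_e = sum(row[i] * col[i] for i in range(k))                       # chance agreement
    return (p_o - p_e) / (1 - p_e)

# Both tables have 70% observed agreement, but the marginals differ:
balanced = [[20, 5], [10, 15]]   # marginals near 50/50 -> kappa = 0.40
skewed   = [[33, 5], [10, 2]]    # one category dominates -> kappa is near 0.04
```

The drop from 0.40 to roughly 0.04 at constant observed agreement is the marginal-frequency sensitivity the article analyzes.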
Lamprianou, Iasonas – Educational and Psychological Measurement, 2018
It is common practice for assessment programs to organize qualifying sessions during which the raters (often known as "markers" or "judges") demonstrate their consistency before operational rating commences. Because of the high-stakes nature of many rating activities, the research community tends to continuously explore new…
Descriptors: Social Networks, Network Analysis, Comparative Analysis, Innovation
Wang, Jue; Engelhard, George, Jr.; Wolfe, Edward W. – Educational and Psychological Measurement, 2016
The number of performance assessments continues to increase around the world, and it is important to explore new methods for evaluating the quality of ratings obtained from raters. This study describes an unfolding model for examining rater accuracy. Accuracy is defined as the difference between observed and expert ratings. Dichotomous accuracy…
Descriptors: Evaluators, Accuracy, Performance Based Assessment, Models
Clauser, Jerome C.; Hambleton, Ronald K.; Baldwin, Peter – Educational and Psychological Measurement, 2017
The Angoff standard setting method relies on content experts to review exam items and make judgments about the performance of the minimally proficient examinee. Unfortunately, at times content experts may have gaps in their understanding of specific exam content. These gaps are particularly likely to occur when the content domain is broad and/or…
Descriptors: Scores, Item Analysis, Classification, Decision Making
Christ, Theodore J.; Riley-Tillman, T. Chris; Chafouleas, Sandra M.; Boice, Christina H. – Educational and Psychological Measurement, 2010
Generalizability theory was used to examine the generalizability and dependability of outcomes from two single-item Direct Behavior Rating (DBR) scales: DBR of actively manipulating and DBR of visually distracted. DBR is a behavioral assessment tool with specific instrumentation and procedures that can be used by a variety of service delivery…
Descriptors: Generalizability Theory, Student Behavior, Data Collection, Student Evaluation
Chafouleas, Sandra M.; Christ, Theodore J.; Riley-Tillman, T. Chris – Educational and Psychological Measurement, 2009
Generalizability theory is used to examine the impact of scaling gradients on a single-item Direct Behavior Rating (DBR). A DBR refers to a type of rating scale used to efficiently record target behavior(s) following an observation occasion. Variance components associated with scale gradients are estimated using a random effects design for persons…
Descriptors: Generalizability Theory, Undergraduate Students, Scaling, Rating Scales

Umesh, U. N.; And Others – Educational and Psychological Measurement, 1989
An approach is provided for calculating maximum values of the Kappa statistic of J. Cohen (1960) as a function of observed agreement proportions between evaluators. Separate calculations are required for different matrix sizes and observed agreement levels. (SLD)
Descriptors: Equations (Mathematics), Evaluators, Heuristics, Interrater Reliability
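The quantity studied here, the largest kappa attainable for a given table, follows from the classic bound discussed by Cohen (1960): observed agreement can be at most the sum of the smaller of each pair of category marginals. A minimal sketch (not the article's own tables or calculations):

```python
def kappa_max(table):
    """Upper bound on Cohen's kappa given the table's row and column marginals."""
    n = sum(sum(row) for row in table)
    k = len(table)
    row = [sum(table[i]) / n for i in range(k)]
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    p_e = sum(row[i] * col[i] for i in range(k))
    p_o_max = sum(min(row[i], col[i]) for i in range(k))  # best achievable agreement
    return (p_o_max - p_e) / (1 - p_e)

# For the 2x2 table [[20, 5], [10, 15]], kappa_max = 0.8
```

As the abstract notes, the bound depends on the matrix size and the marginal proportions, so it must be recomputed for each configuration.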

Ross, Donald C. – Educational and Psychological Measurement, 1992
Large sample chi-square tests of the significance of the difference between two correlated kappas, weighted or unweighted, are derived. Cases are presented with one judge in common between the two kappas and no judge in common. An illustrative calculation is included. (Author/SLD)
Descriptors: Chi Square, Correlation, Equations (Mathematics), Evaluators
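Ross's significance test requires the derived large-sample variance formulas, which are not reproduced here; the weighted kappa statistic being compared, however, is standard. A sketch with the usual linear and quadratic disagreement weights (the 3x3 example table is hypothetical):

```python
def weighted_kappa(table, weight="quadratic"):
    """Weighted Cohen's kappa for a square agreement table."""
    n = sum(sum(row) for row in table)
    k = len(table)
    row = [sum(table[i]) / n for i in range(k)]
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]

    def w(i, j):  # disagreement weight: 0 on the diagonal, growing with distance
        d = abs(i - j) / (k - 1)
        return d * d if weight == "quadratic" else d

    obs = sum(w(i, j) * table[i][j] / n for i in range(k) for j in range(k))
    exp = sum(w(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    return 1 - obs / exp

# Example: three ordered categories, quadratic weights
ratings = [[10, 2, 0], [3, 8, 2], [0, 1, 4]]
```

Quadratic weights penalize distant disagreements more heavily, which is why weighted and unweighted kappas can lead to different conclusions for ordinal scales.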

Alliger, George M.; Williams, Kevin J. – Educational and Psychological Measurement, 1992
The internal consistency of a scale and various indices of rating scale response styles (such as halo, leniency, and positive or negative response bias) are related to mean scale item intercorrelation. The consequent relationship between internal consistency and rating scale response styles is discussed. (Author/SLD)
Descriptors: Correlation, Evaluators, Interrater Reliability, Rating Scales
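The link between internal consistency and mean inter-item correlation that this article builds on is the standardized-alpha (Spearman-Brown) relation. A one-line sketch, with illustrative numbers that are not from the article:

```python
def standardized_alpha(k, mean_r):
    """Standardized Cronbach's alpha from the number of items k and mean inter-item correlation."""
    return k * mean_r / (1 + (k - 1) * mean_r)

# 10 items with mean inter-item r = .30 -> alpha is about .81;
# if halo inflates the mean r to .50, alpha rises to about .91
```

Because halo and other response styles inflate mean item intercorrelation, they inflate alpha through this same formula, which is the relationship the abstract describes.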

Stauffer, A. J. – Educational and Psychological Measurement, 1978
Agreement among independent ratings assigned by members of observation teams administering an interview schedule was studied. The appropriateness of selection requirements and the adequacy of observer training are reviewed. Implications are drawn from the objectivity of the ratings for the estimated reliability and validity of the interview…
Descriptors: Attitudes, Bias, Elementary Secondary Education, Evaluators