Wind, Stefanie A.; Ge, Yuan – Educational and Psychological Measurement, 2021
Practical constraints in rater-mediated assessments limit the availability of complete data. Instead, most scoring procedures include one or two ratings for each performance, with overlapping performances across raters or linking sets of multiple-choice items to facilitate model estimation. These incomplete scoring designs present challenges for…
Descriptors: Evaluators, Scoring, Data Collection, Design
Wang, Jue; Engelhard, George, Jr. – Educational and Psychological Measurement, 2019
The purpose of this study is to explore the use of unfolding models for evaluating the quality of ratings obtained in rater-mediated assessments. Two different judgmental processes can be used to conceptualize ratings: impersonal judgments and personal preferences. Impersonal judgments are typically expected in rater-mediated assessments, and…
Descriptors: Evaluative Thinking, Preferences, Evaluators, Models
Wind, Stefanie A.; Guo, Wenjing – Educational and Psychological Measurement, 2019
Rater effects, or raters' tendencies to assign ratings that differ from those the performances warrant, are well documented in rater-mediated assessments across a variety of disciplines. In many real-data studies of rater effects, researchers have reported that raters exhibit more than one effect, such as a…
Descriptors: Evaluators, Bias, Scoring, Data Collection
LaVoie, Noelle; Parker, James; Legree, Peter J.; Ardison, Sharon; Kilcullen, Robert N. – Educational and Psychological Measurement, 2020
Automated scoring based on Latent Semantic Analysis (LSA) has been successfully used to score essays and constrained short answer responses. Scoring tests that capture open-ended, short answer responses poses some challenges for machine learning approaches. We used LSA techniques to score short answer responses to the Consequences Test, a measure…
Descriptors: Semantics, Evaluators, Essays, Scoring
von Davier, Matthias; Tyack, Lillian; Khorramdel, Lale – Educational and Psychological Measurement, 2023
Automated scoring of free drawings or images as responses has yet to be used in large-scale assessments of student achievement. In this study, we propose artificial neural networks to classify these types of graphical responses from a TIMSS 2019 item. We are comparing classification accuracy of convolutional and feed-forward approaches. Our…
Descriptors: Scoring, Networks, Artificial Intelligence, Elementary Secondary Education
Conger, Anthony J. – Educational and Psychological Measurement, 2017
Drawing parallels to classical test theory, this article clarifies the difference between rater accuracy and reliability and demonstrates how category marginal frequencies affect rater agreement and Cohen's kappa. Category assignment paradigms are developed: comparing raters to a standard (index) versus comparing two raters to one another…
Descriptors: Interrater Reliability, Evaluators, Accuracy, Statistical Analysis
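Conger's point about marginal frequencies can be illustrated with a short computation. This is a generic sketch of Cohen's kappa (not code from the article); the two example tables are hypothetical and chosen only to show that identical observed agreement can yield very different kappas once the category marginals differ:

```python
def cohens_kappa(table):
    """Cohen's kappa for a square agreement table (rows: rater A, columns: rater B)."""
    n = sum(sum(row) for row in table)
    k = len(table)
    p_o = sum(table[i][i] for i in range(k)) / n                       # observed agreement
    row = [sum(table[i]) / n for i in range(k)]                        # rater A marginals
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]   # rater B marginals
    p_e = sum(row[i] * col[i] for i in range(k))                       # chance agreement
    return (p_o - p_e) / (1 - p_e)

# Both tables have 70% observed agreement, but the marginals differ:
balanced = [[20, 5], [10, 15]]   # marginals near 50/50 -> kappa = 0.40
skewed   = [[33, 5], [10, 2]]    # one category dominates -> kappa is near 0.04
```

The drop from 0.40 to roughly 0.04 at constant observed agreement is the marginal-frequency sensitivity the article analyzes.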
Lamprianou, Iasonas – Educational and Psychological Measurement, 2018
It is common practice for assessment programs to organize qualifying sessions during which the raters (often known as "markers" or "judges") demonstrate their consistency before operational rating commences. Because of the high-stakes nature of many rating activities, the research community tends to continuously explore new…
Descriptors: Social Networks, Network Analysis, Comparative Analysis, Innovation
Wang, Jue; Engelhard, George, Jr.; Wolfe, Edward W. – Educational and Psychological Measurement, 2016
The number of performance assessments continues to increase around the world, and it is important to explore new methods for evaluating the quality of ratings obtained from raters. This study describes an unfolding model for examining rater accuracy. Accuracy is defined as the difference between observed and expert ratings. Dichotomous accuracy…
Descriptors: Evaluators, Accuracy, Performance Based Assessment, Models
Clauser, Jerome C.; Hambleton, Ronald K.; Baldwin, Peter – Educational and Psychological Measurement, 2017
The Angoff standard setting method relies on content experts to review exam items and make judgments about the performance of the minimally proficient examinee. Unfortunately, at times content experts may have gaps in their understanding of specific exam content. These gaps are particularly likely to occur when the content domain is broad and/or…
Descriptors: Scores, Item Analysis, Classification, Decision Making
Christ, Theodore J.; Riley-Tillman, T. Chris; Chafouleas, Sandra M.; Boice, Christina H. – Educational and Psychological Measurement, 2010
Generalizability theory was used to examine the generalizability and dependability of outcomes from two single-item Direct Behavior Rating (DBR) scales: DBR of actively manipulating and DBR of visually distracted. DBR is a behavioral assessment tool with specific instrumentation and procedures that can be used by a variety of service delivery…
Descriptors: Generalizability Theory, Student Behavior, Data Collection, Student Evaluation
Chafouleas, Sandra M.; Christ, Theodore J.; Riley-Tillman, T. Chris – Educational and Psychological Measurement, 2009
Generalizability theory is used to examine the impact of scaling gradients on a single-item Direct Behavior Rating (DBR). A DBR refers to a type of rating scale used to efficiently record target behavior(s) following an observation occasion. Variance components associated with scale gradients are estimated using a random effects design for persons…
Descriptors: Generalizability Theory, Undergraduate Students, Scaling, Rating Scales

Umesh, U. N.; And Others – Educational and Psychological Measurement, 1989
An approach is provided for calculating maximum values of the Kappa statistic of J. Cohen (1960) as a function of observed agreement proportions between evaluators. Separate calculations are required for different matrix sizes and observed agreement levels. (SLD)
Descriptors: Equations (Mathematics), Evaluators, Heuristics, Interrater Reliability
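The quantity studied here, the largest kappa attainable for a given table, follows from the classic bound discussed by Cohen (1960): observed agreement can be at most the sum of the smaller of each pair of category marginals. A minimal sketch (not the article's own tables or calculations):

```python
def kappa_max(table):
    """Upper bound on Cohen's kappa given the table's row and column marginals."""
    n = sum(sum(row) for row in table)
    k = len(table)
    row = [sum(table[i]) / n for i in range(k)]
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    p_e = sum(row[i] * col[i] for i in range(k))
    p_o_max = sum(min(row[i], col[i]) for i in range(k))  # best achievable agreement
    return (p_o_max - p_e) / (1 - p_e)

# For the 2x2 table [[20, 5], [10, 15]], kappa_max = 0.8
```

As the abstract notes, the bound depends on the matrix size and the marginal proportions, so it must be recomputed for each configuration.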

Ross, Donald C. – Educational and Psychological Measurement, 1992
Large sample chi-square tests of the significance of the difference between two correlated kappas, weighted or unweighted, are derived. Cases are presented with one judge in common between the two kappas and no judge in common. An illustrative calculation is included. (Author/SLD)
Descriptors: Chi Square, Correlation, Equations (Mathematics), Evaluators
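Ross's significance test requires the derived large-sample variance formulas, which are not reproduced here; the weighted kappa statistic being compared, however, is standard. A sketch with the usual linear and quadratic disagreement weights (the 3x3 example table is hypothetical):

```python
def weighted_kappa(table, weight="quadratic"):
    """Weighted Cohen's kappa for a square agreement table."""
    n = sum(sum(row) for row in table)
    k = len(table)
    row = [sum(table[i]) / n for i in range(k)]
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]

    def w(i, j):  # disagreement weight: 0 on the diagonal, growing with distance
        d = abs(i - j) / (k - 1)
        return d * d if weight == "quadratic" else d

    obs = sum(w(i, j) * table[i][j] / n for i in range(k) for j in range(k))
    exp = sum(w(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    return 1 - obs / exp

# Example: three ordered categories, quadratic weights
ratings = [[10, 2, 0], [3, 8, 2], [0, 1, 4]]
```

Quadratic weights penalize distant disagreements more heavily, which is why weighted and unweighted kappas can lead to different conclusions for ordinal scales.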

Alliger, George M.; Williams, Kevin J. – Educational and Psychological Measurement, 1992
The internal consistency of a scale and various indices of rating scale response styles (such as halo, leniency, and positive or negative response bias) are related to mean scale item intercorrelation. The consequent relationship between internal consistency and rating scale response styles is discussed. (Author/SLD)
Descriptors: Correlation, Evaluators, Interrater Reliability, Rating Scales
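The link between internal consistency and mean inter-item correlation that this article builds on is the standardized-alpha (Spearman-Brown) relation. A one-line sketch, with illustrative numbers that are not from the article:

```python
def standardized_alpha(k, mean_r):
    """Standardized Cronbach's alpha from the number of items k and mean inter-item correlation."""
    return k * mean_r / (1 + (k - 1) * mean_r)

# 10 items with mean inter-item r = .30 -> alpha is about .81;
# if halo inflates the mean r to .50, alpha rises to about .91
```

Because halo and other response styles inflate mean item intercorrelation, they inflate alpha through this same formula, which is the relationship the abstract describes.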

Stauffer, A. J. – Educational and Psychological Measurement, 1978
Agreement among independent ratings assigned by members of observation teams administering an interview schedule was studied. The appropriateness of selection requirements and the adequacy of observer training are reviewed. Implications are drawn from the objectivity of the ratings for the estimated reliability and validity of the interview…
Descriptors: Attitudes, Bias, Elementary Secondary Education, Evaluators