Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 0 |
Since 2016 (last 10 years) | 4 |
Since 2006 (last 20 years) | 4 |
Descriptor
Evaluators | 10 |
Scoring | 6 |
Item Response Theory | 4 |
Accuracy | 3 |
Essay Tests | 3 |
Performance Based Assessment | 3 |
Writing Evaluation | 3 |
Writing Tests | 3 |
Computer Assisted Testing | 2 |
Essays | 2 |
Evaluation Methods | 2 |
Source
Applied Measurement in… | 2 |
Journal of Applied Measurement | 2 |
Educational and Psychological… | 1 |
International Journal of… | 1 |
Author
Wolfe, Edward W. | 10 |
Myford, Carol M. | 3 |
Engelhard, George, Jr. | 2 |
Moulder, Bradley C. | 2 |
Choi, Ikkyu | 1 |
Feltovich, Brian | 1 |
Foltz, Peter | 1 |
Glazer, Nancy | 1 |
Kao, Chi-Wen | 1 |
Manalo, Jonathan R. | 1 |
Rosenstein, Mark | 1 |
Publication Type
Reports - Research | 7 |
Journal Articles | 6 |
Speeches/Meeting Papers | 5 |
Reports - Descriptive | 3 |
Assessments and Surveys
Test of English as a Foreign… | 1 |
Glazer, Nancy; Wolfe, Edward W. – Applied Measurement in Education, 2020
This introductory article describes how constructed response scoring is carried out, particularly the rater monitoring processes, and illustrates three potential designs for conducting rater monitoring in an operational scoring project. The introduction also presents a framework for interpreting research conducted by those who study the constructed…
Descriptors: Scoring, Test Format, Responses, Predictor Variables
Choi, Ikkyu; Wolfe, Edward W. – Applied Measurement in Education, 2020
Rater training is essential in ensuring the quality of constructed response scoring. Most of the current knowledge about rater training comes from experimental contexts with an emphasis on short-term effects. Few sources are available for empirical evidence on whether and how raters become more accurate as they gain scoring experience or what…
Descriptors: Scoring, Accuracy, Training, Evaluators
Wind, Stefanie A.; Wolfe, Edward W.; Engelhard, George, Jr.; Foltz, Peter; Rosenstein, Mark – International Journal of Testing, 2018
Automated essay scoring engines (AESEs) are becoming increasingly popular as an efficient method for performance assessments in writing, including many language assessments that are used worldwide. Before they can be used operationally, AESEs must be "trained" using machine-learning techniques that incorporate human ratings. However, the…
Descriptors: Computer Assisted Testing, Essay Tests, Writing Evaluation, Scoring
Wang, Jue; Engelhard, George, Jr.; Wolfe, Edward W. – Educational and Psychological Measurement, 2016
The number of performance assessments continues to increase around the world, and it is important to explore new methods for evaluating the quality of ratings obtained from raters. This study describes an unfolding model for examining rater accuracy. Accuracy is defined as the difference between observed and expert ratings. Dichotomous accuracy…
Descriptors: Evaluators, Accuracy, Performance Based Assessment, Models

Myford, Carol M.; Wolfe, Edward W. – Journal of Applied Measurement, 2002
Examined a procedure for identifying and resolving discrepancies in ratings, focusing on the third rater adjudication procedure used in scoring the Test of Spoken English. Results for 1,446 adult examinees demonstrate that implementing a discrepancy resolution procedure is not sufficient in itself for quality control monitoring. (SLD)
Descriptors: Adults, Evaluators, Quality Control, Scoring

Wolfe, Edward W.; Moulder, Bradley C.; Myford, Carol M. – Journal of Applied Measurement, 2001
Describes a class of rater effects, differential rater functioning over time (DRIFT), that depicts rater-by-time interactions. Also describes Rasch measurement procedures designed to identify these types of DRIFT in rating data. Applied these procedures to simulated data to show their usefulness in classifying raters as aberrant or non-aberrant…
Descriptors: Evaluators, Interaction, Item Response Theory, Simulation
Wolfe, Edward W.; Moulder, Bradley C.; Myford, Carol M. – 1999
This paper describes a class of rater effects that depict rater-by-time interactions. This class of rater effects is referred to as differential rater functioning over time (DRIFT). This article describes several types of DRIFT (primacy/recency, differential centrality/extremism, and practice/fatigue) and Rasch measurement procedures designed to…
Descriptors: Classification, Effect Size, Evaluators, Item Response Theory
Manalo, Jonathan R.; Wolfe, Edward W. – 2000
Recently, the Test of English as a Foreign Language (TOEFL) changed by including a writing section that gives the examinee an option between computer and handwritten formats to compose their responses. Unfortunately, this may introduce several potential sources of error that might reduce the reliability and validity of the scores. The seriousness…
Descriptors: Computer Assisted Testing, Essay Tests, Evaluators, Handwriting
Wolfe, Edward W.; Kao, Chi-Wen – 1996
This paper reports the results of an analysis of the relationship between scorer behaviors and score variability. Thirty-six essay scorers were interviewed and asked to perform a think-aloud task as they scored 24 essays. Each comment made by a scorer was coded according to its content focus (i.e., appearance, assignment, mechanics, communication,…
Descriptors: Content Analysis, Educational Assessment, Essays, Evaluation Methods
Wolfe, Edward W.; Feltovich, Brian – 1994
This paper presents a model of scorer cognition that incorporates two types of mental models: models of performance (i.e., the criteria for judging performance) and models of scoring (i.e., the procedural scripts for scoring an essay). In Study 1, six novice and five experienced scorers wrote definitions of three levels of a 6-point holistic…
Descriptors: Cognitive Processes, Criteria, Essays, Evaluation Methods