Publication Date
In 2025: 0
Since 2024: 0
Since 2021 (last 5 years): 0
Since 2016 (last 10 years): 2
Since 2006 (last 20 years): 2
Descriptor
Quality Control: 5
Scoring: 3
Test Construction: 3
Test Interpretation: 3
Automation: 2
Essay Tests: 2
Scores: 2
Test Scoring Machines: 2
Test Validity: 2
Validity: 2
Behavior: 1
Source
Applied Measurement in…: 5
Author
Bejar, Isaac I.: 2
Downing, Steven M.: 1
Dunbar, Stephen B.: 1
Haladyna, Thomas M.: 1
Li, Chen: 1
McCaffrey, Daniel: 1
Rupp, André A.: 1
Sax, Anne: 1
Williamson, David M.: 1
Publication Type
Journal Articles: 5
Reports - Evaluative: 3
Reports - Descriptive: 1
Reports - Research: 1
Speeches/Meeting Papers: 1
Bejar, Isaac I.; Li, Chen; McCaffrey, Daniel – Applied Measurement in Education, 2020
We evaluate the feasibility of developing predictive models of rater behavior, that is, "rater-specific" models for predicting the scores produced by a rater under operational conditions. In the present study, the dependent variable is the score assigned to essays by a rater, and the predictors are linguistic attributes of the essays…
Descriptors: Scoring, Essays, Behavior, Predictive Measurement
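The abstract names the design (one rater's scores as the dependent variable, linguistic attributes of the essays as predictors) but not the estimator. As a minimal sketch, assuming ordinary least squares and two hypothetical features (essay length in words and a type-token ratio), a rater-specific model could be fit with the standard library alone. This is an illustration of the general idea, not the authors' model.

```python
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_rater_model(features: Sequence[Sequence[float]], scores: Sequence[float]) -> List[float]:
    """OLS fit of one rater's scores on essay features; returns [intercept, b1, ..., bk]."""
    X = [[1.0, *map(float, row)] for row in features]  # prepend an intercept column
    k = len(X[0])
    # Normal equations: (X^T X) b = X^T y
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(X, scores)) for i in range(k)]
    return _solve(xtx, xty)


def predict_score(coefs: Sequence[float], row: Sequence[float]) -> float:
    """Predicted score for one essay under a fitted rater-specific model."""
    return coefs[0] + sum(b * v for b, v in zip(coefs[1:], row))


# Hypothetical per-essay features: (word count, type-token ratio) and one rater's scores.
essays = [(200, 0.5), (300, 0.6), (400, 0.4), (500, 0.7)]
rater_a_scores = [4.0, 5.2, 5.8, 7.4]
rater_a_model = fit_rater_model(essays, rater_a_scores)
```

Fitting one such model per rater and comparing the coefficients is one simple way to see whether raters weight essay attributes differently.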
Rupp, André A. – Applied Measurement in Education, 2018
This article discusses critical methodological design decisions for collecting, interpreting, and synthesizing empirical evidence during the design, deployment, and operational quality-control phases for automated scoring systems. The discussion is inspired by work on operational large-scale systems for automated essay scoring but many of the…
Descriptors: Design, Automation, Scoring, Test Scoring Machines
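Operational quality control for automated scoring often compares machine and human score distributions on the same responses. The abstract does not say which statistics Rupp recommends. As a hedged illustration, one widely used check is the standardized mean difference (SMD) between machine and human scores, flagged when its magnitude exceeds a threshold. The 0.15 default below is an illustrative assumption, not a value taken from the article.

```python
import math
import statistics
from typing import Sequence


def standardized_mean_difference(machine: Sequence[float], human: Sequence[float]) -> float:
    """(mean_machine - mean_human) / pooled SD, pooling the two sample variances equally."""
    if len(machine) < 2 or len(human) < 2:
        raise ValueError("need at least two scores per source")
    pooled_sd = math.sqrt((statistics.variance(machine) + statistics.variance(human)) / 2)
    if pooled_sd == 0:
        raise ValueError("both score sets are constant; SMD is undefined")
    return (statistics.mean(machine) - statistics.mean(human)) / pooled_sd


def flag_score_drift(machine: Sequence[float], human: Sequence[float], threshold: float = 0.15) -> bool:
    """Flag a batch for review when |SMD| exceeds the (hypothetical) threshold."""
    return abs(standardized_mean_difference(machine, human)) > threshold
```

In practice such a check would run per scoring batch or per subgroup, alongside agreement statistics, rather than as a standalone criterion.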
Williamson, David M.; Bejar, Isaac I.; Sax, Anne – Applied Measurement in Education, 2004
As automated scoring of complex constructed-response examinations reaches operational status, the process of evaluating the quality of resultant scores, particularly in contrast to scores of expert human graders, becomes as complex as the data itself. Using a vignette from the Architectural Registration Examination (ARE), this article explores the…
Descriptors: Validity, Scoring, Scores, Evaluation Methods
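Comparing automated scores with expert human scores usually begins with agreement statistics, although the abstract stresses that evaluating score quality is more complex than that. As a minimal sketch (not the method of the ARE study, which the excerpt does not describe), quadratic weighted kappa for two sets of integer scores on a common scale can be computed as follows:

```python
from typing import List, Sequence


def quadratic_weighted_kappa(a: Sequence[int], b: Sequence[int], min_score: int, max_score: int) -> float:
    """Quadratic weighted kappa between two raters' integer scores on [min_score, max_score]."""
    if len(a) != len(b) or not a:
        raise ValueError("score lists must be non-empty and of equal length")
    if max_score <= min_score:
        raise ValueError("score scale needs at least two categories")
    k = max_score - min_score + 1
    observed: List[List[int]] = [[0] * k for _ in range(k)]
    for x, y in zip(a, b):
        if not (min_score <= x <= max_score and min_score <= y <= max_score):
            raise ValueError("score outside the declared scale")
        observed[x - min_score][y - min_score] += 1
    n = len(a)
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    weighted_obs = weighted_exp = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2  # quadratic disagreement weight
            weighted_obs += w * observed[i][j]
            weighted_exp += w * row_totals[i] * col_totals[j] / n  # chance-expected count
    if weighted_exp == 0:
        raise ValueError("kappa undefined: both raters used one identical category")
    return 1.0 - weighted_obs / weighted_exp
```

A value of 1 means perfect agreement and 0 means agreement at chance level; large discrepancies are penalized more heavily than adjacent ones because of the squared weights.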

Downing, Steven M.; Haladyna, Thomas M. – Applied Measurement in Education, 1997
An ideal process is outlined for test item development and the study of item responses to ensure that tests are sound. Qualitative and quantitative methods are used to assess the item-level validity evidence for high-stakes examinations. A checklist for assessment is provided. (SLD)
Descriptors: High Stakes Tests, Item Response Theory, Qualitative Research, Quality Control

Dunbar, Stephen B.; And Others – Applied Measurement in Education, 1991
Issues pertaining to the quality of performance assessments, including reliability and validity, are discussed. The relatively limited generalizability of performance across tasks is indicative of the care needed to evaluate performance assessments. Quality control is an empirical matter when measurement is intended to inform public policy. (SLD)
Descriptors: Educational Assessment, Generalization, Interrater Reliability, Measurement Techniques
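The point about limited generalizability across tasks is usually quantified with generalizability theory. As a hedged sketch (the article's own analyses are not described in the excerpt), the functions below estimate variance components for a fully crossed persons × tasks design with one score per cell, then project the relative G coefficient for a chosen number of tasks. Negative variance estimates are set to zero, a common convention.

```python
from typing import Dict, Sequence


def variance_components(scores: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Persons x tasks random-effects variance components (one observation per cell).

    scores[p][t] is person p's score on task t.
    """
    n_p, n_t = len(scores), len(scores[0])
    if n_p < 2 or n_t < 2:
        raise ValueError("need at least two persons and two tasks")
    grand = sum(map(sum, scores)) / (n_p * n_t)
    person_means = [sum(row) / n_t for row in scores]
    task_means = [sum(scores[p][t] for p in range(n_p)) / n_p for t in range(n_t)]
    ms_p = n_t * sum((m - grand) ** 2 for m in person_means) / (n_p - 1)
    ms_t = n_p * sum((m - grand) ** 2 for m in task_means) / (n_t - 1)
    ss_res = sum(
        (scores[p][t] - person_means[p] - task_means[t] + grand) ** 2
        for p in range(n_p)
        for t in range(n_t)
    )
    ms_res = ss_res / ((n_p - 1) * (n_t - 1))
    # Solve the expected-mean-square equations; truncate negative estimates at zero.
    return {
        "person": max((ms_p - ms_res) / n_t, 0.0),
        "task": max((ms_t - ms_res) / n_p, 0.0),
        "residual": ms_res,  # person-by-task interaction confounded with error
    }


def g_coefficient(components: Dict[str, float], n_tasks: int) -> float:
    """Relative G coefficient for a score averaged over n_tasks tasks."""
    person = components["person"]
    error = components["residual"] / n_tasks
    if person + error == 0:
        raise ValueError("G coefficient undefined: no person or residual variance")
    return person / (person + error)
```

Evaluating `g_coefficient` for increasing `n_tasks` shows how many tasks a performance assessment would need before scores generalize adequately when person-by-task variance is large.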