Ramineni, Chaitanya; Trapani, Catherine S.; Williamson, David M. – ETS Research Report Series, 2015
Automated scoring models were trained and evaluated for the essay task in the "Praxis I"® writing test. Prompt-specific and generic "e-rater"® scoring models were built, and evaluation statistics, such as quadratic weighted kappa, Pearson correlation, and standardized differences in mean scores, were examined to evaluate the…
Descriptors: Writing Tests, Licensing Examinations (Professions), Teacher Competency Testing, Scoring
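The evaluation statistics named in this abstract (quadratic weighted kappa, Pearson correlation, and standardized differences in mean scores) are the standard agreement measures for comparing automated and human essay scores. A minimal sketch in Python, using hypothetical ratings and an assumed score scale of 1 to 6 rather than values from the Praxis study:

```python
# Sketch of the three evaluation statistics named in the abstract above,
# applied to hypothetical human vs. automated essay scores. The score range
# and sample data are illustrative assumptions, not Praxis I results.
import numpy as np

def quadratic_weighted_kappa(human, machine, min_score=1, max_score=6):
    """Agreement beyond chance, penalizing large disagreements quadratically."""
    human = np.asarray(human) - min_score
    machine = np.asarray(machine) - min_score
    n = max_score - min_score + 1
    observed = np.zeros((n, n))
    for h, m in zip(human, machine):
        observed[h, m] += 1
    observed /= observed.sum()
    expected = np.outer(np.bincount(human, minlength=n),
                        np.bincount(machine, minlength=n)).astype(float)
    expected /= expected.sum()
    i, j = np.indices((n, n))
    weights = (i - j) ** 2 / (n - 1) ** 2
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()

def standardized_mean_difference(human, machine):
    """Difference in mean scores, in pooled standard-deviation units."""
    human, machine = np.asarray(human, float), np.asarray(machine, float)
    pooled_sd = np.sqrt((human.var(ddof=1) + machine.var(ddof=1)) / 2)
    return (machine.mean() - human.mean()) / pooled_sd

human = [4, 3, 5, 2, 4, 4, 3, 5, 1, 4]        # hypothetical human ratings
machine = [4, 3, 4, 2, 5, 4, 3, 5, 2, 4]      # hypothetical engine scores
print("QWK:", quadratic_weighted_kappa(human, machine))
print("Pearson r:", np.corrcoef(human, machine)[0, 1])
print("Std. mean diff.:", standardized_mean_difference(human, machine))
```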
Zhang, Mo; Williamson, David M.; Breyer, F. Jay; Trapani, Catherine – International Journal of Testing, 2012
This article describes two separate, related studies that provide insight into the effectiveness of "e-rater" score calibration methods based on different distributional targets. In the first study, we developed and evaluated a new type of "e-rater" scoring model that was cost-effective and applicable under conditions of absent human rating and…
Descriptors: Automation, Scoring, Models, Essay Tests
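Calibration of engine scores toward a distributional target, as compared in these studies, can be illustrated with the simplest case: linear (mean-and-sigma) rescaling. The abstract does not specify which calibration methods were evaluated, so the sketch below is an assumption for illustration, not the e-rater procedure itself:

```python
# Sketch of one common calibration approach: linearly rescale raw engine
# scores so their distribution matches a target mean and SD (for example,
# the operational human-score distribution). All values are hypothetical.
import numpy as np

def calibrate_to_target(raw, target_mean, target_sd):
    """Linearly rescale raw scores to a target mean and standard deviation."""
    raw = np.asarray(raw, dtype=float)
    z = (raw - raw.mean()) / raw.std(ddof=1)   # standardize raw scores
    return z * target_sd + target_mean          # map onto the target scale

raw_engine_scores = [2.1, 3.4, 4.0, 2.8, 3.9, 4.6]   # hypothetical raw scores
calibrated = calibrate_to_target(raw_engine_scores, target_mean=3.5, target_sd=0.9)
print(np.round(calibrated, 2))
print(calibrated.mean(), calibrated.std(ddof=1))     # matches 3.5 and 0.9
```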
Williamson, David M.; Xi, Xiaoming; Breyer, F. Jay – Educational Measurement: Issues and Practice, 2012
A framework for evaluation and use of automated scoring of constructed-response tasks is provided that entails both evaluation of automated scoring and guidelines for implementation and maintenance in the context of constantly evolving technologies. Consideration of validity issues and challenges associated with automated scoring is…
Descriptors: Automation, Scoring, Evaluation, Guidelines
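A framework of this kind is typically operationalized as threshold checks on human-machine agreement statistics. The sketch below uses threshold values commonly cited in connection with this article (agreement of at least .70, degradation from human-human agreement of at most .10, standardized mean difference of at most .15); verify them against the article itself before relying on them:

```python
# Sketch of a threshold-based flagging check for one prompt's scoring model.
# The threshold values are assumptions based on figures commonly cited in
# connection with this framework, not a verbatim restatement of the article.
from dataclasses import dataclass

@dataclass
class AgreementStats:
    qwk_hm: float         # human-machine quadratic weighted kappa
    qwk_hh: float         # human-human quadratic weighted kappa (baseline)
    pearson_hm: float     # human-machine Pearson correlation
    std_mean_diff: float  # standardized difference in mean scores

def flag_prompt(stats: AgreementStats) -> list[str]:
    """Return a list of threshold violations for one prompt's scoring model."""
    flags = []
    if stats.qwk_hm < 0.70:
        flags.append("human-machine QWK below 0.70")
    if stats.pearson_hm < 0.70:
        flags.append("human-machine correlation below 0.70")
    if stats.qwk_hh - stats.qwk_hm > 0.10:
        flags.append("degradation from human-human agreement exceeds 0.10")
    if abs(stats.std_mean_diff) > 0.15:
        flags.append("standardized mean difference exceeds 0.15")
    return flags

print(flag_prompt(AgreementStats(qwk_hm=0.66, qwk_hh=0.80,
                                 pearson_hm=0.72, std_mean_diff=0.05)))
```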
Ramineni, Chaitanya; Trapani, Catherine S.; Williamson, David M.; Davey, Tim; Bridgeman, Brent – ETS Research Report Series, 2012
Automated scoring models for the "e-rater"® scoring engine were built and evaluated for the "GRE"® argument and issue-writing tasks. Prompt-specific, generic, and generic with prompt-specific intercept scoring models were built, and evaluation statistics such as weighted kappas, Pearson correlations, standardized differences in…
Descriptors: Scoring, Test Scoring Machines, Automation, Models
Williamson, David M.; Bejar, Isaac I.; Sax, Anne – ETS Research Report Series, 2004
As automated scoring of complex constructed-response examinations reaches operational status, the process of evaluating the quality of resultant scores, particularly in contrast to scores of expert human graders, becomes as complex as the data itself. Using a vignette from the Architectural Registration Examination (ARE), this paper explores the…
Descriptors: Automation, Scoring, Tests, Classification
Williamson, David M.; Hone, Anne S.; Miller, Susan; Bejar, Isaac I. – 1998
As the automated scoring of constructed responses reaches operational status, the issue of monitoring the scoring process becomes a primary concern, particularly when the goal is to have automated scoring operate completely unassisted by humans. Using a vignette from the Architectural Registration Examination and data for 326 cases with both human…
Descriptors: Architects, Automation, Classification, Constructed Response
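Monitoring unassisted automated scoring typically begins with simple agreement rates between the engine and a periodic human audit sample. A minimal sketch with hypothetical data; the paper's own monitoring procedures are more elaborate:

```python
# Sketch of a basic monitoring statistic for unassisted automated scoring:
# exact and within-one-point (adjacent) agreement between engine scores and
# a human audit sample. All data below are hypothetical.
import numpy as np

def agreement_rates(human, machine):
    """Exact and adjacent (within one point) agreement proportions."""
    diff = np.abs(np.asarray(human) - np.asarray(machine))
    return (diff == 0).mean(), (diff <= 1).mean()

human_audit = [3, 4, 4, 2, 5, 3, 4, 1, 3, 4]     # hypothetical audit ratings
machine_scores = [3, 4, 5, 2, 4, 3, 4, 2, 3, 5]  # hypothetical engine scores
exact, adjacent = agreement_rates(human_audit, machine_scores)
print(f"exact: {exact:.2f}, adjacent: {adjacent:.2f}")
```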
Williamson, David M.; Bejar, Isaac I.; Sax, Anne – Applied Measurement in Education, 2004
As automated scoring of complex constructed-response examinations reaches operational status, the process of evaluating the quality of resultant scores, particularly in contrast to scores of expert human graders, becomes as complex as the data itself. Using a vignette from the Architectural Registration Examination (ARE), this article explores the…
Descriptors: Validity, Scoring, Scores, Evaluation Methods