Showing all 12 results
Peer reviewed
Direct link
Almond, Russell G. – International Journal of Testing, 2014
Assessments consisting of only a few extended constructed-response items (essays) are not typically equated using anchor test designs, as each form contains too few essay prompts to allow for meaningful equating. This article explores the idea that output from an automated scoring program designed to measure writing fluency (a common…
Descriptors: Automation, Equated Scores, Writing Tests, Essay Tests
Peer reviewed
PDF on ERIC: Download full text
Ramineni, Chaitanya; Williamson, David – ETS Research Report Series, 2018
Notable mean score differences between the "e-rater"® automated scoring engine and human raters were observed for essays from certain demographic groups on the "GRE"® General Test in use before the major revision of 2012 that produced the rGRE. The use of e-rater as a check-score model with discrepancy thresholds prevented an adverse impact…
Descriptors: Scores, Computer Assisted Testing, Test Scoring Machines, Automation
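The check-score arrangement sketched in the abstract above can be illustrated in a few lines. This is a minimal sketch of the general idea, assuming a simple absolute-difference rule; the threshold value, the resolution rule, and all names below are illustrative and not taken from the report.

```python
from typing import Optional

# Illustrative sketch of a check-score workflow: the automated score is used only
# to flag human ratings that may need review, not to produce the reported score.
# The 1.0-point threshold and the averaging rule are assumptions for this example.

def needs_adjudication(human: float, engine: float, threshold: float = 1.0) -> bool:
    """Flag an essay for a second human rating when the human and automated
    scores disagree by more than the discrepancy threshold."""
    return abs(human - engine) > threshold

def reported_score(human: float, engine: float,
                   second_human: Optional[float] = None,
                   threshold: float = 1.0) -> float:
    """Return the first human score unless adjudication was triggered, in which
    case average the two human ratings (one possible resolution rule)."""
    if needs_adjudication(human, engine, threshold) and second_human is not None:
        return (human + second_human) / 2
    return human

# Example: human gives 3.0, engine gives 4.5 -> discrepancy exceeds the threshold.
print(needs_adjudication(3.0, 4.5))                 # True
print(reported_score(3.0, 4.5, second_human=4.0))   # 3.5
```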
Peer reviewed
PDF on ERIC: Download full text
Zhang, Mo – ETS Research Report Series, 2013
Many testing programs use automated scoring to grade essays. One issue in automated essay scoring that has not been examined adequately is population invariance and its causes. The primary purpose of this study was to investigate the impact of sampling in model calibration on population invariance of automated scores. This study analyzed scores…
Descriptors: Automation, Scoring, Essay Tests, Sampling
Peer reviewed
Direct link
Attali, Yigal; Lewis, Will; Steier, Michael – Language Testing, 2013
Automated essay scoring can produce reliable scores that are highly correlated with human scores, but is limited in its evaluation of content and other higher-order aspects of writing. The increased use of automated essay scoring in high-stakes testing underscores the need for human scoring that is focused on higher-order aspects of writing. This…
Descriptors: Scoring, Essay Tests, Reliability, High Stakes Tests
Attali, Yigal – Educational Testing Service, 2011
This paper proposes an alternative content measure for essay scoring, based on the "difference" in the relative frequency of a word in high-scored versus low-scored essays. The "differential word use" (DWU) measure is the average of these differences across all words in the essay. A positive value indicates the essay is using…
Descriptors: Scoring, Essay Tests, Word Frequency, Content Analysis
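As a rough illustration of the measure described in this abstract, the sketch below computes a DWU-style value as the average, over the words of an essay, of the difference between each word's relative frequency in high-scored essays and in low-scored essays. The tokenization and the toy data are assumptions for the example, not details from the paper.

```python
from collections import Counter

def relative_freqs(essays: list[list[str]]) -> Counter:
    """Relative frequency of each word across a set of tokenized essays."""
    counts = Counter(word for essay in essays for word in essay)
    total = sum(counts.values())
    # Counter returns 0 for unseen words, which is convenient below.
    return Counter({word: n / total for word, n in counts.items()})

def differential_word_use(essay: list[str],
                          high_scored: list[list[str]],
                          low_scored: list[list[str]]) -> float:
    """Average, over the words of `essay`, of the difference between each word's
    relative frequency in high-scored versus low-scored essays."""
    high = relative_freqs(high_scored)
    low = relative_freqs(low_scored)
    diffs = [high[word] - low[word] for word in essay]
    return sum(diffs) / len(diffs) if diffs else 0.0

# Toy example with hypothetical tokenized essays.
high = [["the", "evidence", "suggests"], ["the", "evidence", "supports", "this"]]
low = [["stuff", "is", "good"], ["good", "stuff"]]
print(differential_word_use(["evidence", "suggests", "stuff"], high, low))
```

By construction, the value in this sketch is positive when the essay's words are, on average, relatively more frequent in the high-scored set than in the low-scored set.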
Peer reviewed
PDF on ERIC: Download full text
Ramineni, Chaitanya; Trapani, Catherine S.; Williamson, David M.; Davey, Tim; Bridgeman, Brent – ETS Research Report Series, 2012
Automated scoring models for the "e-rater"® scoring engine were built and evaluated for the "GRE"® argument and issue-writing tasks. Prompt-specific, generic, and generic with prompt-specific intercept scoring models were built, and evaluation statistics such as weighted kappas, Pearson correlations, standardized difference in…
Descriptors: Scoring, Test Scoring Machines, Automation, Models
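The evaluation statistics named in this abstract (weighted kappa, Pearson correlation, standardized difference) are standard human-machine agreement measures. The sketch below shows common formulations; the quadratic weighting for kappa and the pooled-standard-deviation denominator for the standardized difference are conventions assumed here, not necessarily the exact choices made in the report.

```python
import numpy as np

def pearson_r(human: np.ndarray, machine: np.ndarray) -> float:
    """Pearson correlation between human and automated scores."""
    return float(np.corrcoef(human, machine)[0, 1])

def standardized_difference(human: np.ndarray, machine: np.ndarray) -> float:
    """Standardized mean difference (machine minus human), using a pooled
    standard deviation in the denominator as one common convention."""
    pooled_sd = np.sqrt((human.var(ddof=1) + machine.var(ddof=1)) / 2)
    return float((machine.mean() - human.mean()) / pooled_sd)

def quadratic_weighted_kappa(human: np.ndarray, machine: np.ndarray, n_levels: int) -> float:
    """Quadratic-weighted kappa for integer scores in the range 0..n_levels-1."""
    observed = np.zeros((n_levels, n_levels))
    for h, m in zip(human, machine):
        observed[h, m] += 1
    observed /= observed.sum()                                        # joint proportions
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))   # chance agreement
    i, j = np.indices((n_levels, n_levels))
    weights = (i - j) ** 2 / (n_levels - 1) ** 2                      # quadratic disagreement weights
    return float(1 - (weights * observed).sum() / (weights * expected).sum())

# Hypothetical 0-5 essay scores from a human rater and a scoring engine.
h = np.array([3, 4, 2, 5, 3, 4])
m = np.array([3, 4, 3, 5, 2, 4])
print(pearson_r(h, m), standardized_difference(h, m), quadratic_weighted_kappa(h, m, 6))
```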
Peer reviewed
Direct link
Sinharay, Sandip; Johnson, Matthew S. – International Journal of Testing, 2008
"Item models" (LaDuca, Staples, Templeton, & Holzman, 1986) are classes from which it is possible to generate items that are equivalent/isomorphic to other items from the same model (e.g., Bejar, 1996, 2002). They have the potential to produce large numbers of high-quality items at reduced cost. This article introduces data from an…
Descriptors: College Entrance Examinations, Case Studies, Test Items, Models
Quinlan, Thomas; Higgins, Derrick; Wolff, Susanne – Educational Testing Service, 2009
This report evaluates the construct coverage of the e-rater[R] scoring engine. The matter of construct coverage depends on whether one defines writing skill in terms of process or product. Originally, the e-rater engine consisted of a large set of components with a proven ability to predict human holistic scores. By organizing these capabilities…
Descriptors: Guides, Writing Skills, Factor Analysis, Writing Tests
Peer reviewed
PDF on ERIC: Download full text
Attali, Yigal; Powers, Don; Freedman, Marshall; Harrison, Marissa; Obetz, Susan – ETS Research Report Series, 2008
This report describes the development, administration, and scoring of open-ended variants of GRE® Subject Test items in biology and psychology. These questions were administered in a Web-based experiment to registered examinees of the respective Subject Tests. The questions required a short answer of 1-3 sentences, and responses were automatically…
Descriptors: College Entrance Examinations, Graduate Study, Scoring, Test Construction
Peer reviewed
PDF on ERIC: Download full text
Sheehan, Kathleen M.; Kostin, Irene; Futagi, Yoko; Hemat, Ramin; Zuckerman, Daniel – ETS Research Report Series, 2006
This paper describes the development, implementation, and evaluation of an automated system for predicting the acceptability status of candidate reading-comprehension stimuli extracted from a database of journal and magazine articles. The system uses a combination of classification and regression techniques to predict the probability that a given…
Descriptors: Automation, Prediction, Reading Comprehension, Classification
Sebrechts, Marc M.; And Others – 1991
This study evaluated agreement between expert system and human scores on 12 algebra word problems taken by Graduate Record Examinations (GRE) General Test examinees, with a general sample of 285 and a study sample of 30. Problems were drawn from three content classes (rate x time, work, and interest) and presented in four constructed-response…
Descriptors: Algebra, Automation, College Students, Computer Assisted Testing
Bennett, Randy Elliot; Sebrechts, Marc M. – 1994
This study evaluated expert system diagnoses of examinees' solutions to complex constructed-response algebra word problems. Problems were presented to three samples (30 college students each), each of which had taken the Graduate Record Examinations General Test. One sample took the problems in paper-and-pencil form and the other two on computer.…
Descriptors: Algebra, Automation, Classification, College Entrance Examinations