Publication Date
  In 2025: 0
  Since 2024: 1
  Since 2021 (last 5 years): 3
  Since 2016 (last 10 years): 12
  Since 2006 (last 20 years): 23
Descriptor
  Automation: 40
  Test Scoring Machines: 40
  Scoring: 30
  Computer Assisted Testing: 16
  Essays: 12
  Scores: 9
  Evaluation Methods: 8
  Essay Tests: 6
  Higher Education: 6
  Interrater Reliability: 6
  Language Tests: 6
Education Level
  Higher Education: 8
  Postsecondary Education: 6
  Elementary Education: 2
  Elementary Secondary Education: 1
  Grade 4: 1
  Grade 6: 1
  Grade 8: 1
  Secondary Education: 1
Audience
  Practitioners: 2
Location
  China: 2
  Taiwan: 2
  California (Los Angeles): 1
  Georgia: 1
  India: 1
  Indiana: 1
  Iowa: 1
  Israel: 1
  Japan: 1
  Louisiana: 1
  Michigan: 1
Laws, Policies, & Programs
  Elementary and Secondary…: 1
Assessments and Surveys
  Graduate Record Examinations: 5
  Test of English as a Foreign…: 5
  Slosson Intelligence Test: 1
  Wechsler Adult Intelligence…: 1
Naima Debbar – International Journal of Contemporary Educational Research, 2024
Intelligent essay-grading systems are important tools in educational technology. They can substantially reduce manual scoring effort and provide instructional feedback as well. These systems typically include two main parts: a feature extractor and an automatic grading model. The latter is generally based on computational and…
Descriptors: Test Scoring Machines, Computer Uses in Education, Artificial Intelligence, Essay Tests
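The two-part architecture described in that abstract is easy to picture in code. The sketch below is a minimal illustration under stated assumptions, not Debbar's system: the surface features and the ridge-regression grader are stand-in choices.

```python
# A minimal feature extractor + grading model pipeline (illustrative only).
import re
import numpy as np
from sklearn.linear_model import Ridge

def extract_features(essay: str) -> np.ndarray:
    """Map an essay to a small surface-feature vector (assumed features)."""
    words = re.findall(r"[a-zA-Z']+", essay.lower())
    n_words = len(words)
    n_sents = max(1, len(re.findall(r"[.!?]+", essay)))
    return np.array([
        n_words,                                   # essay length
        len(set(words)) / max(1, n_words),         # type-token ratio
        n_words / n_sents,                         # mean sentence length
        np.mean([len(w) for w in words]) if words else 0.0,  # mean word length
    ])

def train_grader(essays, human_scores):
    """Fit the grading model on human-scored training essays."""
    X = np.vstack([extract_features(e) for e in essays])
    return Ridge(alpha=1.0).fit(X, human_scores)

def grade(model, essay):
    return float(model.predict(extract_features(essay).reshape(1, -1))[0])
```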
Casabianca, Jodi M.; Donoghue, John R.; Shin, Hyo Jeong; Chao, Szu-Fu; Choi, Ikkyu – Journal of Educational Measurement, 2023
Using item response theory to model rater effects provides an alternative to standard performance metrics for rater monitoring and diagnosis. To fit such models, the ratings data must be sufficiently connected to allow estimation of rater effects. Due to popular rating designs used in large-scale testing scenarios,…
Descriptors: Item Response Theory, Alternative Assessment, Evaluators, Research Problems
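The connectivity requirement mentioned in that abstract can be checked directly: treat raters and responses as the two sides of a bipartite graph with an edge for every rating, then count connected components. This is a generic sketch (plain BFS, no external dependencies), not the authors' procedure.

```python
# Count connected components of the rater-response bipartite graph.
# More than one component means rater effects cannot be placed on a common scale.
from collections import defaultdict, deque

def rating_design_components(ratings):
    """ratings: iterable of (rater_id, response_id) pairs."""
    graph = defaultdict(set)
    for rater, resp in ratings:
        graph[("rater", rater)].add(("resp", resp))
        graph[("resp", resp)].add(("rater", rater))
    seen, components = set(), 0
    for node in graph:
        if node in seen:
            continue
        components += 1
        queue = deque([node])
        seen.add(node)
        while queue:
            for nbr in graph[queue.popleft()]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
    return components

# Two disjoint rater pools scoring disjoint essay sets -> 2 components:
print(rating_design_components([("r1", "e1"), ("r2", "e1"),
                                ("r3", "e2"), ("r4", "e2")]))
```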
Gong, Kaixuan – Asian-Pacific Journal of Second and Foreign Language Education, 2023
The extensive use of automated speech scoring in large-scale speaking assessment can be revolutionary not only for test design and rating but also for the learning and instruction of speaking, depending on how students and teachers perceive and react to this technology. Its washback, however, remains underexplored. This mixed-methods study aimed to…
Descriptors: Second Language Learning, Language Tests, English (Second Language), Automation
Zhang, Mo; Chen, Jing; Ruan, Chunyi – ETS Research Report Series, 2016
Successful detection of unusual responses is critical for using machine scoring in the assessment context. This study evaluated the utility of approaches to detecting unusual responses in automated essay scoring. Two research questions were pursued. One question concerned the performance of various prescreening advisory flags, and the other…
Descriptors: Essays, Scoring, Automation, Test Scoring Machines
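Prescreening advisory flags of the kind this study evaluates are cheap rule-based checks that advise against machine-scoring a response. The rules and thresholds below are hypothetical placeholders, not the flags ETS examined.

```python
# Hypothetical prescreening advisory flags for automated essay scoring.
import re

def advisory_flags(essay: str, min_words: int = 50):
    words = re.findall(r"[a-zA-Z']+", essay.lower())
    flags = []
    if len(words) < min_words:
        flags.append("TOO_SHORT")            # not enough text to score
    if words and len(set(words)) / len(words) < 0.2:
        flags.append("EXCESSIVE_REPETITION") # very low vocabulary diversity
    if re.search(r"(.)\1{9,}", essay):
        flags.append("KEYBANGING")           # long run of a single character
    return flags

# Responses carrying any flag would be routed to a human rater instead.
```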
Cohen, Yoav; Levi, Effi; Ben-Simon, Anat – Applied Measurement in Education, 2018
In the current study, two pools of 250 essays, all written in response to the same prompt, were rated by two groups of raters (14 or 15 raters per group), thereby providing an approximation of each essay's true score. An automated essay scoring (AES) system was trained on the datasets and then scored the essays using a cross-validation scheme. By…
Descriptors: Test Validity, Automation, Scoring, Computer Assisted Testing
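A hedged sketch of that evaluation design: the mean of many human ratings serves as a true-score proxy, and the automated system is assessed by k-fold cross-validation. The linear least-squares grader is a stand-in; the study's actual AES system is not reproduced here.

```python
# Cross-validated correlation between machine scores and a many-rater
# true-score proxy (illustrative linear grader, assumed feature matrix).
import numpy as np

def cv_correlation_with_true_score(X, rater_matrix, k=5, seed=0):
    """X: (n_essays, n_features); rater_matrix: (n_essays, n_raters)."""
    true_scores = rater_matrix.mean(axis=1)   # average over many raters
    idx = np.random.default_rng(seed).permutation(len(X))
    preds = np.empty(len(X))
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        # least-squares linear grader fit on the training folds
        beta, *_ = np.linalg.lstsq(
            np.c_[np.ones(len(train)), X[train]],
            true_scores[train], rcond=None)
        preds[fold] = np.c_[np.ones(len(fold)), X[fold]] @ beta
    return np.corrcoef(preds, true_scores)[0, 1]
```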
Loukina, Anastassia; Zechner, Klaus; Yoon, Su-Youn; Zhang, Mo; Tao, Jidong; Wang, Xinhao; Lee, Chong Min; Mulholland, Matthew – ETS Research Report Series, 2017
This report presents an overview of the "SpeechRater" automated scoring engine model building and evaluation process for several item types, with a focus on a low-English-proficiency test-taker population. We discuss each stage of speech scoring, including automatic speech recognition, filtering models for nonscorable responses, and…
Descriptors: Automation, Scoring, Speech Tests, Test Items
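The stages the report enumerates (recognition, filtering of nonscorable responses, scoring) suggest a simple control flow. The sketch below shows that flow with placeholder components; the real SpeechRater models are proprietary ETS systems and are not reproduced here.

```python
# Staged speech-scoring control flow with placeholder components.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Result:
    score: Optional[float]       # None when the response is routed to a human
    routed_to_human: bool

def score_response(audio, asr, filter_model, featurize, scoring_model) -> Result:
    transcript = asr(audio)                      # automatic speech recognition
    if filter_model(audio, transcript):          # True => nonscorable response
        return Result(score=None, routed_to_human=True)
    x = featurize(audio, transcript)             # delivery/language features
    return Result(score=scoring_model(x), routed_to_human=False)
```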
Chen, Lei; Zechner, Klaus; Yoon, Su-Youn; Evanini, Keelan; Wang, Xinhao; Loukina, Anastassia; Tao, Jidong; Davis, Lawrence; Lee, Chong Min; Ma, Min; Mundkowsky, Robert; Lu, Chi; Leong, Chee Wee; Gyawali, Binod – ETS Research Report Series, 2018
This research report provides an overview of the R&D efforts at Educational Testing Service related to its capability for automated scoring of nonnative spontaneous speech with the "SpeechRater" automated scoring service since its initial version was deployed in 2006. While most aspects of this R&D work have been published in…
Descriptors: Computer Assisted Testing, Scoring, Test Scoring Machines, Speech Tests
Loukina, Anastassia; Buzick, Heather – ETS Research Report Series, 2017
This study is an evaluation of the performance of automated speech scoring for speakers with documented or suspected speech impairments. Given that the use of automated scoring of open-ended spoken responses is relatively nascent and there is little research to date that includes test takers with disabilities, this small exploratory study focuses…
Descriptors: Automation, Scoring, Language Tests, Speech Tests
Lottridge, Susan; Wood, Scott; Shaw, Dan – Applied Measurement in Education, 2018
This study sought to provide a framework for evaluating machine score-ability of items using a new score-ability rating scale, and to determine the extent to which ratings were predictive of observed automated scoring performance. The study listed and described a set of factors that are thought to influence machine score-ability; these factors…
Descriptors: Program Effectiveness, Computer Assisted Testing, Test Scoring Machines, Scoring
Rupp, André A. – Applied Measurement in Education, 2018
This article discusses critical methodological design decisions for collecting, interpreting, and synthesizing empirical evidence during the design, deployment, and operational quality-control phases for automated scoring systems. The discussion is inspired by work on operational large-scale systems for automated essay scoring but many of the…
Descriptors: Design, Automation, Scoring, Test Scoring Machines
Ramineni, Chaitanya; Williamson, David – ETS Research Report Series, 2018
Notable differences in mean scores between the "e-rater"® automated scoring engine and human raters were observed for essays from certain demographic groups on the "GRE"® General Test in use before the major revision of 2012 (which produced the rGRE). The use of e-rater as a check-score model with discrepancy thresholds prevented an adverse impact…
Descriptors: Scores, Computer Assisted Testing, Test Scoring Machines, Automation
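A check-score arrangement of the kind described here is straightforward to express: the engine never contributes to the reported score directly; it only flags large human-machine discrepancies for a second human rating. The threshold value below is hypothetical.

```python
# Check-score logic: the machine score only validates the human score.
def resolve_score(human: float, machine: float, threshold: float = 1.0):
    """Return (reported_score, needs_second_human)."""
    if abs(human - machine) > threshold:
        return None, True    # discrepancy: obtain a second human rating
    return human, False      # human score stands; machine merely confirmed it
```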
Gerard, Libby F.; Linn, Marcia C. – Journal of Science Teacher Education, 2016
Computer scoring of student written essays about an inquiry topic can be used to diagnose student progress both to alert teachers to struggling students and to generate automated guidance. We identify promising ways for teachers to add value to automated guidance to improve student learning. Three teachers from two schools and their 386 students…
Descriptors: Scores, Automation, Essays, Classroom Techniques
Attali, Yigal – Educational Testing Service, 2011
The e-rater® automated essay scoring system is used operationally in the scoring of TOEFL iBT® independent essays. Previous research has found support for a 3-factor structure of the e-rater features. This 3-factor structure has an attractive hierarchical linguistic interpretation with a word choice factor, a grammatical convention within a…
Descriptors: Essay Tests, Language Tests, Test Scoring Machines, Automation
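For readers who want to probe a factor structure like this on their own feature matrix, ordinary exploratory factor analysis is a natural starting point. This is a generic scikit-learn sketch, not the analysis Attali reports, and the feature matrix is an assumption.

```python
# Exploratory 3-factor analysis of an essay-feature matrix (generic sketch).
import numpy as np
from sklearn.decomposition import FactorAnalysis

def three_factor_loadings(X: np.ndarray) -> np.ndarray:
    """X: (n_essays, n_features); assumes non-constant feature columns."""
    X = (X - X.mean(axis=0)) / X.std(axis=0)       # standardize features
    fa = FactorAnalysis(n_components=3, random_state=0).fit(X)
    return fa.components_.T                        # (n_features, 3) loadings
```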
Attali, Yigal – Educational Testing Service, 2011
This paper proposes an alternative content measure for essay scoring, based on the "difference" in the relative frequency of a word in high-scored versus low-scored essays. The "differential word use" (DWU) measure is the average of these differences across all words in the essay. A positive value indicates the essay is using…
Descriptors: Scoring, Essay Tests, Word Frequency, Content Analysis
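The DWU definition quoted above translates almost line for line into code: per-word relative-frequency differences between high- and low-scored training essays, averaged over the words of the essay being measured. Tokenization details below are assumptions.

```python
# Differential word use (DWU) as defined in the abstract above.
from collections import Counter
import re

def rel_freq(essays):
    """Relative frequency of each word across a pool of essays."""
    counts = Counter(w for e in essays for w in re.findall(r"[a-z']+", e.lower()))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def dwu(essay, high_scored, low_scored):
    hi, lo = rel_freq(high_scored), rel_freq(low_scored)
    words = re.findall(r"[a-z']+", essay.lower())
    diffs = [hi.get(w, 0.0) - lo.get(w, 0.0) for w in words]
    return sum(diffs) / len(diffs) if diffs else 0.0

# Positive DWU: the essay leans toward vocabulary typical of high-scored essays.
```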
Ramineni, Chaitanya; Trapani, Catherine S.; Williamson, David M.; Davey, Tim; Bridgeman, Brent – ETS Research Report Series, 2012
Automated scoring models for the "e-rater"® scoring engine were built and evaluated for the "GRE"® argument and issue-writing tasks. Prompt-specific, generic, and generic with prompt-specific intercept scoring models were built and evaluation statistics such as weighted kappas, Pearson correlations, standardized difference in…
Descriptors: Scoring, Test Scoring Machines, Automation, Models
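The evaluation statistics this last abstract names are standard and easy to compute for a pair of human and engine score vectors. Quadratic kappa weights are assumed here; the snippet truncates before the report's exact choices, and the score range is a placeholder.

```python
# Weighted kappa, Pearson r, and standardized mean difference for
# human vs. machine scores (assumes integer scores in [min_s, max_s]).
import numpy as np

def quadratic_weighted_kappa(a, b, min_s, max_s):
    a, b = np.asarray(a) - min_s, np.asarray(b) - min_s
    k = max_s - min_s + 1
    obs = np.zeros((k, k))
    for i, j in zip(a, b):
        obs[i, j] += 1                 # observed joint distribution
    obs /= obs.sum()
    exp = np.outer(obs.sum(axis=1), obs.sum(axis=0))   # chance agreement
    w = np.array([[(i - j) ** 2 for j in range(k)]
                  for i in range(k)]) / (k - 1) ** 2   # quadratic weights
    return 1 - (w * obs).sum() / (w * exp).sum()

def agreement_stats(human, machine, min_s=1, max_s=6):
    human, machine = np.asarray(human), np.asarray(machine)
    return {
        "qwk": quadratic_weighted_kappa(human, machine, min_s, max_s),
        "pearson_r": np.corrcoef(human, machine)[0, 1],
        # standardized difference of means, pooled standard deviation
        "std_diff": (machine.mean() - human.mean())
                    / np.sqrt((machine.var(ddof=1) + human.var(ddof=1)) / 2),
    }
```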