Publication Date
  In 2025: 0
  Since 2024: 2
  Since 2021 (last 5 years): 5
  Since 2016 (last 10 years): 18
  Since 2006 (last 20 years): 43
Descriptor
  Scoring: 34
  Essays: 29
  Essay Tests: 23
  Computer Assisted Testing: 22
  Correlation: 21
  Writing Tests: 21
  Scores: 19
  Language Tests: 17
  Automation: 16
  College Entrance Examinations: 16
  English (Second Language): 16
Source
  ETS Research Report Series: 51
Author
  Zhang, Mo: 10
  Attali, Yigal: 7
  Deane, Paul: 5
  Ramineni, Chaitanya: 5
  Haberman, Shelby J.: 4
  Williamson, David M.: 4
  Breyer, F. Jay: 3
  Bridgeman, Brent: 3
  Chen, Jing: 3
  Rupp, André A.: 3
  Sinharay, Sandip: 3
Publication Type
  Journal Articles: 51
  Reports - Research: 50
  Tests/Questionnaires: 6
  Numerical/Quantitative Data: 1
  Reports - General: 1
Education Level
  Higher Education: 22
  Postsecondary Education: 18
  Secondary Education: 12
  Junior High Schools: 8
  Middle Schools: 8
  Elementary Education: 6
  High Schools: 6
  Grade 8: 5
  Intermediate Grades: 3
  Grade 10: 2
  Grade 12: 2
Location
  Indiana: 2
  California (Los Angeles): 1
  Canada: 1
  China: 1
  Georgia: 1
  Germany: 1
  India: 1
  Iowa: 1
  Japan: 1
  Michigan: 1
  Minnesota: 1
Yanxuan Qu; Sandip Sinharay – ETS Research Report Series, 2024
The goal of this paper is to find better ways to estimate the internal consistency reliability of scores on tests with a design often encountered in practice: constructed-response items clustered into sections that are not parallel or tau-equivalent, with one section containing only a single item. To estimate the…
Descriptors: Test Reliability, Essay Tests, Construct Validity, Error of Measurement
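The internal consistency idea this entry refers to can be illustrated with coefficient alpha, a standard reliability estimate computed from item-score variances. This is a minimal sketch; the item-score matrix below is purely hypothetical and is not data from the report.

```python
# Hypothetical item-score matrix: 5 examinees x 4 constructed-response items.
scores = [
    [2, 3, 1, 4],
    [3, 3, 2, 4],
    [1, 2, 1, 2],
    [4, 4, 3, 5],
    [2, 2, 2, 3],
]

def variance(xs):
    """Sample variance (n - 1 denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def coefficient_alpha(matrix):
    k = len(matrix[0])  # number of items
    item_vars = [variance([row[i] for row in matrix]) for i in range(k)]
    total_var = variance([sum(row) for row in matrix])  # variance of total scores
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

alpha = coefficient_alpha(scores)
```

Coefficient alpha assumes (at least) tau-equivalent items; the report's point is precisely that designs with non-parallel sections call for other estimators, so this sketch shows only the baseline being improved upon.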
Ling, Guangming; Williams, Jean; O'Brien, Sue; Cavalie, Carlos F. – ETS Research Report Series, 2022
Recognizing the appealing features of a tablet (e.g., an iPad), including size, mobility, touch screen display, and virtual keyboard, more educational professionals are moving away from larger laptop and desktop computers and turning to the iPad for their daily work, such as reading and writing. Following the results of a recent survey of…
Descriptors: Tablet Computers, Computers, Essays, Scoring
Wang, Wei; Dorans, Neil J. – ETS Research Report Series, 2021
Agreement statistics and measures of prediction accuracy are often used to assess the quality of two measures of a construct. Agreement statistics are appropriate for measures that are supposed to be interchangeable, whereas prediction accuracy statistics are appropriate for situations where one variable is the target and the other variables are…
Descriptors: Classification, Scaling, Prediction, Accuracy
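The distinction this entry draws, agreement statistics for interchangeable measures versus prediction-accuracy statistics when one measure is the target, can be illustrated with a toy example. The scores below are illustrative, not from the report.

```python
# Two measures of the same construct on a 1-4 scale (hypothetical data).
human   = [3, 2, 4, 3, 1, 2, 4, 3]
machine = [3, 2, 3, 3, 1, 3, 4, 2]

# Agreement statistic: appropriate when the measures are interchangeable.
exact_agreement = sum(h == m for h, m in zip(human, machine)) / len(human)

# Prediction-accuracy statistic: appropriate when human is the target
# and machine is the predictor.
rmse = (sum((h - m) ** 2 for h, m in zip(human, machine)) / len(human)) ** 0.5
```

In practice, agreement is often reported with chance-corrected indices such as quadratic-weighted kappa rather than raw exact agreement.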
Paul Deane; Duanli Yan; Katherine Castellano; Yigal Attali; Michelle Lamar; Mo Zhang; Ian Blood; James V. Bruno; Chen Li; Wenju Cui; Chunyi Ruan; Colleen Appel; Kofi James; Rodolfo Long; Farah Qureshi – ETS Research Report Series, 2024
This paper presents a multidimensional model of variation in writing quality, register, and genre in student essays, trained and tested via confirmatory factor analysis of 1.37 million essay submissions to ETS' digital writing service, Criterion®. The model was also validated with several other corpora, which indicated that it provides a…
Descriptors: Writing (Composition), Essays, Models, Elementary School Students
Wendler, Cathy; Glazer, Nancy; Cline, Frederick – ETS Research Report Series, 2019
One of the challenges in scoring constructed-response (CR) items and tasks is ensuring that rater drift does not occur during or across scoring windows. Rater drift reflects changes in how raters interpret and use established scoring criteria to assign essay scores. Calibration is a process used to help control rater drift and, as such, serves as…
Descriptors: College Entrance Examinations, Graduate Study, Accuracy, Test Reliability
Cao, Yi; Chen, Jianshen; Zhang, Mo; Li, Chen – ETS Research Report Series, 2020
Scenario-based writing assessment has two salient characteristics by design: a lead-in/essay scaffolding structure and a unified scenario/topic throughout. In this study, we examine whether the scenario-based assessment design would impact students' essay scores compared to its alternative conditions, which intentionally broke the scaffolding…
Descriptors: Writing Processes, Vignettes, Writing Evaluation, Regression (Statistics)
Song, Yi; Deane, Paul; Beigman Klebanov, Beata – ETS Research Report Series, 2017
This project focuses on laying the foundations for automated analysis of argumentation schemes, supporting identification and classification of the arguments being made in a text, for the purpose of scoring the quality of written analyses of arguments. We developed annotation protocols for 20 argument prompts from a college-level test under the…
Descriptors: Scoring, Automation, Persuasive Discourse, Documentation
Finn, Bridgid; Wendler, Cathy; Ricker-Pedley, Kathryn L.; Arslan, Burcu – ETS Research Report Series, 2018
This report investigates whether the time between scoring sessions has an influence on operational and nonoperational scoring accuracy. The study evaluates raters' scoring accuracy on constructed-response essay responses for the "GRE"® General Test. Binomial linear mixed-effect models are presented that evaluate how the effect of various…
Descriptors: Intervals, Scoring, Accuracy, Essay Tests
Zhang, Mo; Chen, Jing; Ruan, Chunyi – ETS Research Report Series, 2016
Successful detection of unusual responses is critical for using machine scoring in the assessment context. This study evaluated the utility of approaches to detecting unusual responses in automated essay scoring. Two research questions were pursued. One question concerned the performance of various prescreening advisory flags, and the other…
Descriptors: Essays, Scoring, Automation, Test Scoring Machines
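A prescreening advisory flag of the kind this entry describes can be sketched as a set of simple checks applied before machine scoring. The flag names and thresholds below are hypothetical, not the operational rules the study evaluated.

```python
def advisory_flags(essay: str, prompt_keywords: set[str]) -> list[str]:
    """Return a list of advisory flags for a possibly unscorable response."""
    flags = []
    words = essay.lower().split()
    if len(words) < 50:
        flags.append("too_short")             # likely not a scorable essay
    if len(set(words)) / max(len(words), 1) < 0.2:
        flags.append("excessive_repetition")  # e.g., copied or looped text
    if not prompt_keywords & set(words):
        flags.append("off_topic")             # no overlap with the prompt
    return flags

flags = advisory_flags("short answer", {"automation", "scoring"})
```

Responses that trip a flag would typically be routed to human review rather than scored automatically.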
Chen, Jing; Fife, James H.; Bejar, Isaac I.; Rupp, André A. – ETS Research Report Series, 2016
The "e-rater"® automated scoring engine used at Educational Testing Service (ETS) scores the writing quality of essays. In the current practice, e-rater scores are generated via a multiple linear regression (MLR) model as a linear combination of various features evaluated for each essay and human scores as the outcome variable. This…
Descriptors: Scoring, Models, Artificial Intelligence, Automation
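The scoring approach this entry describes, a machine score produced as a linear combination of essay features with human scores as the outcome, can be sketched with ordinary least squares. The feature names, data, and fitted weights below are illustrative and are not e-rater's actual model.

```python
import numpy as np

# Rows: essays; columns: [grammar errors per word, avg sentence length, word count].
features = np.array([
    [0.05, 12.0, 150],
    [0.02, 18.0, 420],
    [0.08, 10.0,  90],
    [0.01, 20.0, 500],
    [0.04, 15.0, 300],
], dtype=float)
human_scores = np.array([3.0, 5.0, 2.0, 6.0, 4.0])

# Fit least-squares weights, with an intercept column prepended.
X = np.hstack([np.ones((len(features), 1)), features])
weights, *_ = np.linalg.lstsq(X, human_scores, rcond=None)

# Machine score for a new essay = intercept + weighted feature sum.
new_essay = np.array([1.0, 0.03, 16.0, 350])
machine_score = float(new_essay @ weights)
```

The report's question is whether alternatives to this MLR baseline (e.g., nonlinear models) improve on it, so the sketch shows only the current-practice formulation.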
Choi, Ikkyu; Hao, Jiangang; Deane, Paul; Zhang, Mo – ETS Research Report Series, 2021
"Biometrics" are physical or behavioral human characteristics that can be used to identify a person. It is widely known that keystroke or typing dynamics for short, fixed texts (e.g., passwords) could serve as a behavioral biometric. In this study, we investigate whether keystroke data from essay responses can lead to a reliable…
Descriptors: Accuracy, High Stakes Tests, Writing Tests, Benchmarking
Rupp, André A.; Casabianca, Jodi M.; Krüger, Maleika; Keller, Stefan; Köller, Olaf – ETS Research Report Series, 2019
In this research report, we describe the design and empirical findings for a large-scale study of essay writing ability with approximately 2,500 high school students in Germany and Switzerland on the basis of 2 tasks with 2 associated prompts, each from a standardized writing assessment whose scoring involved both human and automated components.…
Descriptors: Automation, Foreign Countries, English (Second Language), Language Tests
Yao, Lili; Haberman, Shelby J.; Zhang, Mo – ETS Research Report Series, 2019
Many assessments of writing proficiency that aid in making high-stakes decisions consist of several essay tasks evaluated by a combination of human holistic scores and computer-generated scores for essay features such as the rate of grammatical errors per word. Under typical conditions, a summary writing score is provided by a linear combination…
Descriptors: Prediction, True Scores, Computer Assisted Testing, Scoring
Zhu, Mengxiao; Zhang, Mo; Deane, Paul – ETS Research Report Series, 2019
The research on using event logs and item response time to study test-taking processes is rapidly growing in the field of educational measurement. In this study, we analyzed the keystroke logs collected from 761 middle school students in the United States as they completed a persuasive writing task. Seven variables were extracted from the…
Descriptors: Keyboarding (Data Entry), Data Collection, Data Analysis, Writing Processes
Breyer, F. Jay; Rupp, André A.; Bridgeman, Brent – ETS Research Report Series, 2017
In this research report, we present an empirical argument for the use of a contributory scoring approach for the 2-essay writing assessment of the analytical writing section of the "GRE"® test in which human and machine scores are combined for score creation at the task and section levels. The approach was designed to replace a currently…
Descriptors: College Entrance Examinations, Scoring, Essay Tests, Writing Evaluation