Publication Date
In 2025: 0
Since 2024: 1
Since 2021 (last 5 years): 1
Since 2016 (last 10 years): 13
Since 2006 (last 20 years): 17
Descriptor
Essay Tests: 19
Scoring: 14
Writing Tests: 10
Evaluators: 9
Essays: 8
Interrater Reliability: 8
Writing Skills: 6
Comparative Analysis: 5
Correlation: 5
Scores: 5
Statistical Analysis: 5
Source
Applied Measurement in Education: 26
Author
Powers, Donald E.: 4
Attali, Yigal: 2
Bridgeman, Brent: 2
Fowles, Mary E.: 2
Angoff, William H.: 1
Arslan, Burcu: 1
Bejar, Isaac I.: 1
Ben-Simon, Anat: 1
Boyer, Michelle: 1
Busch, John Christian: 1
Buzick, Heather: 1
Publication Type
Journal Articles: 26
Reports - Research: 21
Reports - Evaluative: 6
Speeches/Meeting Papers: 3
Tests/Questionnaires: 2
Information Analyses: 1
Reports - Descriptive: 1
Education Level
Higher Education: 4
Postsecondary Education: 4
Grade 7: 1
Location
Europe: 1
Georgia: 1
Iran (Tehran): 1
Israel: 1
New York: 1
Assessments and Surveys
Graduate Record Examinations: 4
Test of English as a Foreign Language: 2
Bar Examinations: 1
College Level Examination Program: 1
National Teacher Examinations: 1
SAT (College Admission Test): 1
Bejar, Isaac I.; Li, Chen; McCaffrey, Daniel – Applied Measurement in Education, 2020
We evaluate the feasibility of developing predictive models of rater behavior, that is, "rater-specific" models for predicting the scores produced by a rater under operational conditions. In the present study, the dependent variable is the score assigned to essays by a rater, and the predictors are linguistic attributes of the essays…
Descriptors: Scoring, Essays, Behavior, Predictive Measurement
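The rater-specific modeling idea above can be illustrated with a minimal sketch: fit one regression per rater, predicting that rater's scores from essay-level linguistic attributes. The features, simulated data, and model choice below are hypothetical stand-ins, not details from the study.

```python
# A minimal sketch of a "rater-specific" scoring model, assuming essay-level
# linguistic features have already been extracted. All values are simulated.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_essays = 500
# Hypothetical linguistic attributes: e.g., length, error rate, vocab diversity
X = rng.normal(size=(n_essays, 3))
# Simulated scores from one rater: a noisy function of the attributes
y = np.clip(np.round(3 + 1.2 * X[:, 0] - 0.8 * X[:, 1]
                     + rng.normal(scale=0.7, size=n_essays)), 1, 6)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LinearRegression().fit(X_train, y_train)  # one model per rater
print("held-out R^2 for this rater's model:", model.score(X_test, y_test))
```

In an operational setting, a separate model of this kind would be fit for each rater, and systematic departures between predicted and assigned scores could flag drift.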
Wendler, Cathy; Glazer, Nancy; Bridgeman, Brent – Applied Measurement in Education, 2020
Efficient constructed response (CR) scoring requires both accuracy and speed from human raters. This study was designed to determine whether setting scoring rate expectations would encourage raters to score at a faster pace and, if so, whether there would be differential effects on scoring accuracy for raters who score at different rates. Three rater groups…
Descriptors: Scoring, Expectation, Accuracy, Time
Hamdollah Ravand; Farshad Effatpanah; Wenchao Ma; Jimmy de la Torre; Purya Baghaei; Olga Kunina-Habenicht – Applied Measurement in Education, 2024
The purpose of this study was to explore the nature of interactions among second/foreign language (L2) writing subskills. Two types of relationships were investigated: subskill-item and subskill-subskill relationships. To achieve the first purpose, using data obtained from the writing essays of 500 English as a foreign language (EFL)…
Descriptors: Second Language Learning, Writing Instruction, Writing Skills, Writing Tests
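Subskill-item relationships of the kind investigated above are conventionally encoded in a Q-matrix (items by subskills) in cognitive diagnosis modeling. Below is a minimal sketch of that data structure; the subskill labels and entries are hypothetical, not taken from the study.

```python
# A minimal sketch of a Q-matrix linking items to writing subskills.
# Labels and entries are illustrative assumptions only.
import numpy as np

subskills = ["content", "organization", "vocabulary", "grammar"]
# Q[i, k] = 1 if item i is assumed to measure subskill k
Q = np.array([
    [1, 0, 0, 1],  # item 1: content + grammar
    [0, 1, 0, 0],  # item 2: organization only
    [1, 1, 1, 0],  # item 3: content + organization + vocabulary
])
for i, row in enumerate(Q, start=1):
    needed = [s for s, q in zip(subskills, row) if q]
    print(f"item {i} requires: {', '.join(needed)}")
```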
Choi, Ikkyu; Wolfe, Edward W. – Applied Measurement in Education, 2020
Rater training is essential in ensuring the quality of constructed response scoring. Most of the current knowledge about rater training comes from experimental contexts with an emphasis on short-term effects. Few sources provide empirical evidence on whether and how raters become more accurate as they gain scoring experience or what…
Descriptors: Scoring, Accuracy, Training, Evaluators
Sinharay, Sandip; Zhang, Mo; Deane, Paul – Applied Measurement in Education, 2019
Analysis of keystroke logging data is of increasing interest, as is evident from a substantial amount of recent research on the topic. Some of this research has focused on the prediction of essay scores from keystroke logging features, but linear regression is the only prediction method that has been used in this research…
Descriptors: Scores, Prediction, Writing Processes, Data Analysis
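To illustrate the move beyond linear regression that this abstract points to, here is a minimal sketch comparing linear regression with a non-linear alternative (a random forest) for predicting scores from keystroke-style features. The features and data are simulated stand-ins, not the study's.

```python
# A minimal sketch: linear vs. non-linear prediction of essay scores from
# hypothetical keystroke-logging features (pauses, bursts, revisions, rate).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 400
X = rng.normal(size=(n, 4))
# Simulated scores with an interaction a linear model cannot capture
y = 2 + X[:, 0] * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=n)

for name, est in [("linear", LinearRegression()),
                  ("forest", RandomForestRegressor(random_state=0))]:
    r2 = cross_val_score(est, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean cross-validated R^2 = {r2:.2f}")
```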
Finn, Bridgid; Arslan, Burcu; Walsh, Matthew – Applied Measurement in Education, 2020
To score an essay response, raters draw on previously trained skills and knowledge about the underlying rubric and score criterion. Cognitive processes such as remembering, forgetting, and skill decay likely influence rater performance. To investigate how forgetting influences scoring, we evaluated raters' scoring accuracy on TOEFL and GRE essays.…
Descriptors: Epistemology, Essay Tests, Evaluators, Cognitive Processes
Shermis, Mark D. – Applied Measurement in Education, 2018
This article employs the Common European Framework of Reference for Languages (CEFR) as a basis for evaluating writing in the context of machine scoring. The CEFR was designed as a framework for evaluating speaking proficiency levels across the 49 languages of the European Union. The intent was to impact language instruction so…
Descriptors: Scoring, Automation, Essays, Language Proficiency
Kieftenbeld, Vincent; Boyer, Michelle – Applied Measurement in Education, 2017
Automated scoring systems are typically evaluated by comparing the performance of a single automated rater item-by-item to human raters. This presents a challenge when the performance of multiple raters needs to be compared across multiple items. Rankings could depend on specifics of the ranking procedure; observed differences could be due to…
Descriptors: Automation, Scoring, Comparative Analysis, Test Items
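One common way to compare multiple automated raters to humans item by item is quadratically weighted kappa. Below is a minimal sketch of that comparison with simulated data; the engine names and error rates are hypothetical, not the study's procedure.

```python
# A minimal sketch of item-by-item rater comparison via quadratically
# weighted kappa (QWK). All scores are simulated.
import numpy as np
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(2)
n_items, n_resp = 3, 200
human = rng.integers(0, 4, size=(n_items, n_resp))  # scores 0-3

def noisy_rater(scores, flip_prob):
    """Simulate an automated rater that disagrees with humans at flip_prob."""
    noise = rng.integers(0, 4, size=scores.shape)
    mask = rng.random(scores.shape) < flip_prob
    return np.where(mask, noise, scores)

for name, p in [("engine_A", 0.1), ("engine_B", 0.3)]:
    for item in range(n_items):
        k = cohen_kappa_score(human[item], noisy_rater(human[item], p),
                              weights="quadratic")
        print(f"{name}, item {item + 1}: QWK = {k:.2f}")
```

Whether engine_A outranks engine_B can then be checked across all items rather than on a single item, which is the comparison problem the abstract raises.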
Raczynski, Kevin; Cohen, Allan – Applied Measurement in Education, 2018
The literature on Automated Essay Scoring (AES) systems has provided useful validation frameworks for any assessment that includes AES scoring. Furthermore, evidence for the scoring fidelity of AES systems is accumulating. Yet questions remain when appraising the scoring performance of AES systems. These questions include: (a) which essays are…
Descriptors: Essay Tests, Test Scoring Machines, Test Validity, Evaluators
Cohen, Yoav; Levi, Effi; Ben-Simon, Anat – Applied Measurement in Education, 2018
In the current study, two pools of 250 essays, all written in response to the same prompt, were rated by two groups of raters (14 or 15 raters per group), thereby providing an approximation to each essay's true score. An automated essay scoring (AES) system was trained on the datasets and then scored the essays using a cross-validation scheme. By…
Descriptors: Test Validity, Automation, Scoring, Computer Assisted Testing
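The core design, approximating an essay's true score by averaging many raters and then checking how well engine scores track that average, can be sketched as follows. All scores below are simulated; the noise levels are arbitrary assumptions.

```python
# A minimal sketch of the multi-rater "true score" approximation.
import numpy as np

rng = np.random.default_rng(3)
n_essays, n_raters = 250, 15
true = rng.normal(4, 1, size=n_essays)
raters = true[:, None] + rng.normal(scale=0.8, size=(n_essays, n_raters))
approx_true = raters.mean(axis=1)            # multi-rater approximation
engine = true + rng.normal(scale=0.6, size=n_essays)  # simulated AES scores

print("engine vs. single rater: ", np.corrcoef(engine, raters[:, 0])[0, 1])
print("engine vs. rater average:", np.corrcoef(engine, approx_true)[0, 1])
```

Averaging raters attenuates individual rater error, so the correlation against the average is a cleaner validity signal than agreement with any one rater.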
Steedle, Jeffrey T.; Ferrara, Steve – Applied Measurement in Education, 2016
As an alternative to rubric scoring, comparative judgment generates essay scores by aggregating decisions about the relative quality of the essays. Comparative judgment eliminates certain scorer biases and potentially reduces training requirements, thereby allowing a large number of judges, including teachers, to participate in essay evaluation.…
Descriptors: Essays, Scoring, Comparative Analysis, Evaluators
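Comparative judgment aggregates pairwise "which essay is better" decisions into scale values, commonly with a Bradley-Terry model. Here is a minimal sketch that simulates judge decisions and recovers a ranking with a simple minorization-maximization fit; nothing in it comes from the study itself.

```python
# A minimal sketch of comparative judgment scored via Bradley-Terry.
import numpy as np

rng = np.random.default_rng(4)
n = 8                                   # essays
quality = rng.normal(size=n)            # latent essay quality
wins = np.zeros((n, n))                 # wins[i, j] = times i beat j
for _ in range(600):                    # simulated pairwise judgments
    i, j = rng.choice(n, size=2, replace=False)
    p_win = 1 / (1 + np.exp(quality[j] - quality[i]))
    if rng.random() < p_win:
        wins[i, j] += 1
    else:
        wins[j, i] += 1

p = np.ones(n)                          # Bradley-Terry strengths
comparisons = wins + wins.T
for _ in range(200):                    # minorization-maximization updates
    denom = (comparisons / (p[:, None] + p[None, :])).sum(axis=1)
    p = wins.sum(axis=1) / denom
    p = np.maximum(p, 1e-9)             # guard against all-loss essays
    p /= p.sum()

print("recovered ranking:", np.argsort(-p))
print("true ranking:     ", np.argsort(-quality))
```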
Rupp, André A. – Applied Measurement in Education, 2018
This article discusses critical methodological design decisions for collecting, interpreting, and synthesizing empirical evidence during the design, deployment, and operational quality-control phases for automated scoring systems. The discussion is inspired by work on operational large-scale systems for automated essay scoring but many of the…
Descriptors: Design, Automation, Scoring, Test Scoring Machines
Buzick, Heather; Oliveri, Maria Elena; Attali, Yigal; Flor, Michael – Applied Measurement in Education, 2016
Automated essay scoring is a developing technology that can provide efficient scoring of large numbers of written responses. Its use in higher education admissions testing provides an opportunity to collect validity and fairness evidence to support current uses and inform its emergence in other areas such as K-12 large-scale assessment. In this…
Descriptors: Essays, Learning Disabilities, Attention Deficit Hyperactivity Disorder, Scoring
Powers, Donald E.; Escoffery, David S.; Duchnowski, Matthew P. – Applied Measurement in Education, 2015
By far, the most frequently used method of validating (the interpretation and use of) automated essay scores has been to compare them with scores awarded by human raters. Although this practice is questionable, human-machine agreement is still often regarded as the "gold standard." Our objective was to refine this model and apply it to…
Descriptors: Essays, Test Scoring Machines, Program Validation, Criterion Referenced Tests
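When human scores serve as the criterion, agreement with the machine is typically summarized with statistics such as exact and adjacent agreement. A minimal sketch, on simulated scores rather than the study's data:

```python
# A minimal sketch of exact and adjacent human-machine agreement rates.
import numpy as np

rng = np.random.default_rng(5)
human = rng.integers(1, 7, size=300)    # human scores on a 1-6 scale
# Simulated machine scores that usually match, sometimes drift by one point
machine = np.clip(human + rng.choice([-1, 0, 0, 0, 1], size=300), 1, 6)

exact = np.mean(machine == human)
adjacent = np.mean(np.abs(machine - human) <= 1)
print(f"exact agreement:    {exact:.2%}")
print(f"adjacent agreement: {adjacent:.2%}")
```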
Bridgeman, Brent; Trapani, Catherine; Attali, Yigal – Applied Measurement in Education, 2012
Essay scores generated by machine and by human raters are generally comparable; that is, they can produce scores with similar means and standard deviations, and machine scores generally correlate as highly with human scores as scores from one human correlate with scores from another human. Although human and machine essay scores are highly related…
Descriptors: Scoring, Essay Tests, College Entrance Examinations, High Stakes Tests
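The comparison described above, whether a machine matches human scores in mean, spread, and correlation as well as a second human does, can be sketched directly. All scores below are simulated under an assumed common true-score model.

```python
# A minimal sketch: human-human vs. human-machine score comparability.
import numpy as np

rng = np.random.default_rng(6)
true = rng.normal(4, 1, size=500)
human1 = true + rng.normal(scale=0.7, size=500)
human2 = true + rng.normal(scale=0.7, size=500)
machine = true + rng.normal(scale=0.7, size=500)

for name, s in [("human2", human2), ("machine", machine)]:
    r = np.corrcoef(human1, s)[0, 1]
    print(f"human1 vs {name}: r = {r:.2f}, "
          f"mean = {s.mean():.2f}, sd = {s.std():.2f}")
```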