Wendler, Cathy; Glazer, Nancy; Bridgeman, Brent – Applied Measurement in Education, 2020
Efficient constructed response (CR) scoring requires both accuracy and speed from human raters. This study was designed to determine whether setting scoring rate expectations would encourage raters to score at a faster pace and, if so, whether there would be differential effects on scoring accuracy for raters who score at different rates. Three rater groups…
Descriptors: Scoring, Expectation, Accuracy, Time
Bejar, Isaac I.; Li, Chen; McCaffrey, Daniel – Applied Measurement in Education, 2020
We evaluate the feasibility of developing predictive models of rater behavior, that is, "rater-specific" models for predicting the scores produced by a rater under operational conditions. In the present study, the dependent variable is the score assigned to essays by a rater, and the predictors are linguistic attributes of the essays…
Descriptors: Scoring, Essays, Behavior, Predictive Measurement
Choi, Ikkyu; Wolfe, Edward W. – Applied Measurement in Education, 2020
Rater training is essential in ensuring the quality of constructed response scoring. Most of the current knowledge about rater training comes from experimental contexts with an emphasis on short-term effects. Few sources are available for empirical evidence on whether and how raters become more accurate as they gain scoring experience or what…
Descriptors: Scoring, Accuracy, Training, Evaluators
Sinharay, Sandip; Zhang, Mo; Deane, Paul – Applied Measurement in Education, 2019
Analysis of keystroke logging data is of increasing interest, as evident from a substantial amount of recent research on the topic. Some of the research on keystroke logging data has focused on the prediction of essay scores from keystroke logging features, but linear regression is the only prediction method that has been used in this research.…
Descriptors: Scores, Prediction, Writing Processes, Data Analysis
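The prediction approach this abstract refers to — ordinary linear regression from keystroke-logging features to essay scores — can be sketched as follows. The feature set and data here are illustrative assumptions, not taken from the study:

```python
import numpy as np

# Hypothetical keystroke-logging features (illustrative, not from the study):
# columns = [total keystrokes, mean pause length (s), deletion count]
X = np.array([
    [1200, 0.8, 40],
    [1500, 0.5, 25],
    [900,  1.2, 60],
    [1800, 0.4, 15],
    [1100, 0.9, 50],
], dtype=float)
scores = np.array([3.0, 4.0, 2.0, 5.0, 3.0])  # human essay scores

# Ordinary least squares: prepend an intercept column, then solve
# X_aug @ beta ~= scores in the least-squares sense.
X_aug = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(X_aug, scores, rcond=None)

predicted = X_aug @ beta  # model's score predictions for the same essays
```

The same design matrix could be handed to any other regression method, which is the gap the article points at: linear regression has been the only predictor used in this line of research.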
Finn, Bridgid; Arslan, Burcu; Walsh, Matthew – Applied Measurement in Education, 2020
To score an essay response, raters draw on previously trained skills and knowledge about the underlying rubric and score criterion. Cognitive processes such as remembering, forgetting, and skill decay likely influence rater performance. To investigate how forgetting influences scoring, we evaluated raters' scoring accuracy on TOEFL and GRE essays.…
Descriptors: Epistemology, Essay Tests, Evaluators, Cognitive Processes
Kieftenbeld, Vincent; Boyer, Michelle – Applied Measurement in Education, 2017
Automated scoring systems are typically evaluated by comparing the performance of a single automated rater item-by-item to human raters. This presents a challenge when the performance of multiple raters needs to be compared across multiple items. Rankings could depend on specifics of the ranking procedure; observed differences could be due to…
Descriptors: Automation, Scoring, Comparative Analysis, Test Items
Raczynski, Kevin; Cohen, Allan – Applied Measurement in Education, 2018
The literature on Automated Essay Scoring (AES) systems has provided useful validation frameworks for any assessment that includes AES scoring. Furthermore, evidence for the scoring fidelity of AES systems is accumulating. Yet questions remain when appraising the scoring performance of AES systems. These questions include: (a) which essays are…
Descriptors: Essay Tests, Test Scoring Machines, Test Validity, Evaluators
Rupp, André A. – Applied Measurement in Education, 2018
This article discusses critical methodological design decisions for collecting, interpreting, and synthesizing empirical evidence during the design, deployment, and operational quality-control phases for automated scoring systems. The discussion is inspired by work on operational large-scale systems for automated essay scoring but many of the…
Descriptors: Design, Automation, Scoring, Test Scoring Machines
Bridgeman, Brent; Trapani, Catherine; Attali, Yigal – Applied Measurement in Education, 2012
Essay scores generated by machine and by human raters are generally comparable; that is, they can produce scores with similar means and standard deviations, and machine scores generally correlate as highly with human scores as scores from one human correlate with scores from another human. Although human and machine essay scores are highly related…
Descriptors: Scoring, Essay Tests, College Entrance Examinations, High Stakes Tests
Kane, Michael T.; Mroch, Andrew A. – Applied Measurement in Education, 2010
In evaluating the relationship between two measures across different groups (i.e., in evaluating "differential validity") it is necessary to examine differences in correlation coefficients and in regression lines. Ordinary least squares (OLS) regression is the standard method for fitting lines to data, but its criterion for optimal fit…
Descriptors: Least Squares Statistics, Regression (Statistics), Differences, Validity
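The differential-validity comparison described here amounts to fitting an OLS line within each group and inspecting how the slopes and intercepts differ. A minimal sketch, with synthetic data standing in for the two groups:

```python
import numpy as np

# Synthetic predictor (x) and criterion (y) values for two groups
# (illustrative only, not data from the article).
x_a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_a = np.array([1.1, 2.1, 2.9, 4.2, 4.8])
x_b = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_b = np.array([0.5, 1.6, 2.4, 3.6, 4.4])

def ols_line(x, y):
    """Slope and intercept minimizing squared vertical residuals (OLS)."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept

slope_a, int_a = ols_line(x_a, y_a)
slope_b, int_b = ols_line(x_b, y_b)

# Differential validity check: how far apart are the groups' lines?
slope_gap = slope_a - slope_b
intercept_gap = int_a - int_b
```

Because OLS minimizes vertical residuals only, the fitted line (and hence any group comparison built on it) depends on which variable is treated as the criterion — the asymmetry the article's critique of the "standard method" turns on.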
Hardison, Chaitra M.; Sackett, Paul R. – Applied Measurement in Education, 2008
Despite the growing use of writing assessments in standardized tests, little is known about coaching effects on writing assessments. Therefore, this study tested the effects of short-term coaching on standardized writing tests, and the transfer of those effects to other writing genres. College freshmen were randomly assigned to either training…
Descriptors: Control Groups, Group Membership, College Freshmen, Writing Tests

Powers, Donald E.; Fowles, Mary E. – Applied Measurement in Education, 1998
To determine the effects on test performance and test validity of releasing essay topics before an examination, 300 prospective graduate students wrote essays on a released and an unreleased topic. Analyses did not reveal any statistically significant effect of topic release. (SLD)
Descriptors: College Students, Essay Tests, Higher Education, Performance Factors

Johnson, Robert L.; Penny, James; Gordon, Belita – Applied Measurement in Education, 2000
Studied four forms of score resolution used by testing agencies and investigated the effect that each has on the interrater reliability associated with the resulting operational scores. Results, based on 120 essays from the Georgia High School Writing Test, show some forms of resolution to be associated with higher reliability and some associated…
Descriptors: Essay Tests, High School Students, High Schools, Interrater Reliability
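Score resolution of the kind studied here chooses an operational score from two raters' marks, with different rules when the raters disagree. A small sketch of two common rules — the rule names, threshold, and data are illustrative assumptions, not the four forms from the article:

```python
import numpy as np

# Synthetic scores from two raters on ten essays (illustrative only).
rater1 = np.array([3, 4, 2, 5, 3, 4, 2, 3, 5, 4], dtype=float)
rater2 = np.array([3, 3, 2, 3, 4, 4, 3, 3, 5, 5], dtype=float)

# Simple interrater statistics.
exact_agreement = np.mean(rater1 == rater2)    # proportion of identical scores
pearson_r = np.corrcoef(rater1, rater2)[0, 1]  # correlation between raters

# Rule 1: always average the two raters.
mean_resolution = (rater1 + rater2) / 2

# Rule 2: average only when raters are within 1 point; larger discrepancies
# would go to a third rater for adjudication (flagged with NaN here).
discrepant = np.abs(rater1 - rater2) > 1
adjudicated_resolution = np.where(discrepant, np.nan, mean_resolution)
```

Because each rule produces a different operational score vector, the interrater reliability computed on the resolved scores differs by rule — which is the effect the study measures.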

Powers, Donald E.; Fowles, Mary E. – Applied Measurement in Education, 2002
Studied how performance on a standardized writing assessment might influence graduate admissions decisions if, along with test scores, test takers' essays were made available to admissions committees. Results for 27 test takers (2 essays each) suggest that the availability of examinee essays would have little, if any, influence on admissions…
Descriptors: Case Studies, College Admission, Decision Making, Essay Tests

Gabrielson, Stephen; And Others – Applied Measurement in Education, 1995
The effects of presenting a choice of writing tasks on the quality of essays produced by eleventh graders were studied with 34,200 students in Georgia. The choice condition had no substantive effect on the quality of essays, but race, gender, and the writing task variable did. (SLD)
Descriptors: Essay Tests, Grade 11, High School Students, High Schools