Wendler, Cathy; Glazer, Nancy; Bridgeman, Brent – Applied Measurement in Education, 2020
Efficient constructed response (CR) scoring requires both accuracy and speed from human raters. This study was designed to determine whether setting scoring rate expectations would encourage raters to score at a faster pace and, if so, whether there would be differential effects on scoring accuracy for raters who score at different rates. Three rater groups…
Descriptors: Scoring, Expectation, Accuracy, Time
Bejar, Isaac I.; Li, Chen; McCaffrey, Daniel – Applied Measurement in Education, 2020
We evaluate the feasibility of developing predictive models of rater behavior, that is, "rater-specific" models for predicting the scores produced by a rater under operational conditions. In the present study, the dependent variable is the score assigned to essays by a rater, and the predictors are linguistic attributes of the essays…
Descriptors: Scoring, Essays, Behavior, Predictive Measurement
Choi, Ikkyu; Wolfe, Edward W. – Applied Measurement in Education, 2020
Rater training is essential in ensuring the quality of constructed response scoring. Most of the current knowledge about rater training comes from experimental contexts with an emphasis on short-term effects. Few sources are available for empirical evidence on whether and how raters become more accurate as they gain scoring experience or what…
Descriptors: Scoring, Accuracy, Training, Evaluators
Sinharay, Sandip; Zhang, Mo; Deane, Paul – Applied Measurement in Education, 2019
Analysis of keystroke logging data is of increasing interest, as evident from a substantial amount of recent research on the topic. Some of the research on keystroke logging data has focused on the prediction of essay scores from keystroke logging features, but linear regression is the only prediction method that has been used in this research.…
Descriptors: Scores, Prediction, Writing Processes, Data Analysis
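The prediction approach this abstract refers to — ordinary linear regression from keystroke-logging features to essay scores — can be sketched as follows. The feature set and data here are illustrative assumptions, not taken from the study:

```python
import numpy as np

# Hypothetical keystroke-logging features (illustrative, not from the study):
# columns = [total keystrokes, mean pause length (s), deletion count]
X = np.array([
    [1200, 0.8, 40],
    [1500, 0.5, 25],
    [900,  1.2, 60],
    [1800, 0.4, 15],
    [1100, 0.9, 50],
], dtype=float)
scores = np.array([3.0, 4.0, 2.0, 5.0, 3.0])  # human essay scores

# Ordinary least squares: prepend an intercept column, then solve
# X_aug @ beta ~= scores in the least-squares sense.
X_aug = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(X_aug, scores, rcond=None)

predicted = X_aug @ beta  # model's score predictions for the same essays
```

The same design matrix could be handed to any other regression method, which is the gap the article points at: linear regression has been the only predictor used in this line of research.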
Finn, Bridgid; Arslan, Burcu; Walsh, Matthew – Applied Measurement in Education, 2020
To score an essay response, raters draw on previously trained skills and knowledge about the underlying rubric and score criterion. Cognitive processes such as remembering, forgetting, and skill decay likely influence rater performance. To investigate how forgetting influences scoring, we evaluated raters' scoring accuracy on TOEFL and GRE essays.…
Descriptors: Epistemology, Essay Tests, Evaluators, Cognitive Processes
Kieftenbeld, Vincent; Boyer, Michelle – Applied Measurement in Education, 2017
Automated scoring systems are typically evaluated by comparing the performance of a single automated rater item-by-item to human raters. This presents a challenge when the performance of multiple raters needs to be compared across multiple items. Rankings could depend on specifics of the ranking procedure; observed differences could be due to…
Descriptors: Automation, Scoring, Comparative Analysis, Test Items
Raczynski, Kevin; Cohen, Allan – Applied Measurement in Education, 2018
The literature on Automated Essay Scoring (AES) systems has provided useful validation frameworks for any assessment that includes AES scoring. Furthermore, evidence for the scoring fidelity of AES systems is accumulating. Yet questions remain when appraising the scoring performance of AES systems. These questions include: (a) which essays are…
Descriptors: Essay Tests, Test Scoring Machines, Test Validity, Evaluators
Rupp, André A. – Applied Measurement in Education, 2018
This article discusses critical methodological design decisions for collecting, interpreting, and synthesizing empirical evidence during the design, deployment, and operational quality-control phases for automated scoring systems. The discussion is inspired by work on operational large-scale systems for automated essay scoring but many of the…
Descriptors: Design, Automation, Scoring, Test Scoring Machines
Bridgeman, Brent; Trapani, Catherine; Attali, Yigal – Applied Measurement in Education, 2012
Essay scores generated by machine and by human raters are generally comparable; that is, they can produce scores with similar means and standard deviations, and machine scores generally correlate as highly with human scores as scores from one human correlate with scores from another human. Although human and machine essay scores are highly related…
Descriptors: Scoring, Essay Tests, College Entrance Examinations, High Stakes Tests
Kane, Michael T.; Mroch, Andrew A. – Applied Measurement in Education, 2010
In evaluating the relationship between two measures across different groups (i.e., in evaluating "differential validity") it is necessary to examine differences in correlation coefficients and in regression lines. Ordinary least squares (OLS) regression is the standard method for fitting lines to data, but its criterion for optimal fit…
Descriptors: Least Squares Statistics, Regression (Statistics), Differences, Validity
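The differential-validity comparison described here amounts to fitting an OLS line within each group and inspecting how the slopes and intercepts differ. A minimal sketch, with synthetic data standing in for the two groups:

```python
import numpy as np

# Synthetic predictor (x) and criterion (y) values for two groups
# (illustrative only, not data from the article).
x_a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_a = np.array([1.1, 2.1, 2.9, 4.2, 4.8])
x_b = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_b = np.array([0.5, 1.6, 2.4, 3.6, 4.4])

def ols_line(x, y):
    """Slope and intercept minimizing squared vertical residuals (OLS)."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept

slope_a, int_a = ols_line(x_a, y_a)
slope_b, int_b = ols_line(x_b, y_b)

# Differential validity check: how far apart are the groups' lines?
slope_gap = slope_a - slope_b
intercept_gap = int_a - int_b
```

Because OLS minimizes vertical residuals only, the fitted line (and hence any group comparison built on it) depends on which variable is treated as the criterion — the asymmetry the article's critique of the "standard method" turns on.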
Hardison, Chaitra M.; Sackett, Paul R. – Applied Measurement in Education, 2008
Despite the growing use of writing assessments in standardized tests, little is known about coaching effects on writing assessments. Therefore, this study tested the effects of short-term coaching on standardized writing tests, and the transfer of those effects to other writing genres. College freshmen were randomly assigned to either training…
Descriptors: Control Groups, Group Membership, College Freshmen, Writing Tests

Powers, Donald E.; Fowles, Mary E. – Applied Measurement in Education, 1998
To determine the effects on test performance and test validity of releasing essay topics before an examination, 300 prospective graduate students wrote essays on a released and an unreleased topic. Analyses did not reveal any statistically significant effect of topic release. (SLD)
Descriptors: College Students, Essay Tests, Higher Education, Performance Factors

Johnson, Robert L.; Penny, James; Gordon, Belita – Applied Measurement in Education, 2000
Studied four forms of score resolution used by testing agencies and investigated the effect that each has on the interrater reliability associated with the resulting operational scores. Results, based on 120 essays from the Georgia High School Writing Test, show some forms of resolution to be associated with higher reliability and some associated…
Descriptors: Essay Tests, High School Students, High Schools, Interrater Reliability
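Score resolution of the kind studied here chooses an operational score from two raters' marks, with different rules when the raters disagree. A small sketch of two common rules — the rule names, threshold, and data are illustrative assumptions, not the four forms from the article:

```python
import numpy as np

# Synthetic scores from two raters on ten essays (illustrative only).
rater1 = np.array([3, 4, 2, 5, 3, 4, 2, 3, 5, 4], dtype=float)
rater2 = np.array([3, 3, 2, 3, 4, 4, 3, 3, 5, 5], dtype=float)

# Simple interrater statistics.
exact_agreement = np.mean(rater1 == rater2)    # proportion of identical scores
pearson_r = np.corrcoef(rater1, rater2)[0, 1]  # correlation between raters

# Rule 1: always average the two raters.
mean_resolution = (rater1 + rater2) / 2

# Rule 2: average only when raters are within 1 point; larger discrepancies
# would go to a third rater for adjudication (flagged with NaN here).
discrepant = np.abs(rater1 - rater2) > 1
adjudicated_resolution = np.where(discrepant, np.nan, mean_resolution)
```

Because each rule produces a different operational score vector, the interrater reliability computed on the resolved scores differs by rule — which is the effect the study measures.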

Powers, Donald E.; Fowles, Mary E. – Applied Measurement in Education, 2002
Studied how performance on a standardized writing assessment might influence graduate admissions decisions if, along with test scores, test takers' essays were made available to admissions committees. Results for 27 test takers (2 essays each) suggest that the availability of examinee essays would have little, if any, influence on admissions…
Descriptors: Case Studies, College Admission, Decision Making, Essay Tests

Gabrielson, Stephen; And Others – Applied Measurement in Education, 1995
The effects of presenting a choice of writing tasks on the quality of essays produced by eleventh graders were studied with 34,200 students in Georgia. The choice condition had no substantive effect on the quality of essays, but race, gender, and the writing task variable did. (SLD)
Descriptors: Essay Tests, Grade 11, High School Students, High Schools