Showing all 13 results
Peer reviewed
Kieftenbeld, Vincent; Boyer, Michelle – Applied Measurement in Education, 2017
Automated scoring systems are typically evaluated by comparing the performance of a single automated rater item-by-item to human raters. This presents a challenge when the performance of multiple raters needs to be compared across multiple items. Rankings could depend on specifics of the ranking procedure; observed differences could be due to…
Descriptors: Automation, Scoring, Comparative Analysis, Test Items
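A minimal sketch (not from the paper) of the item-by-item comparison the abstract describes: score every automated rater against the human ratings on each item with quadratic weighted kappa, rank the raters within each item, and average those ranks. All data structures and names here are hypothetical; scores are assumed to be integers in 0..n_cats-1.

    import numpy as np

    def quadratic_weighted_kappa(a, b, n_cats):
        """Quadratic weighted kappa between two integer score vectors."""
        a, b = np.asarray(a), np.asarray(b)
        observed = np.zeros((n_cats, n_cats))
        for i, j in zip(a, b):
            observed[i, j] += 1
        expected = np.outer(np.bincount(a, minlength=n_cats),
                            np.bincount(b, minlength=n_cats)) / len(a)
        weights = np.array([[(i - j) ** 2 for j in range(n_cats)]
                            for i in range(n_cats)]) / (n_cats - 1) ** 2
        return 1.0 - (weights * observed).sum() / (weights * expected).sum()

    def rank_raters(machine_scores, human_scores, n_cats):
        """machine_scores[item][rater] and human_scores[item] are score vectors."""
        items = sorted(human_scores)
        raters = sorted(next(iter(machine_scores.values())))
        # Agreement of each automated rater with the human ratings, per item.
        kappa = {it: {r: quadratic_weighted_kappa(machine_scores[it][r],
                                                  human_scores[it], n_cats)
                      for r in raters} for it in items}
        # Rank within each item (1 = best agreement), then average across items.
        return {r: float(np.mean([1 + sum(kappa[it][q] > kappa[it][r]
                                          for q in raters) for it in items]))
                for r in raters}

Swapping quadratic weighted kappa for, say, exact agreement in this sketch can reorder the raters, which is the kind of procedure-dependence the abstract raises.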
Peer reviewed
Ramineni, Chaitanya; Trapani, Catherine S.; Williamson, David M. – ETS Research Report Series, 2015
Automated scoring models were trained and evaluated for the essay task in the "Praxis I"® writing test. Prompt-specific and generic "e-rater"® scoring models were built, and evaluation statistics, such as quadratic weighted kappa, Pearson correlation, and standardized differences in mean scores, were examined to evaluate the…
Descriptors: Writing Tests, Licensing Examinations (Professions), Teacher Competency Testing, Scoring
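A hedged illustration (not from the report) of the evaluation statistics named in the abstract, computed for one invented pair of human and automated score vectors; quadratic weighted kappa can be computed as in the sketch following the Kieftenbeld and Boyer entry above.

    import numpy as np
    from scipy.stats import pearsonr

    human = np.array([3, 4, 2, 5, 3, 4, 4, 2, 3, 5])    # hypothetical human scores
    machine = np.array([3, 4, 3, 4, 3, 4, 5, 2, 3, 4])  # hypothetical automated scores

    r, _ = pearsonr(human, machine)  # Pearson correlation
    # Standardized difference in mean scores (machine minus human, pooled SD).
    pooled_sd = np.sqrt((human.var(ddof=1) + machine.var(ddof=1)) / 2)
    std_diff = (machine.mean() - human.mean()) / pooled_sd
    print(f"Pearson r = {r:.3f}, standardized mean difference = {std_diff:.3f}")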
Holifield-Scott, April – ProQuest LLC, 2011
A study was conducted to determine the extent to which high school and college/university Advanced Placement English Language and Composition readers value and implement the curricular requirements of Advanced Placement English Language and Composition. The participants were 158 readers of the 2010 Advanced Placement English Language and…
Descriptors: Advanced Placement, English Instruction, Writing (Composition), English Curriculum
Peer reviewed
Barkaoui, Khaled – Assessment in Education: Principles, Policy & Practice, 2011
This study examined the effects of marking method and rater experience on ESL (English as a Second Language) essay test scores and rater performance. Each of 31 novice and 29 experienced raters rated a sample of ESL essays both holistically and analytically. Essay scores were analysed using a multi-faceted Rasch model to compare test-takers'…
Descriptors: Writing Evaluation, Writing Tests, Essay Tests, Interrater Reliability
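As a hedged sketch of the model family typically meant by "multi-faceted Rasch model" (assumed here, not taken from the article), a many-facet Rasch model with facets for test-takers, criteria, and raters expresses the log-odds of test-taker n being awarded category k rather than k-1 on criterion i by rater j as

\log\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k

where B_n is the test-taker's proficiency, D_i the difficulty of criterion i, C_j the severity of rater j, and F_k the difficulty of the step up to category k.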
Peer reviewed
Johnson, Martin; Nadas, Rita; Bell, John F. – British Journal of Educational Technology, 2010
There is a growing body of research literature that considers how the mode of assessment, either computer-based or paper-based, might affect candidates' performances. Far less attention has been paid to those making the assessment judgements, and to issues of assessor consistency when…
Descriptors: English Literature, Examiners, Evaluation Research, Evaluators
Peer reviewed
Mogey, Nora; Paterson, Jessie; Burk, John; Purcell, Michael – ALT-J: Research in Learning Technology, 2010
Students at the University of Edinburgh do almost all their work on computers, but at the end of the semester they are examined by handwritten essays. Intuitively it would be appealing to allow students the choice of handwriting or typing, but this raises a concern that perhaps this might not be "fair"--that the choice a student makes,…
Descriptors: Handwriting, Essay Tests, Interrater Reliability, Grading
Peer reviewed
Coniam, David – ReCALL, 2009
This paper describes a study of the computer essay-scoring program BETSY. While the use of computers in rating written scripts has been criticised in some quarters for lacking transparency or for fitting poorly with how human raters rate written scripts, a number of essay rating programs are available commercially, many of which claim to offer comparable…
Descriptors: Writing Tests, Scoring, Foreign Countries, Interrater Reliability
Breland, Hunter M.; Jones, Robert J. – 1988
The reliability, validity, and score discrepancies of 94 expository essays scored in conference versus remote settings were studied. Focus was on comparing holistic ratings obtained in both settings. Essays written by college freshmen on two different topics were scored by readers working in a conference setting and by different readers working in…
Descriptors: College Freshmen, Comparative Analysis, Conferences, Essay Tests
Sireci, Stephen G.; Rizavi, Saba – 2000
Although computer-based testing is becoming popular, many of these tests are limited to the use of selected-response item formats due to the difficulty in mechanically scoring constructed-response items. This limitation is unfortunate because many constructs, such as writing proficiency, can be measured more directly using items that require…
Descriptors: College Students, Comparative Analysis, Computer Uses in Education, Essay Tests
De Ayala, R. J.; And Others – 1989
The graded response (GR) model of Samejima (1969) and the partial credit model (PC) of Masters (1982) were fitted to identical writing samples that were holistically scored. The performance and relative benefits of each model were then evaluated. Writing samples were both expository and narrative. Data were from statewide assessments of secondary…
Descriptors: Comparative Analysis, Essay Tests, Holistic Evaluation, Interrater Reliability
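For reference, the two item response theory models compared here have standard forms; the notation below follows common textbook conventions and is not taken from the report. Samejima's graded response model gives the probability of a response in category k or higher on item i as a two-parameter logistic in the ability \theta, with category probabilities as differences of adjacent boundary curves:

P^{*}_{ik}(\theta) = \frac{1}{1 + \exp[-a_i(\theta - b_{ik})]}, \qquad P_{ik}(\theta) = P^{*}_{ik}(\theta) - P^{*}_{i(k+1)}(\theta)

Masters's partial credit model instead models the probability of a score of k on item i directly:

P_{ik}(\theta) = \frac{\exp\sum_{j=1}^{k}(\theta - \delta_{ij})}{\sum_{r=0}^{m_i}\exp\sum_{j=1}^{r}(\theta - \delta_{ij})}

with the empty sum for r = 0 (or k = 0) taken as zero and \delta_{ij} the step difficulties.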
Bridgeman, Brent; Cooper, Peter – 1998
Essays for the Graduate Management Admissions Test must be written with a word processor (except in some foreign countries). The test sponsor, the Graduate Management Admissions Council, believed this to be fair because some word processing skill is a prerequisite for advanced management education. It might also be unfair to…
Descriptors: College Entrance Examinations, College Students, Comparative Analysis, Essay Tests
Tiffany, Gerald E.; And Others – 1991
In 1991, a student learning outcomes assessment was conducted at Wenatchee Valley College, Washington. All English 101 students in the winter and spring quarters of 1990 wrote a 2-hour final exam. Winter quarter students wrote on the same topic while spring quarter students wrote on one of three randomly assigned topics. Five English 101…
Descriptors: Community Colleges, Comparative Analysis, Curriculum Evaluation, Essay Tests
Peer reviewed
Linn, Robert L.; And Others – Applied Measurement in Education, 1992
Ten states participated in a cross-state scoring workshop in 1991, evaluating writing from elementary school, middle school, and high school students. Correlations of scores assigned by readers from one state with those assigned by readers from another state were generally quite high. Implications for defining common standards are discussed. (SLD)
Descriptors: Comparative Analysis, Correlation, Elementary School Students, Elementary Secondary Education