Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 1 |
Since 2016 (last 10 years) | 2 |
Since 2006 (last 20 years) | 9 |
Descriptor
Computer Assisted Testing | 15 |
Interrater Reliability | 15 |
Comparative Analysis | 6 |
Computer Software | 6 |
Evaluation Methods | 6 |
Grading | 6 |
Writing Evaluation | 6 |
Educational Technology | 5 |
Essays | 5 |
Foreign Countries | 5 |
Scoring | 5 |
Author
Coniam, David | 2 |
Bennett, Randy Elliot | 1 |
Bhola, Dennison S. | 1 |
Buckendahl, Chad W. | 1 |
Burk, John | 1 |
Cason, Gerald J. | 1 |
Doewes, Afrizal | 1 |
Guskey, Thomas R. | 1 |
Joordens, S. | 1 |
Jung, Lee Ann | 1 |
Juszkiewicz, Piotr J. | 1 |
Publication Type
Reports - Evaluative | 15 |
Journal Articles | 12 |
Speeches/Meeting Papers | 2 |
Tests/Questionnaires | 1 |
Education Level
Higher Education | 5 |
Postsecondary Education | 3 |
Elementary Secondary Education | 2 |
Secondary Education | 2 |
Grade 11 | 1 |
Assessments and Surveys
National Assessment of Educational Progress | 2 |
Doewes, Afrizal; Kurdhi, Nughthoh Arfawi; Saxena, Akrati – International Educational Data Mining Society, 2023
Automated Essay Scoring (AES) tools aim to improve the efficiency and consistency of essay scoring by using machine learning algorithms. In the existing research on this topic, most researchers agree that human-automated score agreement remains the benchmark for assessing the accuracy of machine-generated scores. To measure the performance of…
Descriptors: Essays, Writing Evaluation, Evaluators, Accuracy
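Several entries in this list (Doewes et al., McCurry, Page) treat human-automated score agreement as the benchmark for automated essay scoring. None of the abstracts name their exact statistic, but quadratic weighted kappa is a common agreement measure for ordinal essay scores; the Python sketch below is a minimal, illustrative implementation, with all score data invented for demonstration.

    import numpy as np

    def quadratic_weighted_kappa(human, machine, min_score, max_score):
        """Quadratic weighted kappa between two integer score vectors."""
        n = max_score - min_score + 1
        # Observed agreement: normalized confusion matrix of the two raters.
        observed = np.zeros((n, n))
        for h, m in zip(human, machine):
            observed[h - min_score, m - min_score] += 1
        observed /= observed.sum()
        # Expected agreement under independence: outer product of marginals.
        expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
        # Quadratic disagreement weights: (i - j)^2 scaled to [0, 1].
        idx = np.arange(n)
        weights = (idx[:, None] - idx[None, :]) ** 2 / (n - 1) ** 2
        return 1.0 - (weights * observed).sum() / (weights * expected).sum()

    # Hypothetical scores on a 1-6 rubric, invented for illustration.
    human_scores = [3, 4, 5, 2, 4, 3, 5, 4]
    machine_scores = [3, 4, 4, 2, 5, 3, 5, 3]
    print(quadratic_weighted_kappa(human_scores, machine_scores, 1, 6))

Quadratic weights penalize large human-machine discrepancies more heavily than near-misses, which suits ordinal rubrics; simple percent agreement would treat a one-point and a four-point disagreement identically.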
Guskey, Thomas R.; Jung, Lee Ann – Educational Leadership, 2016
Many educators consider grades calculated from statistical algorithms more accurate, objective, and reliable than grades they calculate themselves. But in this research, the authors first asked teachers to use their professional judgment to choose a summary grade for hypothetical students. When the researchers compared the teachers' grades with the…
Descriptors: Grading, Computer Assisted Testing, Interrater Reliability, Grades (Scholastic)
McCurry, Doug – Assessing Writing, 2010
This article considers the claim that machine scoring of writing test responses agrees with human readers as much as humans agree with other humans. These claims about the reliability of machine scoring of writing are usually based on specific and constrained writing tasks, and there is reason to ask whether machine scoring of writing requires…
Descriptors: Writing Tests, Scoring, Interrater Reliability, Computer Assisted Testing
Mogey, Nora; Paterson, Jessie; Burk, John; Purcell, Michael – ALT-J: Research in Learning Technology, 2010
Students at the University of Edinburgh do almost all their work on computers, but at the end of the semester they are examined by handwritten essays. Intuitively, it would be appealing to allow students the choice of handwriting or typing, but this raises the concern that the arrangement might not be "fair"--that the choice a student makes,…
Descriptors: Handwriting, Essay Tests, Interrater Reliability, Grading
Coniam, David – Educational Research and Evaluation, 2009
This paper describes a study comparing paper-based marking (PBM) and onscreen marking (OSM) in Hong Kong utilising English language essay scripts drawn from the live 2007 Hong Kong Certificate of Education Examination (HKCEE) Year 11 English Language Writing Paper. In the study, 30 raters from the 2007 HKCEE Writing Paper marked on paper 100…
Descriptors: Student Attitudes, Foreign Countries, Essays, Comparative Analysis
Coniam, David – ReCALL, 2009
This paper describes a study of the computer essay-scoring program BETSY. While the use of computers in rating written scripts has been criticised in some quarters for lacking transparency or fit with how human raters rate written scripts, a number of essay rating programs are available commercially, many of which claim to offer comparable…
Descriptors: Writing Tests, Scoring, Foreign Countries, Interrater Reliability
Shaw, Stuart – E-Learning, 2008
Computer-assisted assessment offers many benefits over traditional paper methods. However, in transferring from one medium to another, it is crucial to ascertain the extent to which the new medium may alter the nature of traditional assessment practice or affect marking reliability. Whilst there is a substantial body of research comparing marking…
Descriptors: Construct Validity, Writing Instruction, Computer Assisted Testing, Student Evaluation
Pare, D. E.; Joordens, S. – Journal of Computer Assisted Learning, 2008
As class sizes increase, methods of assessment shift from costly traditional approaches (e.g. expert-graded writing assignments) to more economic and logistically feasible methods (e.g. multiple-choice testing, computer-automated scoring, or peer assessment). While each method of assessment has its merits, it is peer assessment in particular,…
Descriptors: Writing Assignments, Undergraduate Students, Teaching Assistants, Peer Evaluation
Wen, Meichun Lydia; Tsai, Chin-Chung – Teaching in Higher Education, 2008
Online or web-based peer assessment is a valuable and effective way to help the learner to examine his or her learning progress, and teachers need to be familiar with the practice before they use it in their classrooms. Therefore, the purpose of our study was to design an online peer assessment activity for 37 inservice science and mathematics…
Descriptors: Teacher Education Curriculum, Education Courses, Peer Evaluation, Research Methodology
Page, Ellis Batten – Journal of Experimental Education, 1994
National Assessment of Educational Progress writing sample essays from 1988 and 1990 (495 and 599 essays) were subjected to computerized grading and human ratings. Cross-validation suggests that computer scoring is superior to a two-judge panel, a finding encouraging for large programs of essay evaluation. (SLD)
Descriptors: Computer Assisted Testing, Computer Software, Essays, Evaluation Methods
Bennett, Randy Elliot; Rock, Donald A. – 1993
Formulating-Hypotheses (F-H) items present a situation and ask the examinee to generate as many explanations for it as possible. This study examined the generalizability, validity, and examinee perceptions of a computer-delivered version of the task. Eight F-H questions were administered to 192 graduate students. Half of the items restricted…
Descriptors: Computer Assisted Testing, Difficulty Level, Generalizability Theory, Graduate Students
McGhee, Debbie E.; Lowell, Nana – New Directions for Teaching and Learning, 2003
This study compares mean ratings, inter-rater reliabilities, and the factor structure of items for online and paper student-rating forms from the University of Washington's Instructional Assessment System. (Contains 3 figures and 2 tables.)
Descriptors: Psychometrics, Factor Structure, Student Evaluation of Teacher Performance, Test Items
Cason, Gerald J.; And Others – 1987
The Objective Test Scoring and Performance Rating (OTS-PR) system is a fully integrated set of 70 modular FORTRAN programs run on a VAX-8530 computer. Even with no knowledge of computers, the user can implement OTS-PR to score multiple-choice tests, include scores from external sources such as hand-scored essays or scores from nationally…
Descriptors: Clinical Experience, Computer Assisted Testing, Educational Assessment, Essay Tests
Lee, H. K. – Assessing Writing, 2004
This study aimed to comprehensively investigate the impact of a word processor on an ESL writing assessment, covering comparisons of inter-rater reliability, the quality of written products, the writing process across different testing occasions using different writing media, and students' perceptions of a computer-delivered test. Writing samples of…
Descriptors: Writing Evaluation, Student Attitudes, Writing Tests, Testing
Yang, Yongwei; Buckendahl, Chad W.; Juszkiewicz, Piotr J.; Bhola, Dennison S. – Journal of Applied Testing Technology, 2005
With the continual progress of computer technologies, computer automated scoring (CAS) has become a popular tool for evaluating writing assessments. Research on applying these methodologies to new types of performance assessments is still emerging. While research has generally shown a high agreement of CAS system generated scores with those…
Descriptors: Scoring, Validity, Interrater Reliability, Comparative Analysis