Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 1 |
Since 2016 (last 10 years) | 2 |
Since 2006 (last 20 years) | 9 |
Descriptor
Computer Assisted Testing | 15 |
Interrater Reliability | 15 |
Comparative Analysis | 6 |
Computer Software | 6 |
Evaluation Methods | 6 |
Grading | 6 |
Writing Evaluation | 6 |
Educational Technology | 5 |
Essays | 5 |
Foreign Countries | 5 |
Scoring | 5 |
Author
Coniam, David | 2 |
Bennett, Randy Elliot | 1 |
Bhola, Dennison S. | 1 |
Buckendahl, Chad W. | 1 |
Burk, John | 1 |
Cason, Gerald J. | 1 |
Doewes, Afrizal | 1 |
Guskey, Thomas R. | 1 |
Joordens, S. | 1 |
Jung, Lee Ann | 1 |
Juszkiewicz, Piotr J. | 1 |
Publication Type
Reports - Evaluative | 15 |
Journal Articles | 12 |
Speeches/Meeting Papers | 2 |
Tests/Questionnaires | 1 |
Education Level
Higher Education | 5 |
Postsecondary Education | 3 |
Elementary Secondary Education | 2 |
Secondary Education | 2 |
Grade 11 | 1 |
Assessments and Surveys
National Assessment of Educational Progress | 2 |
Doewes, Afrizal; Kurdhi, Nughthoh Arfawi; Saxena, Akrati – International Educational Data Mining Society, 2023
Automated Essay Scoring (AES) tools aim to improve the efficiency and consistency of essay scoring by using machine learning algorithms. In the existing research on this topic, most researchers agree that human-automated score agreement remains the benchmark for assessing the accuracy of machine-generated scores. To measure the performance of…
Descriptors: Essays, Writing Evaluation, Evaluators, Accuracy
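Several entries in this list (Doewes et al., McCurry, Page) treat human-automated score agreement as the benchmark for automated essay scoring. None of the abstracts name their exact statistic, but quadratic weighted kappa is a common agreement measure for ordinal essay scores; the Python sketch below is a minimal, illustrative implementation, with all score data invented for demonstration.

    import numpy as np

    def quadratic_weighted_kappa(human, machine, min_score, max_score):
        """Quadratic weighted kappa between two integer score vectors."""
        n = max_score - min_score + 1
        # Observed agreement: normalized confusion matrix of the two raters.
        observed = np.zeros((n, n))
        for h, m in zip(human, machine):
            observed[h - min_score, m - min_score] += 1
        observed /= observed.sum()
        # Expected agreement under independence: outer product of marginals.
        expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
        # Quadratic disagreement weights: (i - j)^2 scaled to [0, 1].
        idx = np.arange(n)
        weights = (idx[:, None] - idx[None, :]) ** 2 / (n - 1) ** 2
        return 1.0 - (weights * observed).sum() / (weights * expected).sum()

    # Hypothetical scores on a 1-6 rubric, invented for illustration.
    human_scores = [3, 4, 5, 2, 4, 3, 5, 4]
    machine_scores = [3, 4, 4, 2, 5, 3, 5, 3]
    print(quadratic_weighted_kappa(human_scores, machine_scores, 1, 6))

Quadratic weights penalize large human-machine discrepancies more heavily than near-misses, which suits ordinal rubrics; simple percent agreement would treat a one-point and a four-point disagreement identically.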
Guskey, Thomas R.; Jung, Lee Ann – Educational Leadership, 2016
Many educators consider grades calculated from statistical algorithms more accurate, objective, and reliable than grades they calculate themselves. But in this research, the authors first asked teachers to use their professional judgment to choose a summary grade for hypothetical students. When the researchers compared the teachers' grades with the…
Descriptors: Grading, Computer Assisted Testing, Interrater Reliability, Grades (Scholastic)
McCurry, Doug – Assessing Writing, 2010
This article considers the claim that machine scoring of writing test responses agrees with human readers as much as humans agree with other humans. These claims about the reliability of machine scoring of writing are usually based on specific and constrained writing tasks, and there is reason to ask whether machine scoring of writing requires…
Descriptors: Writing Tests, Scoring, Interrater Reliability, Computer Assisted Testing
Mogey, Nora; Paterson, Jessie; Burk, John; Purcell, Michael – ALT-J: Research in Learning Technology, 2010
Students at the University of Edinburgh do almost all their work on computers, but at the end of the semester they are examined by handwritten essays. Intuitively, it would be appealing to allow students the choice of handwriting or typing, but this raises the concern that the arrangement might not be "fair"--that the choice a student makes,…
Descriptors: Handwriting, Essay Tests, Interrater Reliability, Grading
Coniam, David – Educational Research and Evaluation, 2009
This paper describes a study comparing paper-based marking (PBM) and onscreen marking (OSM) in Hong Kong utilising English language essay scripts drawn from the live 2007 Hong Kong Certificate of Education Examination (HKCEE) Year 11 English Language Writing Paper. In the study, 30 raters from the 2007 HKCEE Writing Paper marked on paper 100…
Descriptors: Student Attitudes, Foreign Countries, Essays, Comparative Analysis
Coniam, David – ReCALL, 2009
This paper describes a study of the computer essay-scoring program BETSY. While the use of computers in rating written scripts has been criticised in some quarters for lacking transparency or fit with how human raters rate written scripts, a number of essay rating programs are available commercially, many of which claim to offer comparable…
Descriptors: Writing Tests, Scoring, Foreign Countries, Interrater Reliability
Shaw, Stuart – E-Learning, 2008
Computer-assisted assessment offers many benefits over traditional paper methods. However, in transferring from one medium to another, it is crucial to ascertain the extent to which the new medium may alter the nature of traditional assessment practice or affect marking reliability. Whilst there is a substantial body of research comparing marking…
Descriptors: Construct Validity, Writing Instruction, Computer Assisted Testing, Student Evaluation
Pare, D. E.; Joordens, S. – Journal of Computer Assisted Learning, 2008
As class sizes increase, methods of assessment shift from costly traditional approaches (e.g. expert-graded writing assignments) to more economic and logistically feasible methods (e.g. multiple-choice testing, computer-automated scoring, or peer assessment). While each method of assessment has its merits, it is peer assessment in particular,…
Descriptors: Writing Assignments, Undergraduate Students, Teaching Assistants, Peer Evaluation
Wen, Meichun Lydia; Tsai, Chin-Chung – Teaching in Higher Education, 2008
Online or web-based peer assessment is a valuable and effective way to help the learner to examine his or her learning progress, and teachers need to be familiar with the practice before they use it in their classrooms. Therefore, the purpose of our study was to design an online peer assessment activity for 37 inservice science and mathematics…
Descriptors: Teacher Education Curriculum, Education Courses, Peer Evaluation, Research Methodology
Page, Ellis Batten – Journal of Experimental Education, 1994
National Assessment of Educational Progress writing sample essays from 1988 and 1990 (495 and 599 essays) were subjected to computerized grading and human ratings. Cross-validation suggests that computer scoring is superior to a two-judge panel, a finding encouraging for large programs of essay evaluation. (SLD)
Descriptors: Computer Assisted Testing, Computer Software, Essays, Evaluation Methods
Bennett, Randy Elliot; Rock, Donald A. – 1993
Formulating-Hypotheses (F-H) items present a situation and ask the examinee to generate as many explanations for it as possible. This study examined the generalizability, validity, and examinee perceptions of a computer-delivered version of the task. Eight F-H questions were administered to 192 graduate students. Half of the items restricted…
Descriptors: Computer Assisted Testing, Difficulty Level, Generalizability Theory, Graduate Students
McGhee, Debbie E.; Lowell, Nana – New Directions for Teaching and Learning, 2003
This study compares mean ratings, inter-rater reliabilities, and the factor structure of items for online and paper student-rating forms from the University of Washington's Instructional Assessment System. (Contains 3 figures and 2 tables.)
Descriptors: Psychometrics, Factor Structure, Student Evaluation of Teacher Performance, Test Items
Cason, Gerald J.; And Others – 1987
The Objective Test Scoring and Performance Rating (OTS-PR) system is a fully integrated set of 70 modular FORTRAN programs run on a VAX-8530 computer. Even with no knowledge of computers, the user can implement OTS-PR to score multiple-choice tests, include scores from external sources such as hand-scored essays or scores from nationally…
Descriptors: Clinical Experience, Computer Assisted Testing, Educational Assessment, Essay Tests
Lee, H. K. – Assessing Writing, 2004
This study aimed to comprehensively investigate the impact of a word processor on an ESL writing assessment, covering comparisons of inter-rater reliability, the quality of written products, the writing process across different testing occasions using different writing media, and students' perceptions of a computer-delivered test. Writing samples of…
Descriptors: Writing Evaluation, Student Attitudes, Writing Tests, Testing
Yang, Yongwei; Buckendahl, Chad W.; Juszkiewicz, Piotr J.; Bhola, Dennison S. – Journal of Applied Testing Technology, 2005
With the continual progress of computer technologies, computer automated scoring (CAS) has become a popular tool for evaluating writing assessments. Research on applying these methodologies to new types of performance assessments is still emerging. While research has generally shown a high agreement of CAS system generated scores with those…
Descriptors: Scoring, Validity, Interrater Reliability, Comparative Analysis