Publication Date
In 2025: 2
Since 2024: 2
Since 2021 (last 5 years): 2
Since 2016 (last 10 years): 3
Since 2006 (last 20 years): 13
Descriptor
Computer Assisted Testing: 16
Evaluation Methods: 16
Interrater Reliability: 16
Computer Software: 9
Scoring: 9
Comparative Analysis: 6
Educational Technology: 6
Grading: 5
College Students: 4
Comparative Testing: 4
Correlation: 4
Publication Type
Journal Articles: 15
Reports - Research: 8
Reports - Evaluative: 6
Collected Works - Proceedings: 1
Reports - Descriptive: 1
Tests/Questionnaires: 1
Education Level
Higher Education: 9
Postsecondary Education: 7
Elementary Secondary Education: 2
High Schools: 2
Secondary Education: 2
Elementary Education: 1
Grade 8: 1
Middle Schools: 1
Assessments and Surveys
National Assessment of…: 1
Test of English as a Foreign…: 1
Jonas Flodén – British Educational Research Journal, 2025
This study compares how the generative AI (GenAI) large language model (LLM) ChatGPT performs when grading university exams relative to human teachers. Aspects investigated include consistency, large discrepancies, and length of answer. Implications for higher education, including the role of teachers and ethics, are also discussed. Three…
Descriptors: College Faculty, Artificial Intelligence, Comparative Testing, Scoring
Tahereh Firoozi; Hamid Mohammadi; Mark J. Gierl – Journal of Educational Measurement, 2025
The purpose of this study is to describe and evaluate a multilingual automated essay scoring (AES) system for grading essays in three languages. Two different sentence embedding models were evaluated within the AES system, multilingual BERT (mBERT) and language-agnostic BERT sentence embedding (LaBSE). German, Italian, and Czech essays were…
Descriptors: College Students, Slavic Languages, German, Italian
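The entry above describes scoring essays from multilingual sentence embeddings. Purely as an illustration of that general approach (not the authors' system), the sketch below embeds essays with the publicly available LaBSE model via the sentence-transformers library and fits a simple ridge regression to human scores; the toy data, score scale, and choice of regressor are assumptions.

# Hypothetical sketch: embedding-based essay scoring with LaBSE + ridge regression.
# Illustrates the general technique only; it is not the AES system from the study.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from scipy.stats import pearsonr

# Assumed toy data: essay texts (any language LaBSE supports) and human-assigned scores.
essays = ["Der Klimawandel ist ...", "L'istruzione è ...", "Vzdělávání je ..."] * 50
scores = [3.0, 4.5, 2.0] * 50

model = SentenceTransformer("sentence-transformers/LaBSE")  # language-agnostic sentence embeddings
X = model.encode(essays)                                    # one vector per essay

X_tr, X_te, y_tr, y_te = train_test_split(X, scores, test_size=0.2, random_state=0)
reg = Ridge(alpha=1.0).fit(X_tr, y_tr)                      # simple linear scoring model
pred = reg.predict(X_te)
print("Pearson r with human scores:", pearsonr(pred, y_te)[0])

In practice a held-out set of human-rated essays per language would be needed to check whether one multilingual model transfers across the German, Italian, and Czech data the study mentions.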
Linlin, Cao – English Language Teaching, 2020
Through Many-Facet Rasch analysis, this study explores the rating differences between one automatic computer rater and five expert teacher raters in scoring 119 students on a computerized English listening-speaking test. Results indicate that both the automatic and the teacher raters demonstrate good inter-rater reliability, though the automatic rater…
Descriptors: Language Tests, Computer Assisted Testing, English (Second Language), Second Language Learning
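Many-Facet Rasch analysis itself is usually run in specialised software (e.g. FACETS). As a much simpler, hypothetical stand-in for the machine-human comparison described above, the sketch below checks agreement between an automatic rater and one teacher rater with quadratically weighted kappa and Pearson correlation, assuming integer scores on a shared scale.

# Hypothetical sketch: simple machine-human rater agreement (not Many-Facet Rasch analysis).
from sklearn.metrics import cohen_kappa_score
from scipy.stats import pearsonr

# Assumed toy scores for the same examinees from an automatic rater and one teacher rater.
machine = [4, 3, 5, 2, 4, 3, 5, 1, 2, 4]
teacher = [4, 3, 4, 2, 5, 3, 5, 2, 2, 4]

qwk = cohen_kappa_score(machine, teacher, weights="quadratic")  # agreement beyond chance
r, _ = pearsonr(machine, teacher)                               # linear association
print(f"Quadratic weighted kappa: {qwk:.2f}, Pearson r: {r:.2f}")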
Naude, Kevin A.; Greyling, Jean H.; Vogts, Dieter – Computers & Education, 2010
We present a novel approach to the automated marking of student programming assignments. Our technique quantifies the structural similarity between unmarked student submissions and marked solutions, and is the basis by which we assign marks. This is accomplished through an efficient novel graph similarity measure ("AssignSim"). Our experiments…
Descriptors: Grading, Assignments, Correlation, Interrater Reliability
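The AssignSim measure is not spelled out in the abstract above. Purely to illustrate the idea of marking by structural similarity, the sketch below represents each program as a set of edges in a toy structure graph, scores similarity with Jaccard overlap, and copies the mark of the most similar already-marked solution. The graph encoding, the similarity measure, and all names are assumptions, not the authors' algorithm.

# Hypothetical sketch: mark transfer by graph similarity (not the AssignSim measure).
def jaccard(edges_a: set, edges_b: set) -> float:
    """Overlap of two edge sets as a crude structural similarity."""
    if not edges_a and not edges_b:
        return 1.0
    return len(edges_a & edges_b) / len(edges_a | edges_b)

# Assumed toy graphs: marked reference solutions (edges, mark) and one unmarked submission.
marked_solutions = {
    "model_answer_a": ({("start", "loop"), ("loop", "check"), ("check", "end")}, 10),
    "model_answer_b": ({("start", "check"), ("check", "end")}, 6),
}
submission = {("start", "loop"), ("loop", "check"), ("check", "end"), ("loop", "end")}

# Suggest the mark of the most structurally similar marked solution.
best_name, (best_edges, best_mark) = max(
    marked_solutions.items(), key=lambda kv: jaccard(submission, kv[1][0])
)
print(f"Closest solution: {best_name}, suggested mark: {best_mark}")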
Mogey, Nora; Paterson, Jessie; Burk, John; Purcell, Michael – ALT-J: Research in Learning Technology, 2010
Students at the University of Edinburgh do almost all their work on computers, but at the end of the semester they are examined by handwritten essays. Intuitively it would be appealing to allow students the choice of handwriting or typing, but this raises the concern that it might not be "fair"--that the choice a student makes,…
Descriptors: Handwriting, Essay Tests, Interrater Reliability, Grading
Coniam, David – ReCALL, 2009
This paper describes a study of the computer essay-scoring program BETSY. While the use of computers in rating written scripts has been criticised in some quarters for a lack of transparency or of fit with how human raters rate written scripts, a number of essay rating programs are available commercially, many of which claim to offer comparable…
Descriptors: Writing Tests, Scoring, Foreign Countries, Interrater Reliability
Pare, D. E.; Joordens, S. – Journal of Computer Assisted Learning, 2008
As class sizes increase, methods of assessment shift from costly traditional approaches (e.g. expert-graded writing assignments) to more economic and logistically feasible methods (e.g. multiple-choice testing, computer-automated scoring, or peer assessment). While each method of assessment has its merits, it is peer assessment in particular,…
Descriptors: Writing Assignments, Undergraduate Students, Teaching Assistants, Peer Evaluation
Kim, Do-Hong; Huynh, Huynh – Journal of Technology, Learning, and Assessment, 2007
This study examined comparability of student scores obtained from computerized and paper-and-pencil formats of the large-scale statewide end-of-course (EOC) examinations in the two subject areas of Algebra and Biology. Evidence in support of comparability of computerized and paper-based tests was sought by examining scale scores, item parameter…
Descriptors: Computer Assisted Testing, Measures (Individuals), Biology, Algebra
Wang, Jinhao; Brown, Michelle Stallone – Journal of Technology, Learning, and Assessment, 2007
The current research was conducted to investigate the validity of automated essay scoring (AES) by comparing group mean scores assigned by an AES tool, IntelliMetric [TM] and human raters. Data collection included administering the Texas version of the WriterPlacer "Plus" test and obtaining scores assigned by IntelliMetric [TM] and by…
Descriptors: Test Scoring Machines, Scoring, Comparative Testing, Intermode Differences
Ben-Simon, Anat; Bennett, Randy Elliott – Journal of Technology, Learning, and Assessment, 2007
This study evaluated a "substantively driven" method for scoring NAEP writing assessments automatically. The study used variations of an existing commercial program, e-rater[R], to compare the performance of three approaches to automated essay scoring: a "brute-empirical" approach in which variables are selected and weighted solely according to…
Descriptors: Writing Evaluation, Writing Tests, Scoring, Essays

Page, Ellis Batten – Journal of Experimental Education, 1994
National Assessment of Educational Progress writing sample essays from 1988 and 1990 (495 and 599 essays) were subjected to computerized grading and human ratings. Cross-validation suggests that computer scoring is superior to a two-judge panel, a finding encouraging for large programs of essay evaluation. (SLD)
Descriptors: Computer Assisted Testing, Computer Software, Essays, Evaluation Methods
Zechner, Klaus; Bejar, Isaac I.; Hemat, Ramin – ETS Research Report Series, 2007
The increasing availability and performance of computer-based testing has prompted more research on the automatic assessment of language and speaking proficiency. In this investigation, we evaluated the feasibility of using an off-the-shelf speech-recognition system for scoring speaking prompts from the LanguEdge field test of 2002. We first…
Descriptors: Role, Computer Assisted Testing, Language Proficiency, Oral Language
Clariana, Roy B.; Wallace, Patricia – Journal of Educational Computing Research, 2007
This proof-of-concept investigation describes a computer-based approach for deriving the knowledge structure of individuals and of groups from their written essays, and considers the convergent criterion-related validity of the computer-based scores relative to human rater essay scores and multiple-choice test scores. After completing a…
Descriptors: Computer Assisted Testing, Multiple Choice Tests, Construct Validity, Cognitive Structures
McGhee, Debbie E.; Lowell, Nana – New Directions for Teaching and Learning, 2003
This study compares mean ratings, inter-rater reliabilities, and the factor structure of items for online and paper student-rating forms from the University of Washington's Instructional Assessment System. (Contains 3 figures and 2 tables.)
Descriptors: Psychometrics, Factor Structure, Student Evaluation of Teacher Performance, Test Items
Yang, Yongwei; Buckendahl, Chad W.; Juszkiewicz, Piotr J.; Bhola, Dennison S. – Journal of Applied Testing Technology, 2005
With the continual progress of computer technologies, computer automated scoring (CAS) has become a popular tool for evaluating writing assessments. Research on applications of these methodologies to new types of performance assessments is still emerging. While research has generally shown high agreement of CAS system-generated scores with those…
Descriptors: Scoring, Validity, Interrater Reliability, Comparative Analysis