Jonas Flodén – British Educational Research Journal, 2025
This study compares the performance of the generative AI (GenAI) large language model (LLM) ChatGPT with that of human teachers in grading university exams. Aspects investigated include consistency, large discrepancies, and length of answer. Implications for higher education, including the role of teachers and ethics, are also discussed. Three…
Descriptors: College Faculty, Artificial Intelligence, Comparative Testing, Scoring
Tahereh Firoozi; Hamid Mohammadi; Mark J. Gierl – Journal of Educational Measurement, 2025
The purpose of this study is to describe and evaluate a multilingual automated essay scoring (AES) system for grading essays in three languages. Two different sentence embedding models were evaluated within the AES system, multilingual BERT (mBERT) and language-agnostic BERT sentence embedding (LaBSE). German, Italian, and Czech essays were…
Descriptors: College Students, Slavic Languages, German, Italian
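One way to picture an embedding-based AES pipeline of this kind: encode each essay as a sentence-embedding vector (here assumed precomputed, e.g. by mBERT or LaBSE) and predict a score from similarity to already-scored essays. A toy nearest-neighbour sketch, not the authors' actual model:

```python
from math import sqrt

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def predict_score(essay_embedding, scored_essays):
    """Assign the score of the most similar already-scored essay.

    scored_essays: list of (embedding, human_score) pairs. With a
    language-agnostic embedding model, the scored essays and the new
    essay need not be in the same language.
    """
    _, best_score = max(
        scored_essays,
        key=lambda pair: cosine_similarity(essay_embedding, pair[0]))
    return best_score
```

Real multilingual AES systems typically train a regression model on top of the embeddings rather than using a single nearest neighbour, but the similarity-in-embedding-space idea is the same.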
Morrison, Keith – Educational Research and Evaluation, 2013
This paper reviews the literature on comparing online and paper course evaluations in higher education and provides a case study of a very large randomised trial on the topic. It presents a mixed but generally optimistic picture of online course evaluations with respect to response rates, what they indicate, and how to increase them. The paper…
Descriptors: Literature Reviews, Course Evaluation, Case Studies, Higher Education
Ventouras, Errikos; Triantis, Dimos; Tsiakas, Panagiotis; Stergiopoulos, Charalampos – Computers & Education, 2011
The aim of the present research was to compare the use of multiple-choice questions (MCQs) as an examination method against the oral examination (OE) method. MCQs are widely used and their importance seems likely to grow, due to their inherent suitability for electronic assessment. However, MCQs are influenced by the tendency of examinees to guess…
Descriptors: Grades (Scholastic), Scoring, Multiple Choice Tests, Test Format
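The guessing tendency noted in this abstract is classically addressed with formula scoring, which penalises wrong answers so that random guessing gains nothing in expectation; a generic sketch, not necessarily the scoring rule used in the study:

```python
def formula_score(num_right: int, num_wrong: int, num_options: int) -> float:
    """Classical correction for guessing: R - W/(k-1).

    With k options per item, a pure guesser expects one right answer
    for every k-1 wrong ones, so the penalty exactly cancels the
    expected gain from guessing.
    """
    return num_right - num_wrong / (num_options - 1)
```

For example, 30 right and 10 wrong on 5-option items yields 30 - 10/4 = 27.5.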
Park, Jooyong – British Journal of Educational Technology, 2010
This paper introduces the newly developed computerized Constructive Multiple-choice Testing system. The system combines short answer (SA) and multiple-choice (MC) formats by asking examinees to respond to the same question twice, first in the SA format, and then in the MC format. This manipulation was employed to collect information about the two…

Descriptors: Grade 5, Evaluation Methods, Multiple Choice Tests, Scores
Attali, Yigal; Bridgeman, Brent; Trapani, Catherine – Journal of Technology, Learning, and Assessment, 2010
A generic approach in automated essay scoring produces scores that have the same meaning across all prompts, existing or new, of a writing assessment. This is accomplished by using a single set of linguistic indicators (or features), a consistent way of combining and weighting these features into essay scores, and a focus on features that are not…
Descriptors: Writing Evaluation, Writing Tests, Scoring, Test Scoring Machines
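The generic approach described here can be pictured as a fixed weighted combination of standardized linguistic features, where the standardization statistics come from a common reference corpus rather than from each prompt. A minimal sketch with hypothetical feature names, not the operational scoring model:

```python
def generic_essay_score(features, weights, reference_stats):
    """Score an essay as a fixed weighted sum of standardized features.

    features:        raw feature values for one essay,
                     e.g. {"word_count": 300, "vocab_diversity": 0.5}
    weights:         one fixed weight per feature, shared across prompts
    reference_stats: (mean, sd) per feature from a reference corpus, so
                     standardized values mean the same thing on new prompts
    """
    score = 0.0
    for name, weight in weights.items():
        mean, sd = reference_stats[name]
        score += weight * (features[name] - mean) / sd
    return score
```

Because neither the weights nor the reference statistics depend on the prompt, the resulting scores are directly comparable across existing and new prompts, which is the point of the generic approach.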
Kim, Do-Hong; Huynh, Huynh – Journal of Technology, Learning, and Assessment, 2007
This study examined comparability of student scores obtained from computerized and paper-and-pencil formats of the large-scale statewide end-of-course (EOC) examinations in the two subject areas of Algebra and Biology. Evidence in support of comparability of computerized and paper-based tests was sought by examining scale scores, item parameter…
Descriptors: Computer Assisted Testing, Measures (Individuals), Biology, Algebra
Puhan, Gautam; Boughton, Keith; Kim, Sooyeon – Journal of Technology, Learning, and Assessment, 2007
The study evaluated the comparability of two versions of a certification test: a paper-and-pencil test (PPT) and computer-based test (CBT). An effect size measure known as Cohen's d and differential item functioning (DIF) analyses were used as measures of comparability at the test and item levels, respectively. Results indicated that the effect…
Descriptors: Computer Assisted Testing, Effect Size, Test Bias, Mathematics Tests
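The test-level comparability measure named here, Cohen's d, is a standardized mean difference computed from summary statistics of the two groups (here, PPT and CBT examinees); a standard implementation with a pooled standard deviation:

```python
from math import sqrt

def cohens_d(mean1: float, sd1: float, n1: int,
             mean2: float, sd2: float, n2: int) -> float:
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    pooled_sd = sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2)
                     / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled_sd
```

By common convention, |d| near 0.2 is a small effect; a mode difference that small would be taken as evidence for comparability at the test level.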
Wang, Jinhao; Brown, Michelle Stallone – Journal of Technology, Learning, and Assessment, 2007
The current research was conducted to investigate the validity of automated essay scoring (AES) by comparing group mean scores assigned by an AES tool, IntelliMetric™, and human raters. Data collection included administering the Texas version of the WriterPlacer "Plus" test and obtaining scores assigned by IntelliMetric™ and by…
Descriptors: Test Scoring Machines, Scoring, Comparative Testing, Intermode Differences
van der Linden, Wim J. – Applied Psychological Measurement, 2006
Two local methods for observed-score equating are applied to the problem of equating an adaptive test to a linear test. In an empirical study, the methods were evaluated against a method based on the test characteristic function (TCF) of the linear test and traditional equipercentile equating applied to the ability estimates on the adaptive test…
Descriptors: Adaptive Testing, Computer Assisted Testing, Test Format, Equated Scores
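The traditional equipercentile equating used as a baseline here maps a score on one form to the score with the same percentile rank on the other form. A discrete toy sketch (operational equating smooths and interpolates the score distributions rather than matching observed ranks directly):

```python
def percentile_rank(scores, x):
    """Midpoint percentile rank of x within an observed score distribution."""
    below = sum(s < x for s in scores)
    at = sum(s == x for s in scores)
    return (below + 0.5 * at) / len(scores)

def equipercentile_equate(x, form_x_scores, form_y_scores):
    """Map a form-X score to the form-Y score with the closest percentile rank."""
    target = percentile_rank(form_x_scores, x)
    return min(set(form_y_scores),
               key=lambda y: abs(percentile_rank(form_y_scores, y) - target))
```

For example, the median score on form X equates to the median score on form Y, whatever the two score scales look like.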
Mohanty, Ganesh; Gretes, John; Flowers, Claudia; Algozzine, Bob; Spooner, Fred – Journal of Personnel Evaluation in Education, 2005
Student evaluation of instruction in college and university courses has been a routine and mandatory part of undergraduate and graduate education for some time. A major shortcoming of the process is that it often relies exclusively on the opinions or qualitative judgments of students rather than the learning or transfer of knowledge that takes…
Descriptors: Evaluation Methods, Engineering Education, College Instruction, Student Evaluation of Teacher Performance
Weiss, David J., Ed. – 1980
This report is the Proceedings of the third conference of its type. Included are 23 of the 25 papers presented at the conference, discussion of these papers by invited discussants, and symposium papers by a group of leaders in adaptive testing and latent trait test theory research and applications. The papers are organized into the following…
Descriptors: Academic Ability, Academic Achievement, Comparative Testing, Computer Assisted Testing