ERIC - Search Results

Showing all 5 results

Grading Exams Using Large Language Models: A Comparison between Human and AI Grading of Exams in Higher Education Using ChatGPT

Peer reviewed

Jonas Flodén – British Educational Research Journal, 2025

This study compares how the generative AI (GenAI) large language model (LLM) ChatGPT performs in grading university exams compared to human teachers. Aspects investigated include consistency, large discrepancies and length of answer. Implications for higher education, including the role of teachers and ethics, are also discussed. Three…

Descriptors: College Faculty, Artificial Intelligence, Comparative Testing, Scoring

Using Automated Procedures to Score Educational Essays Written in Three Languages

Peer reviewed

Tahereh Firoozi; Hamid Mohammadi; Mark J. Gierl – Journal of Educational Measurement, 2025

The purpose of this study is to describe and evaluate a multilingual automated essay scoring (AES) system for grading essays in three languages. Two different sentence embedding models were evaluated within the AES system, multilingual BERT (mBERT) and language-agnostic BERT sentence embedding (LaBSE). German, Italian, and Czech essays were…

Descriptors: College Students, Slavic Languages, German, Italian

AI-Assisted Assessment of Inquiry Skills in Socioscientific Issue Contexts

Peer reviewed

Wen Xin Zhang; John J. H. Lin; Ying-Shao Hsu – Journal of Computer Assisted Learning, 2025

Background Study: Assessing learners' inquiry-based skills is challenging as social, political, and technological dimensions must be considered. The advanced development of artificial intelligence (AI) makes it possible to address these challenges and shape the next generation of science education. Objectives: The present study evaluated the SSI…

Descriptors: Artificial Intelligence, Computer Assisted Testing, Inquiry, Active Learning

Examining AI-Based Accuracy Assessment in L2 Learners' Writing

Peer reviewed

On-Soon Lee – Journal of Pan-Pacific Association of Applied Linguistics, 2024

Despite the increasing interest in using AI tools as assistant agents in instructional settings, the effectiveness of ChatGPT, the generative pretrained AI, for evaluating the accuracy of second language (L2) writing has been largely unexplored in formative assessment. Therefore, the current study aims to examine how ChatGPT, as an evaluator,…

Descriptors: Foreign Countries, Undergraduate Students, English (Second Language), Second Language Learning

Accuracy and Reliability of Large Language Models in Assessing Learning Outcomes Achievement across Cognitive Domains

Peer reviewed

Swapna Haresh Teckwani; Amanda Huee-Ping Wong; Nathasha Vihangi Luke; Ivan Cherh Chiet Low – Advances in Physiology Education, 2024

The advent of artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT and Gemini, has significantly impacted the educational landscape, offering unique opportunities for learning and assessment. In the realm of written assessment grading, traditionally viewed as a laborious and subjective process, this study sought to…

Descriptors: Accuracy, Reliability, Computational Linguistics, Standards