Peer reviewed
ERIC Number: EJ1478859
Record Type: Journal
Publication Date: 2025-Jul
Pages: 40
Abstractor: As Provided
ISBN: N/A
ISSN: ISSN-1360-2357
EISSN: EISSN-1573-7608
Available Date: 2025-02-01
What Are the Differences between Student and ChatGPT-Generated Pseudocode? Detecting AI-Generated Pseudocode in High School Programming Using Explainable Machine Learning
Zifeng Liu1; Wanli Xing1; Xinyue Jiao2; Chenglu Li3; Wangda Zhu1
Education and Information Technologies, v30 n11 p14853-14892 2025
The ability of large language models (LLMs) to generate code has raised concerns in computer science education, as students may use tools like ChatGPT for programming assignments. While much research has focused on higher education, especially for languages like Java and Python, little attention has been given to K-12 settings, particularly for pseudocode. This study seeks to bridge this gap by developing explainable machine learning models for detecting pseudocode plagiarism in online programming education. A comprehensive pseudocode dataset was constructed, comprising 7,838 pseudocode submissions from 2,578 high school students enrolled in an online programming foundations course from 2020 to 2023, along with 6,300 pseudocode samples generated by three versions of ChatGPT. An ensemble model (EM) was then proposed to detect AI-generated pseudocode and was compared with six baseline models. SHapley Additive exPlanations (SHAP) were used to explain how these models differentiate AI-generated pseudocode from student submissions. The results show that students' submissions are more similar to GPT-3 output than to the output of the other two GPT models. The proposed model achieves an accuracy of 98.97%. The differences between AI-generated pseudocode and student submissions lie in several aspects: AI-generated pseudocode often begins with more complex verbs and features shorter sentences. It frequently includes clear numerical or word-based indicators of sequence and tends to incorporate more comments throughout the code. This research provides practical insights for online programming education and contributes to developing educational technologies and methods that strengthen academic integrity in such courses.
Springer. Available from: Springer Nature. One New York Plaza, Suite 4600, New York, NY 10004. Tel: 800-777-4643; Tel: 212-460-1500; Fax: 212-460-1700; e-mail: customerservice@springernature.com; Web site: https://link.springer.com/
Publication Type: Journal Articles; Reports - Research
Education Level: High Schools; Secondary Education
Audience: N/A
Language: English
Authoring Institution: N/A
Grant or Contract Numbers: 2201394; S411C230070
Department of Education Funded: Yes
Author Affiliations: 1University of Florida, School of Teaching & Learning, College of Education, Gainesville, USA; 2New York University, Steinhardt School of Culture, Education, and Human Development, New York, USA; 3University of Utah, Department of Educational Psychology, College of Education, Salt Lake City, USA