Peer reviewed
ERIC Number: EJ1445928
Record Type: Journal
Publication Date: 2024-Oct
Pages: 23
Abstractor: As Provided
ISBN: N/A
ISSN: ISSN-1360-2357
EISSN: EISSN-1573-7608
Available Date: N/A
Machine Learning Model for ChatGPT Usage Detection in Students' Answers to Open-Ended Questions: Case of Lithuanian Language
Pavel Stefanovic; Birute Pliuskuviene; Urte Radvilaite; Simona Ramanauskaite
Education and Information Technologies, v29 n14 p18403-18425 2024
The public availability of large language models such as ChatGPT brings both new possibilities and new challenges to education. Educational institutions need to identify when a large language model has been used and when a text has been written by the students themselves. In this paper, ChatGPT usage in students' answers is investigated. The main aim of the research was to build a machine learning model that could be used to evaluate students' answers to open-ended questions written in the Lithuanian language. The model should determine whether the answers were written by the students themselves or with the help of ChatGPT. A new dataset of student answers was collected to train the machine learning models. The dataset consists of original student answers, ChatGPT answers, and paraphrased ChatGPT answers; in total, more than 1000 answers were prepared. Twenty-four combinations of text pre-processing algorithms were analyzed. Pre-processing focused mainly on tokenization methods such as Bag of Words and N-grams, on the stemming algorithm, and on the stop-word list. For the analyzed dataset, these pre-processing methods were more effective than using multilingual BERT for document embedding. Based on the features (properties) of the dataset, the following learning algorithms were investigated: artificial neural networks, decision trees, random forests, gradient boosting trees, k-nearest neighbours, and naive Bayes. The main results show that the highest accuracy, 87%, was obtained in some cases with gradient boosting trees, random forests, and artificial neural networks. The lowest accuracy was obtained with the k-nearest neighbours algorithm. Furthermore, the experimental results suggest that ChatGPT usage in student answers can be identified automatically.
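The abstract names Bag of Words and N-gram tokenization, stemming, and stop-word removal as pre-processing steps, and lists naive Bayes among the classifiers. The following is a minimal pure-Python sketch of that kind of pipeline, not the authors' implementation. It assumes a hypothetical English stop-word list, a crude suffix stripper in place of a real Lithuanian stemmer, and toy English sentences in place of the authors' dataset. The names `features`, `stem`, and `MultinomialNaiveBayes` are illustrative only.

```python
import math
import re
from collections import Counter

# Hypothetical English stop-word list for the toy example; the paper used a
# Lithuanian stop-word list whose contents the abstract does not give.
STOP_WORDS = {"the", "is", "a", "to", "it", "that"}

# Crude, hypothetical suffix stripper standing in for a real (Lithuanian) stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(token):
    """Strip the first matching suffix if at least 3 characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def features(text, n=2, use_stop_words=True, use_stemming=True):
    """Bag of words plus word n-grams of length 2..n, returned as counts."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())  # letter-only tokens
    if use_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    if use_stemming:
        tokens = [stem(t) for t in tokens]
    counts = Counter()
    for k in range(1, n + 1):
        for i in range(len(tokens) - k + 1):
            counts[" ".join(tokens[i : i + k])] += 1
    return counts


class MultinomialNaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(doc)
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        self.log_prior = {
            c: math.log(labels.count(c) / len(labels)) for c in self.classes
        }
        return self

    def _log_score(self, doc, c):
        denom = self.totals[c] + len(self.vocab)
        score = self.log_prior[c]
        for feat, k in doc.items():
            if feat in self.vocab:  # features never seen in training are ignored
                score += k * math.log((self.counts[c][feat] + 1) / denom)
        return score

    def predict(self, doc):
        return max(self.classes, key=lambda c: self._log_score(doc, c))


# Toy training data (invented for illustration, not from the paper's dataset).
train = [
    ("i think memory leaks happen when you forget to free stuff", "student"),
    ("honestly i guess the program crashes because of pointers", "student"),
    ("in summary, memory leaks occur when allocated memory is not released", "chatgpt"),
    ("in conclusion, it is important to note that pointers store addresses", "chatgpt"),
]
model = MultinomialNaiveBayes().fit(
    [features(text) for text, _ in train], [label for _, label in train]
)
print(model.predict(features("in summary, it is important to note this")))  # chatgpt
print(model.predict(features("i guess it crashes because of stuff")))  # student
```

Toggling `n`, `use_stop_words`, and `use_stemming` gives a small grid of pre-processing combinations, loosely analogous to the 24 combinations the study compared; a realistic setup would also hold out test answers and report accuracy rather than classifying two hand-picked sentences.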
Springer. Available from: Springer Nature. One New York Plaza, Suite 4600, New York, NY 10004. Tel: 800-777-4643; Tel: 212-460-1500; Fax: 212-460-1700; e-mail: customerservice@springernature.com; Web site: https://link.springer.com/
Publication Type: Journal Articles; Reports - Research
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Grant or Contract Numbers: N/A
Author Affiliations: N/A