ERIC Number: ED639812
Record Type: Non-Journal
Publication Date: 2023
Pages: 99
Abstractor: As Provided
ISBN: 979-8-3804-8521-0
ISSN: N/A
EISSN: N/A
Available Date: N/A
Machine Learning and Natural Language Processing for Code Quality Analysis in Introductory Programming Courses
Charalampos-S Charitsis
ProQuest LLC, Ph.D. Dissertation, Stanford University
The employment rate of software developers has risen significantly over the last 30 years. As a result, more students are considering computer science as a potential career path. Over the last 15 years, introductory programming course (CS1) enrollment has been increasing at a much faster rate than the number of CS faculty, with no apparent signs of slowing. Thus, a scalability issue clearly exists. Technology has opened up learning opportunities to a wide audience. Millions of people use Massive Open Online Course (MOOC) providers, while hundreds of thousands enroll in online CS courses and coding boot camps. Automated assessment helps instructors keep pace with the overwhelming workload, but it mostly emphasizes functionality. Software development, however, is much more than just writing working programs, and code quality assessment remains a manual process. This dissertation focuses on critical qualitative aspects of CS1 and on how to use technology to take on time-consuming human tasks. First, I cover one of the fundamental code quality standards, "readability," and focus on its cornerstone in CS1 programs, namely function names. An identifier that clearly captures the intended task makes code readable and self-documenting. I present and examine a semi-automated software system that I built to improve function names. It uses a variation of the Naive Bayes classifier to assess the quality of the identifiers and then suggests alternatives for the poor ones. Second, I study the relationship between problem-related entities and "functional decomposition." I introduce a method for quantifying how broad a student's view of the problem is by the time they jump into coding. I proceed with the software implementation and explain how I used natural language processing (NLP) to detect problem-related entities, which is a key stage in this process. Finally, I use the system to classify students at scale and determine how the breadth of their view of the problem affects learners' performance, the time required to solve a programming challenge, and the complexity of the solution. Third, I introduce a systematic approach to detecting when novice programmers decompose their code and identifying what drives their decision. I detail a software system that I built to implement these tasks. Next, I use the system to classify students and explore how these classifications relate to program complexity and student performance. Lastly, I introduce an alternative to the standard testing approaches for functionality validation. My solution depends on code instrumentation. Its main advantages are that it takes substantially less code and can test programs with nondeterministic behavior or user input. Then, I present "Delve," an educational tool that I created for instructors and teaching assistants in introductory programming courses. Delve integrates many ideas from my previous systems and bundles them into an easy-to-use graphical user interface.
[The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone at 1-800-521-0600. Web page: http://www.proquest.com/en-US/products/dissertations/individuals.shtml.]
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site: http://www.proquest.com/en-US/products/dissertations/individuals.shtml
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Grant or Contract Numbers: N/A
Author Affiliations: N/A