ERIC Number: ED648921
Record Type: Non-Journal
Publication Date: 2022
Pages: 129
Abstractor: As Provided
ISBN: 979-8-3526-5359-3
ISSN: N/A
EISSN: N/A
Available Date: N/A
Supporting Just-in-Time Learning for Data Science Programming
Nischal Shrestha
ProQuest LLC, Ph.D. Dissertation, North Carolina State University
Data science programming presents many challenges for programmers entering the field. Roughly, data science programming can be broken up into several activities: data wrangling, analysis, modeling, and visualization. Data wrangling is an important first step that involves cleaning and shaping tabular data--or dataframes--into a form amenable to analysis. However, writing data wrangling code is challenging because it involves learning a plethora of data transformation operations and how they can be composed to shape the data. It also requires tracking and understanding numerous data transformation techniques, which makes it a tedious and error-prone process. Prior work has mainly focused on tools that help end users and programmers wrangle data by providing better management of code in computational notebooks or through GUI tools that attempt to remove the need to program. However, there is a gap in the literature and in existing tools when it comes to supporting programmers in understanding, exploring, and debugging data wrangling code interactively and flexibly. The thesis of this dissertation is: Programmers can understand, explore, and debug data wrangling code flexibly when aided by just-in-time learning tools that accommodate multiple learning objectives. The goal of this research is to help programmers understand, explore, and debug data wrangling code by exploring two just-in-time learning tools. The first study provided evidence that programmers rely heavily on opportunistic learning strategies, which involve using quicker resources and learning topics as needed. We also found that learning a language involves adapting to an entire ecosystem, which includes libraries, tools, and the community. In the second study, we investigated how an online community of practice can help data scientists in the R community through a social coding project called #TidyTuesday on Twitter.
We found that an online community of practice provides motivation, dissemination of knowledge, and adoption of best practices. A community of practice is a just-in-time learning tool that gives programmers flexibility in what they want to learn by browsing, adapting, and extending others' code. To help programmers understand and explore data wrangling code, we built Unravel, another just-in-time learning tool for the RStudio IDE (Integrated Development Environment) that presents visual cues and summaries of data transformations, and enables exploration via simple structured editing of the code. In a formative study, we found that Unravel supports diverse learning activities such as discovering code behavior, discovering relationships between functions, and exploring code alternatives. To help programmers learn about and debug problems in data wrangling code effectively, we extended Unravel to highlight problems in the code and data through always-on visualizations and to automate data quality checks. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone: 1-800-521-0600. Web page: http://www.proquest.com/en-US/products/dissertations/individuals.shtml.]
Descriptors: Data Science, Programming, Learning Strategies, Programming Languages, Communities of Practice, Coding, Problem Solving, Educational Resources
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site: http://www.proquest.com/en-US/products/dissertations/individuals.shtml
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: National Science Foundation (NSF)
Authoring Institution: N/A
Grant or Contract Numbers: 1559593; 1755762; 1814798; 2006947
Author Affiliations: N/A