TY - GEN
T1 - Interactive Visualization for Data Science Scripts
AU - Faust, Rebecca
AU - Scheidegger, Carlos
AU - Isaacs, Katherine
AU - Bernstein, William Z.
AU - Sharp, Michael
AU - North, Chris
N1 - Funding Information: This work is partially supported by the NIST Graduate Student Measurement and Engineering Fellowship, through a grant with the GFSD, and the National Science Foundation under Grant # 2127309 to the Computing Research Association for the CIFellows 2021 Project. Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
AB - As the field of data science continues to grow, so does the need for adequate tools to understand and debug data science scripts. Current debugging practices fall short when applied to a data science setting, due to the exploratory and iterative nature of analysis scripts. Additionally, computational notebooks, the preferred scripting environment of many data scientists, present additional challenges to understanding and debugging workflows, including the non-linear execution of code snippets. This paper presents Anteater, a trace-based visual debugging method for data science scripts. Anteater automatically traces and visualizes execution data with minimal analyst input. The visualizations illustrate execution and value behaviors that aid in understanding the results of analysis scripts. To maximize the number of workflows supported, we present prototype implementations in both Python and Jupyter. Last, to demonstrate Anteater's support for analysis understanding tasks, we provide two usage scenarios on real-world analysis scripts.
KW - Debugging
KW - Interactive Visualization
KW - Jupyter
KW - Program Traces
UR - http://www.scopus.com/inward/record.url?scp=85146231047&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85146231047&partnerID=8YFLogxK
U2 - https://doi.org/10.1109/VDS57266.2022.00009
DO - https://doi.org/10.1109/VDS57266.2022.00009
M3 - Conference contribution
T3 - Proceedings - 2022 IEEE Visualization in Data Science, VDS 2022
SP - 37
EP - 45
BT - Proceedings - 2022 IEEE Visualization in Data Science, VDS 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 IEEE Visualization in Data Science, VDS 2022
Y2 - 1 January 2022
ER -