TY - GEN
T1 - Aardvark
T2 - 2023 IEEE Visualization in Data Science, VDS 2023
AU - Faust, Rebecca
AU - Scheidegger, Carlos
AU - North, Chris
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
AB - Debugging programs is one of the most challenging and time-consuming parts of programming. Data science scripts present additional challenges as debugging often centers around more exploratory tasks, such as understanding the differences between results under different parameter settings. In fact, a common exploratory debugging practice is to run, modify, and re-run a script to observe the effects of the modification. Analysts perform this process frequently as they explore different settings and algorithms in their analysis. However, traditional debugging methods are not well suited to comparing across multiple executions of a script. They often require maintaining two instances of the debugging method and making manual, serial comparisons of program values. To address this gap, we present Aardvark, a comparative trace-based debugging method for identifying and visualizing the differences between two executions of data analysis scripts. Aardvark traces two consecutive instances of an analysis script, identifies the differences between them, and presents them through comparative visualizations. We present a prototype implementation in Python as well as an extension to support scripts in Jupyter notebooks. Finally, to demonstrate Aardvark, we provide two usage scenarios on real-world analysis scripts.
KW - Comparison
KW - Debugging
KW - Interactive Visualization
KW - Jupyter
KW - Program Traces
UR - http://www.scopus.com/inward/record.url?scp=85182742220&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85182742220&partnerID=8YFLogxK
U2 - 10.1109/VDS60365.2023.00009
DO - 10.1109/VDS60365.2023.00009
M3 - Conference contribution
T3 - Proceedings - 2023 IEEE Visualization in Data Science, VDS 2023
SP - 30
EP - 38
BT - Proceedings - 2023 IEEE Visualization in Data Science, VDS 2023
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 23 October 2023
ER -