Visualizing the Scripts of Data Wrangling with SOMNUS

Kai Xiong, Siwei Fu, Guoming Ding, Zhongsu Luo, Rong Yu, Wei Chen, Hujun Bao, Yingcai Wu

View presentation: 2022-10-19T14:12:00Z
Exemplar figure, described by caption below
SOMNUS is a pipeline for visualizing the semantics of code pieces in the context of data transformation. SOMNUS takes a data wrangling script and its source tables as input and produces a glyph-based provenance graph in which nodes represent tables and edges denote data transformations. We implement SOMNUS as a web-based system that can help data scientists validate the procedure of data transformation. We also apply SOMNUS to explain scripts generated by MORPHEUS, revealing the intermediate data transformations between given source and target tables.
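The caption describes the output as a graph whose nodes are tables and whose edges are transformations. As a rough illustration of that data structure only, and not SOMNUS's actual implementation (which also parses the script and renders glyphs), the Python sketch below assumes the script has already been reduced to a list of steps. The names `Transformation`, `ProvenanceGraph`, `build_graph`, and `lineage`, the operation labels, and the sample tables are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Transformation:
    """One hypothetical script step: an operation that reads tables and writes one table."""
    op: str            # hypothetical operation label, e.g. "filter" or "merge"
    inputs: list[str]  # names of the tables the step reads
    output: str        # name of the table the step produces


@dataclass
class ProvenanceGraph:
    """Nodes are table names; each edge (src, dst, op) records one transformation."""
    nodes: set[str] = field(default_factory=set)
    edges: list[tuple[str, str, str]] = field(default_factory=list)

    def add(self, t: Transformation) -> None:
        # Register every table involved and draw one edge per input table.
        self.nodes.update(t.inputs)
        self.nodes.add(t.output)
        for src in t.inputs:
            self.edges.append((src, t.output, t.op))

    def lineage(self, table: str) -> set[str]:
        """Return all tables that `table` was (transitively) derived from."""
        parents: dict[str, list[str]] = {}
        for src, dst, _ in self.edges:
            parents.setdefault(dst, []).append(src)
        seen: set[str] = set()
        stack = [table]
        while stack:  # iterative depth-first walk back toward the source tables
            for p in parents.get(stack.pop(), []):
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        return seen


def build_graph(steps: list[Transformation]) -> ProvenanceGraph:
    graph = ProvenanceGraph()
    for step in steps:
        graph.add(step)
    return graph


# Hypothetical three-step wrangling script over two source tables.
script = [
    Transformation("filter", ["orders"], "orders_2022"),
    Transformation("merge", ["orders_2022", "customers"], "joined"),
    Transformation("group_by", ["joined"], "summary"),
]
graph = build_graph(script)
print(sorted(graph.nodes))              # ['customers', 'joined', 'orders', 'orders_2022', 'summary']
print(sorted(graph.lineage("summary"))) # ['customers', 'joined', 'orders', 'orders_2022']
```

Keeping the operation label on each edge is what would let a renderer attach a per-transformation glyph to that edge; the `lineage` query is an added convenience showing how such a graph supports tracing a table back to its sources.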


The live footage of the talk, including the Q&A, can be viewed on the session page, Transforming Tabular Data and Grammars.

Keywords

Program understanding, data transformation, visualization design

Abstract

Data workers use various scripting languages for data transformation, such as SAS, R, and Python. However, understanding intricate code pieces requires advanced programming skills, which prevents data workers from grasping the idea of data transformation with ease. Program visualization is beneficial for debugging and education and has the potential to illustrate transformations intuitively and interactively. In this paper, we explore visualization design for demonstrating the semantics of code pieces in the context of data transformation. First, to depict individual data transformations, we structure a design space along two primary dimensions: the key parameters to encode and the visual channels they can be mapped to. Then, we derive a collection of 23 glyphs that visualize the semantics of transformations. Next, we design a pipeline, named Somnus, that provides an overview of the creation and evolution of data tables using a provenance graph while also allowing detailed investigation of individual transformations. User feedback on Somnus is positive: our study participants achieved better accuracy in less time using Somnus and preferred it over carefully crafted textual descriptions. Further, we provide two example applications to demonstrate the utility and versatility of Somnus.