STULL: Unbiased Online Sampling for Visual Exploration of Large Spatiotemporal Data

Guizhen Wang, Jingjing Guo, Mingjie Tang, José Florencio de Queiroz, Calvin Yau, Anas Daghistani, Morteza Karimzadeh, Walid Aref, David Ebert

View presentation: 2020-10-30T15:00:00Z
Exemplar figure
Visual comparison of Ohio highway traffic incident distributions approximated by 0.3% data samples retrieved by STULL (left) and STORM (right) in 40 milliseconds or less, against the exact map (middle) at 100% data, with 32 shades of gray (colorbar). Although both heatmaps use 0.3% sample data, the STORM heatmap suggests that hotspots are mainly located on the west side of Ohio's highway network, whereas the STULL heatmap shows hotspots across the state and more closely resembles the exact map.
Keywords

Geospatial Data, Data Management, Large-Scale Data Techniques, Visual Analytics

Abstract

Online sampling-supported visual analytics is increasingly important, as it allows users to explore large datasets with acceptable approximate answers at interactive rates. However, existing online spatiotemporal sampling techniques are often biased, as most researchers have primarily focused on reducing computational latency. Biased sampling approaches select data with unequal probabilities and produce results that do not match the exact data distribution, leading end users to incorrect interpretations. In this paper, we propose a novel approach to perform unbiased online sampling of large spatiotemporal data. The proposed approach gives the same probability of selection to every point that satisfies the specifications of a user's multidimensional query. To achieve unbiased sampling for accurate, representative interactive visualizations, we design a novel data index and an associated sample retrieval plan. Our proposed sampling approach is suitable for a wide variety of visual analytics tasks, e.g., tasks that run aggregate queries on spatiotemporal data. Extensive experiments confirm the superiority of our approach over a state-of-the-art spatial online sampling technique, demonstrating that, within the same computational time, data samples generated by our approach are at least 50% more accurate in representing the actual spatial distribution of the data and enable approximate visualizations that more closely resemble the exact ones.
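
To make the unbiasedness property concrete, the following is a minimal sketch of uniform sampling over the points that satisfy a spatiotemporal range query. It is not the STULL index or its retrieval plan, which the abstract does not detail; it is a brute-force reservoir sampler (Algorithm R) that scans every point, shown only to illustrate what "the same probability of selection" means. The `(x, y, t)` point layout and the names `in_query` and `uniform_sample` are assumptions made for illustration.

```python
import random
from typing import Iterable, List, Tuple

# Hypothetical point layout: (x, y, t). Not taken from the paper.
Point = Tuple[float, float, float]
Range = Tuple[float, float]


def in_query(p: Point, x_range: Range, y_range: Range, t_range: Range) -> bool:
    """Return True if the point lies inside the closed query box."""
    x, y, t = p
    return (x_range[0] <= x <= x_range[1]
            and y_range[0] <= y <= y_range[1]
            and t_range[0] <= t <= t_range[1])


def uniform_sample(points: Iterable[Point], k: int,
                   x_range: Range, y_range: Range, t_range: Range,
                   rng: random.Random) -> List[Point]:
    """Reservoir-sample up to k qualifying points with equal probability."""
    reservoir: List[Point] = []
    seen = 0  # number of qualifying points encountered so far
    for p in points:
        if not in_query(p, x_range, y_range, t_range):
            continue
        seen += 1
        if len(reservoir) < k:
            # Fill the reservoir with the first k qualifying points.
            reservoir.append(p)
        else:
            # Keep the new point with probability k / seen.
            j = rng.randrange(seen)
            if j < k:
                reservoir[j] = p
    return reservoir
```

Because this sketch visits every point, its cost grows linearly with the dataset and it would not meet interactive latency on large data; the purpose of an index such as the one proposed in the paper is to obtain the same equal-probability guarantee without a full scan.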