Evaluation of Sampling Methods for Scatterplots

Jun Yuan, Shouxing Xiang, Jiazhi Xia, Lingyun Yu, Shixia Liu

View presentation: 2020-10-30T14:30:00Z GMT-0600
Exemplar figure
Different sampling results of the MNIST dataset: (a) the original scatterplot; (b) the result of random sampling; (c) the result of outlier biased density based sampling; (d) the result of blue noise sampling. In terms of human perception, the three sampling methods perform well at preserving relative region density, outliers, and the overall shape, respectively.
Fast forward

Direct link to video on YouTube: https://youtu.be/gPtAZsJKO5I


Keywords: scatterplot, data sampling, empirical evaluation.


Given a scatterplot with tens of thousands of points or more, a natural question is which sampling method should be used to create a small but "good" scatterplot for a better abstraction. We present the results of a user study that investigates the influence of different sampling strategies on multi-class scatterplots. The main goal of this study is to understand how well sampling methods preserve the density, outliers, and overall shape of a scatterplot. To this end, we comprehensively review the literature and select seven typical sampling strategies as well as eight representative datasets. We then design four experiments to understand the performance of different strategies in maintaining: 1) region density; 2) class density; 3) outliers; and 4) overall shape in the sampling results. The results show that: 1) random sampling is preferred for preserving region density; 2) blue noise sampling and random sampling have comparable performance with the three multi-class sampling strategies in preserving class density; 3) outlier biased density based sampling, recursive subdivision based sampling, and blue noise sampling perform best in keeping outliers; and 4) blue noise sampling outperforms the others in maintaining the overall shape of a scatterplot.
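To make two of the compared strategies concrete, the following is a minimal sketch of random sampling and of a naive dart-throwing procedure, which is one common way to obtain a blue-noise-like point set. It is not the implementation evaluated in the study: the function names, the `radius` parameter, and the synthetic two-cluster data are all assumptions made for illustration.

```python
import math
import random


def random_sampling(points, k, seed=0):
    """Uniformly pick k points without replacement."""
    rng = random.Random(seed)
    return rng.sample(points, k)


def dart_throwing_sampling(points, radius, seed=0):
    """Naive blue-noise-style sampling (illustrative, quadratic time).

    Visit the points in random order and keep a point only if it lies
    at least `radius` away from every point kept so far.
    """
    rng = random.Random(seed)
    order = points[:]
    rng.shuffle(order)
    kept = []
    for p in order:
        if all(math.dist(p, q) >= radius for q in kept):
            kept.append(p)
    return kept


def share_in_sparse(sample):
    """Fraction of a sample that falls in the sparse cluster (x > 2.5)."""
    return sum(1 for x, _ in sample if x > 2.5) / len(sample)


# Synthetic data (an assumption, not one of the study's datasets):
# a dense cluster around (0, 0) and a sparse cluster around (5, 5).
data_rng = random.Random(42)
dense = [(data_rng.gauss(0.0, 0.5), data_rng.gauss(0.0, 0.5)) for _ in range(1800)]
sparse = [(data_rng.gauss(5.0, 0.5), data_rng.gauss(5.0, 0.5)) for _ in range(200)]
points = dense + sparse

random_sample = random_sampling(points, 200)
blue_noise_sample = dart_throwing_sampling(points, radius=0.15)
```

Comparing `share_in_sparse(random_sample)` with `share_in_sparse(blue_noise_sample)` gives a rough feel for the trade-off described above: uniform random sampling tends to keep each region's share of points close to its share in the original data, whereas the minimum-distance constraint spreads the sample more evenly over the occupied area.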