VISPUR: Visual Aids for Identifying and Interpreting Spurious Associations in Data-Driven Decisions

Xian Teng, Yongsu Ahn, Yu-Ru Lin

Room: 109

2023-10-24T22:48:00Z
This work provides a de-paradox workflow to help analyze observational data and overcome spurious and paradoxical associations. Spurious associations, including Simpson's paradox, are prevalent in observational studies. For example, in a study of the effect of a job training program, the association between the cause (training program) and the outcome (earnings) can be distorted by a third variable (ethnicity), leading to a misleading interpretation of the causal effect. Based on the causal literature, we identify two major sources of spuriousness: (1) confounding bias and (2) subgroup heterogeneity. We develop VISPUR (visualizing spurious associations), a visual analytic system that enables causal analysis of spurious associations. The system incorporates a suite of statistical techniques, algorithms, and visual components to help identify the causal roots of spurious associations, as well as modules for reasoning about association paradoxes and making informed decisions.

Causal Analysis, Simpson’s Paradox, Spurious Associations, Machine Learning, Decision Making


Big data and machine learning tools have jointly empowered humans to make data-driven decisions. However, many of these tools capture empirical associations that may be spurious because of confounding factors and subgroup heterogeneity. The famous Simpson’s paradox is one such phenomenon, in which aggregated and subgroup-level associations contradict each other, causing cognitive confusion and making it difficult to reach adequate interpretations and decisions. Existing tools provide little insight to help humans locate, reason about, and prevent the pitfalls of spurious associations in practice. We propose VISPUR, a visual analytic system that provides a causal analysis framework and a human-centric workflow for tackling spurious associations. The system includes a CONFOUNDER DASHBOARD, which can automatically identify possible confounding factors, and a SUBGROUP VIEWER, which allows for the visualization and comparison of diverse subgroup patterns that may lead to a misinterpretation of causality. Additionally, we propose a REASONING STORYBOARD, which uses a flow-based approach to illustrate paradoxical phenomena, as well as an interactive DECISION DIAGNOSIS panel that helps ensure accountable decision-making. Through an expert interview and a controlled user experiment, our qualitative and quantitative results demonstrate that the proposed “de-paradox” workflow and the designed visual analytic system are effective in helping human users identify and understand spurious associations, as well as make accountable causal decisions.
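
To make the contradiction between aggregated and subgroup-level associations concrete, the following minimal sketch checks a small table of counts for a Simpson's-paradox reversal. It is an illustration only and not part of VISPUR: the counts are hypothetical, and the names `counts`, `subgroup_effects`, `aggregated_effect`, and `is_simpsons_paradox` are assumptions introduced here. The "effect" is taken to be a simple difference in success rates between a treated and a control arm.

```python
# Hypothetical counts (NOT from the paper): (successes, total) per subgroup and arm.
counts = {
    "group_a": {"treated": (81, 87), "control": (234, 270)},
    "group_b": {"treated": (192, 263), "control": (55, 80)},
}


def rate(successes, total):
    """Success rate for one arm."""
    return successes / total


def subgroup_effects(counts):
    """Difference in success rate (treated minus control) within each subgroup."""
    return {
        group: rate(*arms["treated"]) - rate(*arms["control"])
        for group, arms in counts.items()
    }


def aggregated_effect(counts):
    """Difference in success rate after pooling all subgroups together."""
    pooled = {}
    for arm in ("treated", "control"):
        successes = sum(arms[arm][0] for arms in counts.values())
        total = sum(arms[arm][1] for arms in counts.values())
        pooled[arm] = rate(successes, total)
    return pooled["treated"] - pooled["control"]


def is_simpsons_paradox(counts):
    """True if every subgroup effect has the opposite sign of the pooled effect."""
    pooled = aggregated_effect(counts)
    effects = list(subgroup_effects(counts).values())
    all_positive = all(e > 0 for e in effects)
    all_negative = all(e < 0 for e in effects)
    return (all_positive and pooled < 0) or (all_negative and pooled > 0)


if __name__ == "__main__":
    print("subgroup effects:", subgroup_effects(counts))
    print("aggregated effect:", aggregated_effect(counts))
    print("reversal detected:", is_simpsons_paradox(counts))
```

With these counts the treated arm does better inside each subgroup but worse once the subgroups are pooled, because the treated arm is concentrated in the subgroup with the lower baseline success rate. A check like this only flags the reversal; deciding which level of analysis supports a causal reading still requires reasoning about confounding.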