DendroMap: Visual Exploration of Large-Scale Image Datasets for Machine Learning with Treemaps

Donald R Bertucci, Md Montaser Hamid, Yashwanthi Anand, Anita Ruangrotsakun, Delyar Tabatabai, Melissa Perez, Minsuk Kahng

Presentation time: 2022-10-19T16:33:00Z
Figure: DendroMap is an interactive visualization system for exploring large-scale image datasets used in machine learning. The initial view of DendroMap, shown on the left side of the figure, presents an image dataset as nested rectangles of similar images, as in a treemap, providing an overview of the entire dataset. A user clicks on a rectangle containing plants, insects, and animals to zoom into that part of the hierarchy. DendroMap then expands this portion to fill the entire screen and shows new groups of similar images.

Prerecorded Talk

The live footage of the talk, including the Q&A, can be viewed on the session page, VA for ML.

Abstract

In this paper, we present DendroMap, a novel approach to interactively exploring large-scale image datasets for machine learning (ML). ML practitioners often explore image datasets by generating a grid of images or projecting high-dimensional representations of images into 2-D using dimensionality reduction techniques (e.g., t-SNE). However, neither approach effectively scales to large datasets because images are ineffectively organized and interactions are insufficiently supported. To address these challenges, we develop DendroMap by adapting Treemaps, a well-known visualization technique. DendroMap effectively organizes images by extracting hierarchical cluster structures from high-dimensional representations of images. It enables users to make sense of the overall distributions of datasets and interactively zoom into specific areas of interest at multiple levels of abstraction. Our case studies with widely used image datasets for deep learning demonstrate that users can discover insights about datasets and trained models by examining the diversity of images, identifying underperforming subgroups, and analyzing classification errors. We conducted a user study that evaluates the effectiveness of DendroMap in grouping and searching tasks by comparing it with a gridified version of t-SNE and found that participants preferred DendroMap. DendroMap is available at https://div-lab.github.io/dendromap/.
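
To make the idea of "extracting hierarchical cluster structures from high-dimensional representations" more concrete, here is a minimal sketch in Python. It is not the authors' implementation: the linkage method (average linkage), the distance (Euclidean), the toy 2-D "embeddings", and the helper names `agglomerate` and `child_shares` are all assumptions chosen for illustration. The sketch builds a nested cluster tree and computes the fraction of items under each child, which is the kind of quantity a treemap could use to size nested rectangles.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerate(vectors):
    """Naive average-linkage agglomerative clustering (illustrative only).

    Returns a nested tree. A leaf is {"items": [i]}; an internal node is
    {"items": [...], "children": [left, right]}. The cost grows roughly
    cubically with the number of vectors, so this suits tiny demos only.
    """
    clusters = [{"items": [i]} for i in range(len(vectors))]
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            left, right = clusters[a]["items"], clusters[b]["items"]
            # Average pairwise distance between the two clusters.
            total = sum(euclidean(vectors[i], vectors[j])
                        for i in left for j in right)
            dist = total / (len(left) * len(right))
            if best is None or dist < best[0]:
                best = (dist, a, b)
        _, a, b = best
        merged = {
            "items": clusters[a]["items"] + clusters[b]["items"],
            "children": [clusters[a], clusters[b]],
        }
        # Replace the two merged clusters with their parent node.
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters[0]


def child_shares(node):
    """Fraction of the node's items under each child (treemap area share)."""
    total = len(node["items"])
    return [len(child["items"]) / total for child in node.get("children", [])]


# Hypothetical toy "embeddings": two well-separated groups of points.
embeddings = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
tree = agglomerate(embeddings)
top_groups = sorted(sorted(child["items"]) for child in tree["children"])
shares = sorted(child_shares(tree))
```

In this sketch, "zooming" into a group would amount to treating one entry of `tree["children"]` as the new root and recomputing `child_shares` for it. For real image embeddings, an optimized library implementation of hierarchical clustering would be the more practical choice.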