PipelineProfiler: A Visual Analytics Tool for the Exploration of AutoML Pipelines

Jorge Piazentin Ono, Sonia Castelo Quispe, Roque Lopez, Enrico Bertini, Juliana Freire, Claudio Silva

Presentation time: 2020-10-27T19:00:00Z
Exemplar figure
PipelineProfiler applied to the analysis of binary classification pipelines. A) The system is integrated with Jupyter Notebook and can be invoked with one line of code. B) PipelineProfiler menu. C) Pipeline Matrix: C1) Primitives (columns) used by the pipelines (rows); C2) Tooltip showing the metadata and hyperparameters for a primitive; C3) One-hot-encoded hyperparameters (columns) for the primitive Xgboost Gbtree across pipelines (rows); C4) Pipeline scores: users can select different metrics to rank pipelines; C5) Primitive Contribution View, showing correlations between primitive usage and pipeline scores. D) Pipeline Comparison View: visual comparison of the top-3 scoring pipelines.
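
The Primitive Contribution View (C5) is described as showing correlations between primitive usage and pipeline scores. As a rough illustration of that idea, and not the tool's actual implementation, the sketch below encodes each primitive as a 0/1 usage indicator per pipeline and computes its Pearson correlation with the scores. The function names, the input layout, and the sample pipelines are all hypothetical, chosen only for this example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Returns 0.0 when either sequence has zero variance (for example, a
    primitive used by every pipeline), since correlation is undefined there.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def primitive_contributions(pipelines):
    """Correlate each primitive's usage indicator with the pipeline scores.

    `pipelines` is a list of (set_of_primitive_names, score) pairs; this
    layout is an assumption made for the sketch.
    """
    scores = [score for _, score in pipelines]
    names = sorted(set().union(*(prims for prims, _ in pipelines)))
    return {
        name: pearson(
            [1.0 if name in prims else 0.0 for prims, _ in pipelines],
            scores,
        )
        for name in names
    }


# Hypothetical pipelines: (primitives used, score under some metric).
pipelines = [
    ({"imputer", "xgboost"}, 0.90),
    ({"imputer", "random_forest"}, 0.80),
    ({"imputer", "xgboost", "scaler"}, 0.92),
    ({"imputer", "svm"}, 0.70),
]

contrib = primitive_contributions(pipelines)
for name, value in contrib.items():
    print(f"{name}: {value:+.2f}")
```

In this toy data, a primitive that appears mostly in high-scoring pipelines receives a positive value and one that appears mostly in low-scoring pipelines receives a negative value. A primitive used by every pipeline carries no signal and is reported as 0.0.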
Fast forward

Direct link to video on YouTube: https://youtu.be/0FlwKtToYLQ

Keywords

Automatic Machine Learning, Pipeline Visualization, Model Evaluation

Abstract

In recent years, a wide variety of automated machine learning (AutoML) methods have been proposed to generate end-to-end machine learning (ML) pipelines. While these techniques facilitate the creation of models, they are difficult for developers to debug because of their black-box nature, the complexity of the underlying algorithms, and the large number of pipelines they derive. It is also challenging for machine learning experts to select an AutoML system that is well suited for a given problem. In this paper, we present PipelineProfiler, an interactive visualization tool that allows the exploration and comparison of the solution space of ML pipelines produced by AutoML systems. PipelineProfiler is integrated with Jupyter Notebook and can be combined with common data science tools to enable a rich set of analyses of the ML pipelines, providing users with a better understanding of the algorithms that generated them as well as insights into how they can be improved. We demonstrate the utility of our tool through use cases where PipelineProfiler is used to better understand and improve a real-world AutoML system. Furthermore, we validate our approach by presenting a detailed analysis of a think-aloud experiment with six data scientists who develop and evaluate AutoML tools.