Discrete Morse Sandwich: Fast Computation of Persistence Diagrams for Scalar Data -- An Algorithm and A Benchmark
Pierre Guillou, Jules Vidal, Julien Tierny
DOI: 10.1109/TVCG.2023.3238008
Room: 106
2023-10-26T03:00:00Z
Keywords
Topological data analysis;scalar data;persistence diagrams;discrete Morse theory
Abstract
This paper introduces an efficient algorithm for persistence diagram computation, given an input piecewise linear scalar field f defined on a d-dimensional simplicial complex K, with $d \leq 3$. Our work revisits the seminal algorithm "PairSimplices" [31], [103] with discrete Morse theory (DMT) [34], [80], which greatly reduces the number of input simplices to consider. Furthermore, we extend the stratification strategy described in "PairSimplices" to DMT and accelerate it, for the fast computation of the $0$th and $(d-1)$th diagrams, noted $D_0(f)$ and $D_{d-1}(f)$. Minimum-saddle persistence pairs ($D_0(f)$) and saddle-maximum persistence pairs ($D_{d-1}(f)$) are efficiently computed by processing, with a Union-Find data structure, the unstable sets of 1-saddles and the stable sets of $(d-1)$-saddles, respectively. This fast pre-computation for dimensions 0 and $(d-1)$ enables an aggressive specialization of [4] to the 3D case, which drastically reduces the number of input simplices for the computation of $D_1(f)$, the intermediate layer of the sandwich. Finally, we document several performance improvements via shared-memory parallelism. We provide an open-source implementation of our algorithm for reproducibility purposes. We also contribute a reproducible benchmark package, which exploits three-dimensional data from a public repository and compares our algorithm to a variety of publicly available implementations. Extensive experiments indicate that our algorithm improves the time performance of the seminal "PairSimplices" algorithm it extends by two orders of magnitude. Moreover, it improves memory footprint and time performance over a selection of 14 competing approaches, with a substantial gain over the fastest available approaches, while producing a strictly identical output.
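The minimum-saddle pairing step mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that the two minima reached by each 1-saddle's unstable set have already been found (for instance by following descending V-paths of the discrete gradient), and the input format and the names `UnionFind` and `pair_minima_saddles` are hypothetical. The 1-saddles are processed in increasing order of f; when a saddle joins two distinct components, the younger minimum (the one with the higher value) dies at that saddle, following the elder rule.

```python
from typing import Dict, List, Tuple


class UnionFind:
    """Union-Find over minimum ids; each root is the lowest minimum of its component."""

    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        # Path halving: point each visited node to its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x


def pair_minima_saddles(
    minima: Dict[str, float],
    saddles: List[Tuple[str, float, str, str]],
) -> List[Tuple[str, str]]:
    """Return (minimum, 1-saddle) persistence pairs, i.e. the finite pairs of D_0(f).

    minima:  minimum id -> scalar value (assumed distinct).
    saddles: (saddle id, value, min_a, min_b), where min_a and min_b are the two
             minima reached by the saddle's unstable set (hypothetical precomputed input).
    """
    uf = UnionFind(minima)
    pairs = []
    for s_id, _, a, b in sorted(saddles, key=lambda s: s[1]):
        ra, rb = uf.find(a), uf.find(b)
        if ra == rb:
            continue  # both branches reach the same component: no D_0 pair
        # Elder rule: the component rooted at the higher minimum dies here.
        young, old = (ra, rb) if minima[ra] > minima[rb] else (rb, ra)
        pairs.append((young, s_id))
        uf.parent[young] = old  # the older (lower) minimum stays the representative
    return pairs


minima = {"m0": 0.0, "m1": 1.0, "m2": 2.0}
saddles = [("s_a", 3.0, "m1", "m2"), ("s_b", 4.0, "m0", "m1"), ("s_c", 5.0, "m0", "m2")]
print(pair_minima_saddles(minima, saddles))  # [('m2', 's_a'), ('m1', 's_b')]
```

In this example, the global minimum `m0` is never paired (it corresponds to the essential class of $D_0(f)$), and `s_c` merges nothing new, so it produces no $D_0(f)$ pair. Under the same assumptions, the saddle-maximum pairs of $D_{d-1}(f)$ could be obtained symmetrically by passing the maxima and the $(d-1)$-saddles, with the two maxima reached by each saddle's stable set, and negating all values so that saddles are processed in decreasing order of f.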