Quick Clusters: A GPU-Parallel Partitioning for Efficient Path Tracing of Unstructured Volumetric Grids

Nate Morrical, Alper Sahistan, Ugur Gudukbay, Ingo Wald, Valerio Pascucci

View presentation: 2022-10-19T19:24:00Z GMT-0600

Prerecorded Talk

The live footage of the talk, including the Q&A, can be viewed on the session page, (Volume) Rendering.

Fast forward

We propose a simple yet effective method for clustering finite elements to improve preprocessing times and rendering performance of unstructured volumetric grids without requiring auxiliary connectivity data. Rather than building bounding volume hierarchies (BVHs) over individual elements, we sort elements along a Hilbert curve and aggregate neighboring elements together, reducing BVH memory consumption by over an order of magnitude. Then, to further reduce memory consumption, we cluster the mesh on the fly into sub-meshes with smaller indices using a series of efficient parallel mesh re-indexing operations. These clusters are then passed to a highly optimized ray tracing API for point containment queries and ray-cluster intersection testing. Each cluster is assigned a maximum extinction value for adaptive sampling, which we rasterize into non-overlapping view-aligned bins allocated along the ray. These maximum extinction bins are then used to guide the placement of samples along the ray during visualization, reducing the number of samples required by multiple orders of magnitude (depending on the dataset), thereby improving overall visualization interactivity. Using our approach, we improve rendering performance over a competitive baseline on the NASA Mars Lander dataset by 6X (from 1 frame per second (fps) and 1.0M rays per second (rps) up to 6fps and 12.4M rps, now including volumetric shadows) while simultaneously reducing memory consumption by 3X (33GB down to 11GB) and avoiding any offline preprocessing steps, enabling high-quality interactive visualization on consumer graphics cards. Then, by utilizing the full 48GB of an RTX 8000, we improve the performance of Lander by 17X (1fps up to 17fps, 1.0M rps up to 35.6M rps).
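The sort, aggregate, and re-index steps in the abstract can be illustrated with a small sketch. This is a serial, CPU-side Python illustration under stated assumptions, not the authors' GPU-parallel implementation: the function names (`hilbert_index`, `build_clusters`), the fixed `cluster_size`, the use of element centroids as sort keys, and the 10-bit quantization are all hypothetical choices made for the example. The Hilbert key uses Skilling's transpose method.

```python
def hilbert_index(coords, bits):
    """Hilbert-curve index of integer grid coords, each in [0, 2**bits)."""
    x = list(coords)
    n = len(x)
    top = 1 << (bits - 1)
    # Undo the per-level rotations/reflections, from the highest bit down.
    q = top
    while q > 1:
        p = q - 1
        for i in range(n):
            if x[i] & q:
                x[0] ^= p                      # invert low bits of axis 0
            else:
                t = (x[0] ^ x[i]) & p          # exchange low bits of axes 0 and i
                x[0] ^= t
                x[i] ^= t
        q >>= 1
    # Gray-encode across axes.
    for i in range(1, n):
        x[i] ^= x[i - 1]
    t = 0
    q = top
    while q > 1:
        if x[n - 1] & q:
            t ^= q - 1
        q >>= 1
    for i in range(n):
        x[i] ^= t
    # Interleave the bits (axis 0 is most significant at each level).
    h = 0
    for b in range(bits - 1, -1, -1):
        for i in range(n):
            h = (h << 1) | ((x[i] >> b) & 1)
    return h


def build_clusters(vertices, elements, cluster_size, bits=10):
    """Sort elements by the Hilbert key of their centroid, then cut the sorted
    list into fixed-size clusters, each with its own compact vertex indexing."""
    lo = [min(v[a] for v in vertices) for a in range(3)]
    hi = [max(v[a] for v in vertices) for a in range(3)]
    scale = (1 << bits) - 1

    def key(elem):
        # Quantize the element centroid onto the integer grid.
        cell = []
        for a in range(3):
            c = sum(vertices[i][a] for i in elem) / len(elem)
            extent = hi[a] - lo[a]
            cell.append(int((c - lo[a]) / extent * scale) if extent > 0 else 0)
        return hilbert_index(cell, bits)

    order = sorted(range(len(elements)), key=lambda e: key(elements[e]))

    clusters = []
    for start in range(0, len(order), cluster_size):
        members = order[start:start + cluster_size]
        local_of = {}        # global vertex id -> small local id
        local_elements = []
        for e in members:
            local_elements.append(
                [local_of.setdefault(g, len(local_of)) for g in elements[e]]
            )
        global_vertices = list(local_of)   # local id -> global id (insertion order)
        pts = [vertices[g] for g in global_vertices]
        bounds = (
            tuple(min(p[a] for p in pts) for a in range(3)),
            tuple(max(p[a] for p in pts) for a in range(3)),
        )
        clusters.append({
            "elements": members,
            "global_vertices": global_vertices,
            "local_elements": local_elements,
            "bounds": bounds,
        })
    return clusters
```

In this sketch, a hierarchy would be built over the per-cluster `bounds` rather than over individual elements, and the local indices in each cluster are bounded by the number of vertices that cluster touches, so they can be stored in a narrower integer type than the global indices. A per-cluster maximum extinction value, as described in the abstract, could be stored alongside `bounds` in the same record.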