Accelerating Unstructured Mesh Point Location with RT Cores

Nate Morrical, Ingo Wald, Will Usher, Valerio Pascucci

View presentation:2021-10-28T15:30:00ZGMT-0600Change your timezone on the schedule page
Exemplar figure, described by caption below
The Agulhas Current dataset, courtesy Niklas Röber, DKRZ. This image shows simulated ocean currents off the coast of South Africa, represented using cell-centered wedges. When rendered using our hardware accelerated point queries, we see up to a 14.86× performance improvement over a CUDA reference implementation (2.49 FPS vs 37 FPS on an RTX 2080 at 1024×1024).
Fast forward

Direct link to video on YouTube:


Scientific Ray Tracing, Unstructured Dcalar Data, GPGPU, Simulation, Volume Rendering


We present a technique that leverages ray tracing hardware available in recent Nvidia RTX GPUs to solve a problem other than classical ray tracing. Specifically, we demonstrate how to use these units to accelerate the point location of general unstructured elements consisting of both planar and bilinear faces. This unstructured mesh point location problem has previously been challenging to accelerate on GPU architectures; yet, the performance of these queries is crucial to many unstructured volume rendering and compute applications. Starting with a CUDA reference method, we describe and evaluate three approaches that reformulate these point queries to incrementally map algorithmic complexity to these new hardware ray tracing units. Each variant replaces the simpler problem of point queries with a more complex one of ray queries. Initial variants exploit ray tracing cores for accelerated BVH traversal, and subsequent variants use ray-triangle intersections and per-face metadata to detect point-in-element intersections. Although these later variants are more algorithmically complex, they are significantly faster than the reference method thanks to hardware acceleration. Using our approach, we improve the performance of an unstructured volume renderer by up to 4× for tetrahedral meshes and up to 15× for general bilinear element meshes, matching, or out-performing state-of-the-art solutions while simultaneously improving on robustness and ease-of-implementation.