A General Framework for Progressive Data Compression and Retrieval

Victor A. P. Magri, Peter Lindstrom

Room: 103

2023-10-26T22:36:00ZGMT-0600Change your timezone on the schedule page
2023-10-26T22:36:00Z
Exemplar figure, described by caption below
This work presents a framework that enables progressive-precision data queries for any data compressor. Our strategy hinges on a multi-component representation that successively reduces the error between the original and compressed fields, allowing each field in the progressive sequence to be expressed as a partial sum of components. We have implemented this approach with four established scientific data compressors and assessed its effectiveness using real-world data sets from the SDRBench collection. The results show that our framework competes in accuracy and performance with other methods that natively support progressive data compression.
Fast forward
Full Video
Keywords

Lossy to lossless compression, progressive precision, multi-component expansion, floating-point data

Abstract

In scientific simulations, observations, and experiments, the transfer of data to and from disk and across networks has become a major bottleneck for data analysis and visualization. Compression techniques have been employed to tackle this challenge, but traditional lossy methods often demand conservative error tolerances to meet the numerical accuracy requirements of both anticipated and unknown data analysis tasks. Progressive data compression and retrieval has emerged as a promising solution, where each analysis task dictates its own accuracy needs. However, few analysis algorithms inherently support progressive data processing, and adapting compression techniques, file formats, client/server frameworks, and APIs to support progressivity can be challenging. This paper presents a framework that enables progressive-precision data queries for any data compressor or numerical representation. Our strategy hinges on a multi-component representation that successively reduces the error between the original and compressed field, allowing each field in the progressive sequence to be expressed as a partial sum of components. We have implemented this approach with four established scientific data compressors and assessed its effectiveness using real-world data sets from the SDRBench collection. The results show that our framework competes in accuracy with the standalone compressors it is based upon. Additionally, (de)compression time is proportional to the number of components requested by the user. Finally, our framework allows for fully lossless compression using lossy compressors when a sufficient number of components are employed.