Abstract
Today's scientific simulations and advanced instruments produce vast volumes of data. These data cannot be stored and transferred efficiently because of limited I/O bandwidth, network speed, and storage capacity. Error-bounded lossy compression can be an effective way to address these issues: it can significantly reduce data size while controlling data distortion according to user-defined error bounds. In practice, many scientific applications place specific requirements or constraints on lossy compression to guarantee that the reconstructed data remain valid for post hoc analysis. For example, some datasets contain irrelevant data that should be isolated, and users often have intuition about which value ranges, geospatial regions, and other data subsets are crucial for subsequent analysis. Existing state-of-the-art error-bounded lossy compressors, however, do not consider these constraints during compression. As a result, they achieve inferior compression ratios with respect to the user's post hoc analysis, because part of the compressed output is spent on data that provide little or no value for that analysis. In this work we address this issue by proposing an optimized framework that preserves diverse constraints during error-bounded lossy compression, e.g., cleaning the irrelevant data, efficiently preserving different precisions for multiple value intervals, and allowing users to set diverse precisions over both regular and irregular regions. We perform our evaluation on a supercomputer with up to 2,100 cores. Experiments with six real-world applications show that our proposed diverse-constraint-based error-bounded lossy compressor obtains higher visual quality or data fidelity on the reconstructed data, with the same or even higher compression ratios, compared with the traditional state-of-the-art compressor SZ. Our experiments also demonstrate very good scalability in compression performance compared with the I/O throughput of the parallel file system.
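To make the idea of different precisions for multiple value intervals concrete, the following is a minimal sketch, not the framework proposed in the paper. It assumes a plain uniform quantizer with bins of width twice the error bound and stores the interval index next to each integer code so that reconstruction knows which bound was used. The names `INTERVALS`, `DEFAULT_BOUND`, `quantize`, and `reconstruct`, and all numeric values, are hypothetical; a real compressor such as SZ adds prediction and entropy coding on top of quantization.

```python
# Hypothetical (low, high, error_bound) triples; values outside every interval
# fall back to DEFAULT_BOUND. None of these numbers come from the paper.
INTERVALS = [(0.0, 1.0, 1e-3), (1.0, 100.0, 1e-1)]
DEFAULT_BOUND = 1.0


def bound_index(value):
    """Return the index of the interval containing value, or -1 for the default."""
    for i, (low, high, _) in enumerate(INTERVALS):
        if low <= value < high:
            return i
    return -1


def bound_for(index):
    """Return the error bound attached to an interval index (-1 means default)."""
    return INTERVALS[index][2] if index >= 0 else DEFAULT_BOUND


def quantize(values):
    """Map each value to (interval index, integer code) using bins of width 2 * bound."""
    codes = []
    for v in values:
        idx = bound_index(v)
        eb = bound_for(idx)
        codes.append((idx, round(v / (2.0 * eb))))
    return codes


def reconstruct(codes):
    """Rebuild approximate values from (interval index, integer code) pairs."""
    return [code * 2.0 * bound_for(idx) for idx, code in codes]


data = [0.1234, 0.5, 57.31, 250.7, -3.2]
codes = quantize(data)
approx = reconstruct(codes)
```

Each reconstructed value in `approx` differs from its original by at most the bound of the interval the original fell in, up to floating-point rounding. Values in the tightly bounded interval produce many distinct codes, while values under the loose default bound collapse onto few codes, which is where the compression gain for less important data would come from.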