research-article

Optimizing Error-Bounded Lossy Compression for Scientific Data With Diverse Constraints

Published: 01 December 2022

Abstract

Today's scientific simulations and advanced instruments produce vast volumes of data. These data cannot be stored and transferred efficiently because of limited I/O bandwidth, network speed, and storage capacity. Error-bounded lossy compression can be an effective way to address these issues: it can significantly reduce data size while controlling data distortion according to user-defined error bounds. In practice, many scientific applications impose specific requirements or constraints on lossy compression to guarantee that the reconstructed data remain valid for post hoc analysis. For example, some datasets contain irrelevant data that should be isolated, and users often have intuition about which value ranges, geospatial regions, and other data subsets are crucial for subsequent analysis. Existing state-of-the-art error-bounded lossy compressors, however, do not consider these constraints during compression. The result is inferior compression ratios with respect to the user's post hoc analysis, because some of the compressed data provide little or no value for that analysis. In this work we address this issue by proposing an optimized framework that can preserve diverse constraints during error-bounded lossy compression, e.g., cleaning the irrelevant data, efficiently preserving different precision for multiple value intervals, and allowing users to set diverse precision over both regular and irregular regions. We perform our evaluation on a supercomputer with up to 2,100 cores. Experiments with six real-world applications show that our proposed diverse-constraint-based error-bounded lossy compressor can obtain higher visual quality or data fidelity on reconstructed data, with the same or even higher compression ratios, compared with the traditional state-of-the-art compressor SZ. Our experiments also demonstrate very good scalability in compression performance compared with the I/O throughput of the parallel file system.
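
The idea of preserving different precision for multiple value intervals can be sketched as follows. This is a minimal illustration only, not the framework proposed in the article and not the SZ API: it uses plain uniform scalar quantization, and the names `pick_bound`, `compress`, `decompress`, `edges`, and `bounds` are hypothetical. It also assumes that the interval index is stored next to each quantization code so that the decompressor knows which error bound was applied.

```python
from bisect import bisect_right


def pick_bound(value, edges, bounds):
    """Return (interval index, error bound) for a value.

    `edges` holds the sorted interval boundaries and `bounds` holds one
    absolute error bound per interval, so len(bounds) == len(edges) + 1.
    Both are assumptions made for this sketch.
    """
    idx = bisect_right(edges, value)
    return idx, bounds[idx]


def compress(data, edges, bounds):
    """Quantize each value with the error bound of its value interval."""
    codes = []
    for x in data:
        idx, eb = pick_bound(x, edges, bounds)
        # Bins of width 2*eb keep the reconstruction within eb of x.
        codes.append((idx, round(x / (2.0 * eb))))
    return codes


def decompress(codes, bounds):
    """Rebuild each value from its stored interval index and integer code."""
    return [q * 2.0 * bounds[idx] for idx, q in codes]


if __name__ == "__main__":
    edges = [1.0, 4.0]          # intervals: (-inf, 1), [1, 4), [4, +inf)
    bounds = [0.1, 0.001, 0.5]  # tight bound only for the middle interval
    data = [0.013, 0.5, 1.27, 3.999, 7.3, -2.2]
    restored = decompress(compress(data, edges, bounds), bounds)
    for x, y in zip(data, restored):
        print(x, y, abs(x - y))
```

In this sketch, values in the interval of interest (here 1 to 4) are reconstructed with a tight bound, while values elsewhere are quantized coarsely, which leaves fewer distinct codes for a later entropy-coding stage to handle.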


  • Published in

    IEEE Transactions on Parallel and Distributed Systems  Volume 33, Issue 12
    Dec. 2022
    1246 pages

    U.S. Government work not protected by U.S. copyright.

    Publisher

    IEEE Press
