research-article

Addressing irregular patterns of matrix computations on GPUs and their impact on applications powered by sparse direct solvers

Authors:
Ahmad Abdelfattah

University of Tennessee

University of Tennessee
View Profile

,
Pieter Ghysels

Lawrence Berkeley National Laboratory

Lawrence Berkeley National Laboratory
View Profile

,
Wajih Boukaram

Lawrence Berkeley National Laboratory

Lawrence Berkeley National Laboratory
View Profile

,
Stanimire Tomov

University of Tennessee

University of Tennessee
View Profile

,
Xiaoye Sherry Li

Lawrence Berkeley National Laboratory

Lawrence Berkeley National Laboratory
View Profile

,
Jack Dongarra

University of Tennessee

University of Tennessee
View Profile

SC '22: Proceedings of the International Conference on High Performance Computing, Networking, Storage and AnalysisNovember 2022 Article No.: 26Pages 1–14

Published:18 November 2022Publication History

SC '22: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

Pages 1–14

ABSTRACT

Many scientific applications rely on sparse direct solvers for their numerical robustness. However, performance optimization for these solvers remains a challenging task, especially on GPUs. This is due to workloads of small dense matrices that are different in size. Matrix decompositions on such irregular workloads are rarely addressed on GPUs.

This paper addresses irregular workloads of matrix computations on GPUs, and their application to accelerate sparse direct solvers. We design an interface for the basic matrix operations supporting problems of different sizes. The interface enables us to develop irrLU-GPU, an LU decomposition on matrices of different sizes. We demonstrate the impact of irrLU-GPU on sparse direct LU solvers using NVIDIA and AMD GPUs. Experimental results are shown for a sparse direct solver based on a multifrontal sparse LU decomposition applied to linear systems arising from the simulation, using finite element discretization on unstructured meshes, of a high-frequency indefinite Maxwell problem.

Supplemental Material

SC22_Presentation_Ahmad.mp4

Presentation at SC '22

mp4

191.3 MB

Play stream Download

References

I. S. Duff and J. K. Reid, "The Multifrontal Solution of Indefinite Sparse Symmetric Linear Equations," ACM Trans. Math. Softw., vol. 9, no. 3, pp. 302--325, September 1983. [Online]. Google ScholarDigital Library
T. A. Davis, S. Rajamanickam, and W. M. Sid-Lakhdar, "A survey of direct methods for sparse linear systems," Acta Numer., vol. 25, pp. 383--âĂŞ566, 2016.Google ScholarCross Ref
"Intel Math Kernel Library," available at http://software.intel.com/intel-mkl/.Google Scholar
"NVIDIA CUBLAS," available at https://developer.nvidia.com/cublas.Google Scholar
"rocBLAS, Next Generation BLAS Implementation for ROCm Platform," available at https://github.com/ROCmSoftwarePlatform/rocBLAS.Google Scholar
H. Anzt, J. Dongarra, G. Flegar, and E. S. Quintana-Ortí, "Variable-size batched Gauss-Jordan elimination for block-Jacobi preconditioning on graphics processors," Parallel Comput., vol. 81, pp. 131--146, 2019. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0167819117302107Google ScholarCross Ref
E. Agullo, J. Demmel, J. Dongarra, B. Hadri, J. Kurzak, J. Langou, H. Ltaief, P. Luszczek, and S. Tomov, "Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects," J. Phys.: Conf. Ser., vol. 180, p. 012037, July 2009. [Online]. Google ScholarCross Ref
X. S. Li and J. W. Demmel, "SuperLU_DIST: A Scalable Distributed-Memory Sparse Direct Solver for Unsymmetric Linear Systems," ACM Trans. Math. Softw., vol. 29, no. 2, pp. 110--âĂŞ140, June 2003. [Online]. Google ScholarDigital Library
T. A. Davis, "Algorithm 832: UMFPACK V4.3---an Unsymmetric-Pattern Multifrontal Method," ACM Trans. Math. Softw., vol. 30, no. 2, pp. 196--âĂŞ199, June 2004. [Online]. Google ScholarDigital Library
Y. Chen, T. A. Davis, W. W. Hager, and S. Rajamanickam, "Algorithm 887: CHOLMOD, Supernodal Sparse Cholesky Factorization and Update/Downdate," ACM Trans. Math. Softw., vol. 35, no. 3, October 2008. [Online]. Google ScholarDigital Library
P. Hénon, P. Ramet, and J. Roman, "PaStiX: a high-performance parallel direct solver for sparse symmetric positive definite systems," Parallel Comput., vol. 28, no. 2, pp. 301--321, 2002. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0167819101001417Google ScholarDigital Library
P. Ghysels, X. S. Li, F.-H. Rouet, S. Williams, and A. Napov, "An Efficient Multicore Implementation of a Novel HSS-Structured Multifrontal Solver Using Randomized Sampling," SIAM J. Sci. Comput., vol. 38, no. 5, pp. S358--S384, 2016. [Online]. Google ScholarCross Ref
P. Ghysels and R. Synk, "High performance sparse multifrontal solvers on modern gpus," Parallel Comput., vol. 110, p. 102897, 2022. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0167819122000059Google ScholarDigital Library
P. R. Amestoy and L. S. Duff, "Vectorization of a Multiprocessor Multifrontal Code," Int. J. High Perform. Comput. Appl., vol. 3, no. 3, pp. 41--59, September 1989. [Online]. Google ScholarDigital Library
A. Gupta, "WSMP: Watson Sparse Matrix Package Part II âĂŞ direct solution of general systems," IBM T. J. Watson Research Center, Tech. Rep., 2000, https://s3.us.cloud-object-storage.appdomain.cloud/res-files/1331-wsmp2.pdf.Google Scholar
A. A. Auer, G. Baumgartner, D. E. Bernholdt, A. Bibireata, V. Choppella, D. Cociorva, X. Gao, R. Harrison, S. Krishnamoorthy, S. Krishnan, C.-C. Lam, Q. Lu, M. Nooijen, R. Pitzer, J. Ramanujam, P. Sadayappan, and A. Sibiryakov, "Automatic code generation for many-body electronic structure methods: the tensor contraction engine," Mol. Phys., vol. 104, no. 2, pp. 211--228, 2006. [Online]. Google ScholarCross Ref
O. E. B. Messer, J. A. Harris, S. Parete-Koon, and M. A. Chertkow, "Multicore and Accelerator Development for a Leadership-Class Stellar Astrophysics Code," in Applied Parallel and Scientific Computing, P. Manninen and P. Öster, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013, pp. 92--106.Google Scholar
A. Abdelfattah, T. Costa, J. Dongarra, M. Gates, A. Haidar, S. Hammarling, N. J. Higham, J. Kurzak, P. Luszczek, S. Tomov, and M. Zounon, "A Set of Batched Basic Linear Algebra Subprograms and LAPACK Routines," ACM Trans. Math. Softw., vol. 47, no. 3, June 2021. [Online]. Google ScholarDigital Library
K. Kim, T. B. Costa, M. Deveci, A. M. Bradley, S. D. Hammond, M. E. Guney, S. Knepper, S. Story, and S. Rajamanickam, "Designing Vector-friendly Compact BLAS and LAPACK Kernels," in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, ser. SC '17. New York, NY, USA: ACM, 2017, pp. 55:1--55:12. [Online]. Google ScholarDigital Library
A. Abdelfattah, A. Haidar, S. Tomov, and J. Dongarra, "Performance, Design, and Autotuning of Batched GEMM for GPUs," in High Performance Computing, ser. ISC High Performance, J. M. Kunkel, P. Balaji, and J. Dongarra, Eds. Cham: Springer International Publishing, 2016, pp. 21--38. [Online]. Google ScholarCross Ref
G. Karypis and V. Kumar, "A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs," SIAM J. Sci. Comput., vol. 20, no. 1, pp. 359--392, 1998. [Online]. Google ScholarDigital Library
I. S. Duff and J. Koster, "The design and use of algorithms for permuting large entries to the diagonal of sparse matrices," SIAM J. Matrix Anal. Appl., vol. 20, no. 4, pp. 889--901, 1999. [Online]. Google ScholarDigital Library
M. Gates, J. Kurzak, A. Charara, A. YarKhan, and J. Dongarra, "SLATE: Design of a Modern Distributed and Accelerated Linear Algebra Library," in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, ser. SC '19. New York, NY, USA: Association for Computing Machinery, 2019. [Online]. Google ScholarDigital Library
A. Abdelfattah, M. Baboulin, V. Dobrev, J. J. Dongarra, C. W. Earl, J. Falcou, A. Haidar, I. Karlin, T. V. Kolev, I. Masliah, and S. Tomov, "High-Performance Tensor Contractions for GPUs," in International Conference on Computational Science 2016, ICCS 2016, 6--8 June 2016, San Diego, California, USA, 2016, pp. 108--118. [Online]. Google ScholarDigital Library
S. N. Yeralan, T. A. Davis, W. M. Sid-Lakhdar, and S. Ranka, "Algorithm 980: Sparse QR Factorization on the GPU," ACM Trans. Math. Softw., vol. 44, no. 2, August 2017. [Online]. Google ScholarDigital Library
M. J. Anderson, D. Sheffield, and K. Keutzer, "A Predictive Model for Solving Small Linear Algebra Problems in GPU Registers," in 2012 IEEE 26th International Parallel and Distributed Processing Symposium, 2012, pp. 2--13. [Online]. Google ScholarDigital Library
T. Dong, V. Dobrev, T. Kolev, R. Rieben, S. Tomov, and J. Dongarra, "A Step towards Energy Efficient Computing: Redesigning a Hydrodynamic Application on CPU-GPU," in 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014, pp. 972--981. [Online]. Google ScholarDigital Library
A. Haidar, T. Dong, P. Luszczek, S. Tomov, and J. J. Dongarra, "Batched matrix computations on hardware accelerators based on GPUs," Int. J. High Perform. Comput. Appl., vol. 29, no. 2, pp. 193--208, 2015. [Online]. Google ScholarDigital Library
A. Abdelfattah, A. Haidar, S. Tomov, and J. Dongarra, "Batched one-sided factorizations of tiny matrices using GPUs: Challenges and countermeasures," J. Comput. Sci., vol. 26, pp. 226--236, 2018. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1877750317311456Google ScholarCross Ref
H. Anzt, J. Dongarra, G. Flegar, and E. S. Quintana-Ortí, "Batched Gauss-Jordan Elimination for Block-Jacobi Preconditioner Generation on GPUs," in Proceedings of the 8th International Workshop on Programming Models and Applications for Multicores and Manycores, ser. PMAM'17. New York, NY, USA: Association for Computing Machinery, 2017, pp. 1âĂŞ-10. [Online]. Google ScholarDigital Library
A. Charara, H. Ltaief, and D. Keyes, "Redesigning Triangular Dense Matrix Computations on GPUs," in Proceedings of the 22nd International Conference on Euro-Par 2016: Parallel Processing - Volume 9833. Berlin, Heidelberg: Springer-Verlag, 2016, pp. 477--489. [Online]. Google ScholarDigital Library
R. Anderson, J. Andrej, A. Barker, J. Bramwell, J.-S. Camier, J. Cerveny, V. Dobrev, Y. Dudouit, A. Fisher, T. Kolev, W. Pazner, M. Stowell, V. Tomov, I. Akkerman, J. Dahm, D. Medina, and S. Zampini, "MFEM: A modular finite element methods library," Comput. Math. with Appl., vol. 81, pp. 42--74, 2021. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0898122120302583Google ScholarCross Ref

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SC '22: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
November 2022
1277 pages
ISBN:9784665454445
Conference Chairs:
Felix Wolf,
Sameer Shende,
General Chair:
Candace Culhane,
Program Chairs:
Sadaf Alam,
Heike Jagode
Sponsors
In-Cooperation
Publisher
IEEE Press
Publication History
- Published: 18 November 2022
Check for updates
Badges
Author Tags
multifrontal solvers
irregular computational workloads
sparse direct solvers
LU factorization
GPU computing
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
Upcoming Conference
SC '23

Sponsor:

sighpc

SC '23: The International Conference for High Performance Computing, Networking, Storage and Analysis

November 12 - 17, 2023

Denver , CO , USA
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 0
  Total Citations
  View Citations
- 36
  Total Downloads
- Downloads (Last 12 months)36
- Downloads (Last 6 weeks)12
Other Metrics
View Author Metrics
Cited By
This publication has not been cited yet

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Addressing irregular patterns of matrix computations on GPUs and their impact on applications powered by sparse direct solvers

Save to Binder

SC '22: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

ABSTRACT