Efficient exascale discretizations: High-order finite element methods

Published: 01 November 2021

Abstract

Efficient exploitation of exascale architectures requires rethinking the numerical algorithms used in many large-scale applications. These architectures favor algorithms that expose ultra-fine-grain parallelism and maximize the ratio of floating-point operations to energy-intensive data movement. One of the few viable approaches to achieving high efficiency in PDE discretizations on unstructured grids is to use matrix-free/partially assembled high-order finite element methods, since these methods can increase accuracy and/or lower computational time through reduced data motion. In this paper we provide an overview of the research and development activities in the Center for Efficient Exascale Discretizations (CEED), a co-design center in the Exascale Computing Project that is focused on the development of next-generation discretization software and algorithms to enable a wide range of finite element applications to run efficiently on future hardware. CEED is a research partnership involving more than 30 computational scientists from two US national labs and five universities, including members of the Nek5000, MFEM, MAGMA and PETSc projects. We discuss the CEED co-design activities based on targeted benchmarks, miniapps and discretization libraries, and our work on performance optimizations for large-scale GPU architectures. We also provide a broad overview of research and development activities in areas such as unstructured adaptive mesh refinement algorithms, matrix-free linear solvers, and high-order data visualization, and we list examples of collaborations with several ECP and external applications.
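As a rough illustration of why matrix-free/partially assembled operators move less data, the Python sketch below applies a mass operator on a single 2D tensor-product reference element in two ways: with a fully assembled element matrix, and with sum factorization using only the 1D basis matrix and the quadrature weights. It is a minimal sketch under stated assumptions, not the CEED, libCEED, or MFEM implementation: the Legendre basis, the Gauss quadrature rule, and the function names `basis_1d`, `mass_assembled`, and `mass_matrix_free` are all chosen here for illustration.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss, legvander


def basis_1d(p, q):
    """Return (B, w) for degree p and q Gauss points on [-1, 1].

    B[i, j] is the j-th Legendre polynomial evaluated at the i-th
    quadrature point (an illustrative basis choice); w holds the weights.
    """
    x, w = leggauss(q)
    B = legvander(x, p)  # shape (q, p + 1)
    return B, w


def mass_assembled(B, w):
    """Assemble the dense 2D element mass matrix (B x B)^T W (B x B)."""
    B2 = np.kron(B, B)            # 2D basis at all quadrature points
    W2 = np.kron(w, w)            # 2D quadrature weights
    return B2.T @ (W2[:, None] * B2)


def mass_matrix_free(B, w, u):
    """Apply the same operator without forming it (sum factorization)."""
    n = B.shape[1]
    U = u.reshape(n, n)           # coefficients as an n x n array
    V = B @ U @ B.T               # interpolate to quadrature points
    V = V * np.outer(w, w)        # pointwise scaling by the weights
    return (B.T @ V @ B).ravel()  # apply the transposed basis


p, q = 4, 6                       # hypothetical degree and quadrature size
B, w = basis_1d(p, q)
rng = np.random.default_rng(0)
u = rng.standard_normal((p + 1) ** 2)

y_assembled = mass_assembled(B, w) @ u
y_matrix_free = mass_matrix_free(B, w, u)
max_diff = float(np.max(np.abs(y_assembled - y_matrix_free)))

# Stored values per element: dense matrix vs. 1D basis plus 2D weights.
assembled_entries = ((p + 1) ** 2) ** 2
partial_entries = B.size + q ** 2
print(max_diff, assembled_entries, partial_entries)
```

In this setup the assembled element matrix stores ((p + 1)^2)^2 values, while the matrix-free form needs only the 1D basis matrix (shared by all elements) and one weight per quadrature point. The gap widens quickly with the polynomial degree and in 3D, which is the reduced data motion the abstract refers to.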
