Abstract
Efficient exploitation of exascale architectures requires rethinking the numerical algorithms used in many large-scale applications. These architectures favor algorithms that expose ultra-fine-grain parallelism and maximize the ratio of floating-point operations to energy-intensive data movement. One of the few viable approaches to achieving high efficiency for PDE discretizations on unstructured grids is to use matrix-free/partially assembled high-order finite element methods, since these methods can increase accuracy and/or reduce computational time through reduced data motion. In this paper we provide an overview of the research and development activities in the Center for Efficient Exascale Discretizations (CEED), a co-design center in the Exascale Computing Project (ECP) that is focused on the development of next-generation discretization software and algorithms to enable a wide range of finite element applications to run efficiently on future hardware. CEED is a research partnership involving more than 30 computational scientists from two US national labs and five universities, including members of the Nek5000, MFEM, MAGMA and PETSc projects. We discuss the CEED co-design activities based on targeted benchmarks, miniapps, and discretization libraries, as well as our work on performance optimizations for large-scale GPU architectures. We also provide a broad overview of research and development activities in areas such as unstructured adaptive mesh refinement algorithms, matrix-free linear solvers, and high-order data visualization, and we list examples of collaborations with several ECP and external applications.