research-article

The generalized matrix chain algorithm

Authors:
Henrik Barthels

RWTH Aachen University, Germany

RWTH Aachen University, Germany
View Profile

,
Marcin Copik

RWTH Aachen University, Germany

RWTH Aachen University, Germany
View Profile

,
Paolo Bientinesi

RWTH Aachen University, Germany

RWTH Aachen University, Germany
View Profile

CGO 2018: Proceedings of the 2018 International Symposium on Code Generation and OptimizationFebruary 2018 Pages 138–148https://doi.org/10.1145/3168804

Published:24 February 2018Publication History

CGO 2018: Proceedings of the 2018 International Symposium on Code Generation and Optimization

Pages 138–148

ABSTRACT

In this paper, we present a generalized version of the matrix chain algorithm to generate efficient code for linear algebra problems, a task for which human experts often invest days or even weeks of works. The standard matrix chain problem consists in finding the parenthesization of a matrix product M := A₁ A₂ ⋯ A_n that minimizes the number of scalar operations. In practical applications, however, one frequently encounters more complicated expressions, involving transposition, inversion, and matrix properties. Indeed, the computation of such expressions relies on a set of computational kernels that offer functionality well beyond the simple matrix product. The challenge then shifts from finding an optimal parenthesization to finding an optimal mapping of the input expression to the available kernels. Furthermore, it is often the case that a solution based on the minimization of scalar operations does not result in the optimal solution in terms of execution time. In our experiments, the generated code outperforms other libraries and languages on average by a factor of about 9. The motivation for this work comes from the fact that—despite great advances in the development of compilers—the task of mapping linear algebra problems to optimized kernels is still to be done manually. In order to relieve the user from this complex task, new techniques for the compilation of linear algebra expressions have to be developed.

References

2017. Intel®Math Kernel Library documentation. https://software. intel.com/en-us/mkl-reference-manual-for-c . (2017).Google Scholar
2017. Matlab documentation. http://www.mathworks.com/help/ matlab . (2017).Google Scholar
Zaid Albataineh and Fathi M. Salem. 2014. A Blind Adaptive CDMA Receiver Based on State Space Structures. CoRR abs/1408.0196 (2014).Google Scholar
Edward Anderson, Zhaojun Bai, Christian Bischof, Susan Blackford, Jack Dongarra, Jeremy Du Croz, Anne Greenbaum, Sven Hammarling, A McKenney, and D Sorensen. 1999. LAPACK Users’ guide. Vol. 9. SIAM.Google Scholar
Henrik Barthels. 2016. A Compiler for Linear Algebra Operations. In SPLASH ’16 Companion . ACM. Google ScholarDigital Library
Jeff Bezanson, Stefan Karpinski, Viral B. Shah, and Alan Edelman. 2012. Julia: A Fast Dynamic Language for Technical Computing. (September 2012). arXiv: 1209.5145Google Scholar
Paolo Bientinesi, Brian Gunter, and Robert A. van de Geijn. 2008. Families of Algorithms Related to the Inversion of a Symmetric Positive Definite Matrix. ACM Trans. Math. Softw. 35, 1, Article 3 (July 2008), 22 pages. Google ScholarDigital Library
Phillip G. Bradford, Gregory J.E. Rawlins, and Gregory E. Shannon. 1998. Efficient Matrix Chain Ordering in Polylog Time. SIAM J. Comput. 27, 2 (1998), 466–490. Google ScholarDigital Library
Alexander M. Bronstein, Yoni Choukroun, Ron Kimmel, and Matan Sela. 2016. Consistent Discretization and Minimization of the L1 Norm on Manifolds. CoRR abs/1609.05434 (2016). arXiv: 1609.05434Google Scholar
Francis Y. Chin. 1978. An O(N) Algorithm for Determining a Nearoptimal Computation Order of Matrix Chain Products. Commun. ACM 21, 7 (July 1978), 544–549. Google ScholarDigital Library
Jaeyoung Choi, Jack J Dongarra, and David W Walker. 1995. The Design of a Parallel Dense Linear Algebra Software Library: Reduction to Hessenberg, Tridiagonal, and Bidiagonal Form. Numerical Algorithms 10, 2 (1995), 379–399.Google ScholarCross Ref
Jim Christian. 1993. Flatterms, Discrimination Nets, and Fast Term Rewriting. Journal of automated reasoning 10, 1 (1993), 95–113.Google ScholarCross Ref
Thomas H. Cormen, Ronald L. Rivest, and Charles E. Leiserson. 1990. Introduction to Algorithms . McGraw-Hill, Inc. Google ScholarDigital Library
Artur Czumaj. 1996. Very Fast Approximation of the Matrix Chain Product Problem. Journal of Algorithms 21, 1 (1996), 71–79. Google ScholarDigital Library
Yin Ding and Ivan W. Selesnick. 2016. Sparsity-Based Correction of Exponential Artifacts. Signal Processing 120 (2016), 236–248. Google ScholarDigital Library
Jack J. Dongarra, Jeremy Du Croz, Sven Hammarling, and Iain S. Duff. 1990. A set of Level 3 Basic Linear Algebra Subprograms. ACM Transactions on Mathematical Software (TOMS) 16, 1 (1990), 1–17. Google ScholarDigital Library
Jack J. Dongarra, Sven Hammarling, Richard J. Hanson, and Jeremy Croz. 1988. An Extended set of FORTRAN Basic Linear Algebra Subprograms. ACM Transactions on Mathematical Software (TOMS) 14, 4 (1988), 399. Google ScholarDigital Library
Diego Fabregat-Traver and Paolo Bientinesi. 2011. Automatic Generation of Loop-Invariants for Matrix Operations. In Computational Science and its Applications, International Conference . IEEE Computer Society, Los Alamitos, CA, USA, 82–92. http://hpac.rwth-aachen.de/ aices/preprint/documents/AICES-2011-02-01.pdf Google ScholarDigital Library
Diego Fabregat-Traver and Paolo Bientinesi. 2011. Knowledge-Based Automatic Generation of Partitioned Matrix Expressions. CASC 6885, 4 (2011), 144–157. Google ScholarDigital Library
Diego Fabregat-Traver and Paolo Bientinesi. 2013. A Domain-Specific Compiler for Linear Algebra Operations. In High Performance Computing for Computational Science – VECPAR 2010 (Lecture Notes in Computer Science) , O. Marques M. Dayde and K. Nakajima (Eds.), Vol. 7851. Springer, Heidelberg, 346–361.Google Scholar
Diego Fabregat-Traver and Paolo Bientinesi. 2013. Application-tailored Linear Algebra Algorithms: A search-sased Approach. International Journal of High Performance Computing Applications (IJHPCA) 27, 4 (Nov. 2013), 425 – 438. Google ScholarDigital Library
Gene H. Golub and Charles F. Van Loan. 2013. Matrix Computations. Vol. 4. Johns Hopkins.Google Scholar
Albert Gräf. 1991. Left-to-Right Tree Pattern Matching. In International Conference on Rewriting Techniques and Applications . Springer, 323– 334. Google ScholarDigital Library
Gaël Guennebaud, Benoît Jacob, et al. 2010. Eigen v3. http://eigen. tuxfamily.org . (2010).Google Scholar
M. Hejazi, S. M. Azimi-Abarghouyi, B. Makki, M. Nasiri-Kenari, and T. Svensson. 2016. Robust Successive Compute-and-Forward Over Multiuser Multirelay Networks. IEEE Transactions on Vehicular Technology 65, 10 (Oct 2016), 8112–8129.Google ScholarCross Ref
T.C. Hu and M.T. Shing. 1982. Computation of Matrix Chain Products. Part I. SIAM J. Comput. 11, 2 (1982), 362–373.Google ScholarDigital Library
T.C. Hu and M.T. Shing. 1984. Computation of Matrix Chain Products. Part II. SIAM J. Comput. 13, 2 (1984), 228–251. Google ScholarDigital Library
Klaus Iglberger, Georg Hager, Jan Treibig, and Ulrich Rüde. 2012. Expression Templates Revisited: A Performance Analysis of the Current ET Methodologies. SIAM Journal on Scientific Computing 34, 2 (2012), C42–C69. Google ScholarDigital Library
Manuel Krebber. 2017. Non-linear Associative-Commutative Many-toOne Pattern Matching with Sequence Variables. CoRR abs/1705.00907 (2017). http://arxiv.org/abs/1705.00907Google Scholar
Chuck L. Lawson, Richard J. Hanson, David R. Kincaid, and Fred T. Krogh. 1979. Basic Linear Algebra Subprograms for FORTRAN Usage. ACM Transactions on Mathematical Software (TOMS) 5, 3 (1979), 308– 323. Google ScholarDigital Library
Heejo Lee, Jong Kim, Sung Je Hong, and Sunggu Lee. 2003. Processor Allocation and Task Scheduling of Matrix Chain Products on Parallel Systems. Parallel and Distributed Systems, IEEE Transactions on 14, 4 (2003), 394–407. Google ScholarDigital Library
Noël M Nachtigal, Satish C Reddy, and Lloyd N Trefethen. 1992. How Fast are Nonsymmetric Matrix Iterations? SIAM J. Matrix Anal. Appl. 13, 3 (1992), 778–795. Google ScholarDigital Library
Nadia Nedjah, Colin Walter, and Stephen Eldridge. 1997. Optimal Left-to-Right Pattern-Matching Automata. In Algebraic and Logic Programming . Springer, 273–286. Google ScholarDigital Library
Elias D. Niño, Adrian Sandu, and Xinwei Deng. 2016. A Parallel Implementation of the Ensemble Kalman Filter Based on Modified Cholesky Decomposition. CoRR abs/1606.00807 (2016). http://arxiv. org/abs/1606.00807Google Scholar
Kazufumi Nishida, Yasuaki Ito, and Koji Nakano. 2011. Accelerating the Dynamic Programming for the Matrix Chain Product on the GPU. In Networking and Computing (ICNC), 2011 Second International Conference on . IEEE, 320–326. Google ScholarDigital Library
Silvia Noschese and Lothar Reichel. 2016. Some Matrix Nearness Problems Suggested by Tikhonov Regularization. Linear Algebra Appl. 502 (2016), 366–386.Google ScholarCross Ref
Devangi N Parikh, Maggie E Myers, and Robert A van de Geijn. 2017. Deriving Correct High-Performance Algorithms. arXiv preprint arXiv:1710.04286 (2017). arXiv: 1710.04286Google Scholar
Elmar Peise and Paolo Bientinesi. 2012. Performance Modeling for Dense Linear Algebra. In Proceedings of the 2012 SC Companion: High Performance Computing, Networking Storage and Analysis (PMBS12) (SCC ’12) . IEEE Computer Society, Washington, DC, USA, 406–416. Google ScholarDigital Library
Elmar Peise and Paolo Bientinesi. 2015. A Study on the Influence of Caching: Sequences of Dense Linear Algebra Kernels. In High Performance Computing for Computational Science – VECPAR 2014 (Lecture Notes in Computer Science) , Michel Daydé, Osni Marques, and Kengo Nakajima (Eds.), Vol. 8969. Springer International Publishing, 245–258. arXiv: 1402.5897v1Google Scholar
Elmar Peise and Paolo Bientinesi. 2015. The ELAPS Framework: Experimental Linear Algebra Performance Studies. CoRR abs/1504.08035 (2015). http://arxiv.org/abs/1504.08035Google Scholar
Prakash Ramanan. 1996. An Efficient Parallel Algorithm for the MatrixChain-Product Problem. SIAM J. Comput. 25, 4 (1996), 874–893. Google ScholarDigital Library
V. Rao, A. Sandu, M. Ng, and E. Nino-Ruiz. 2015. Robust Data Assimilation Using L_1 and Huber Norms. ArXiv e-prints (Nov. 2015). arXiv: math.NA/1511.01593Google Scholar
Vishwas Rao, Adrian Sandu, Michael Ng, and Elias D. Nino-Ruiz. 2017. Robust Data Assimilation Using L 1 and Huber Norms. SIAM Journal on Scientific Computing 39, 3 (2017), B548–B570.Google ScholarCross Ref
Henrik Ronellenfitsch, Marc Timme, and Dirk Witthaut. 2015. A Dual Method for Computing Power Transfer Distribution Factors. CoRR abs/1510.04645 (2015). http://arxiv.org/abs/1510.04645Google Scholar
Conrad Sanderson. 2010. Armadillo: An Open Source C++ Linear Algebra Library for Fast Prototyping and Computationally Intensive Experiments. (2010).Google Scholar
Jeremy G. Siek, Ian Karlin, and Elizabeth R. Jessup. 2008. Build to Order Linear Algebra Kernels. In Parallel and Distributed Processing, 2008. IPDPS 2008. IEEE International Symposium on . IEEE, 1–8.Google Scholar
Daniele G. Spampinato and Markus Püschel. 2014. A Basic Linear Algebra Compiler. In Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization . ACM, 23. Google ScholarDigital Library
Daniele G Spampinato and Markus Püschel. 2016. A Basic Linear Algebra Compiler for Structured Matrices. In International Symposium on Code Generation and Optimization (CGO) . 117–127. Google ScholarDigital Library
Damian Straszak and Nisheeth K Vishnoi. 2015. On a Natural Dynamics for Linear Programming. (2015). arXiv: 1511.07020Google Scholar
Steve Strate, Roger L. Wainwright, et al. 1990. Parallelization of the Dynamic Programming Algorithm for the Matrix Chain Product on a Hypercube. In Applied Computing, 1990., Proceedings of the 1990 Symposium on . IEEE, 78–84.Google ScholarCross Ref
Todd L. Veldhuizen. 1998. Arrays in Blitz++. In International Symposium on Computing in Object-Oriented Parallel Environments . Springer, 223–230. Google ScholarDigital Library

Index Terms

The generalized matrix chain algorithm

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
CGO 2018: Proceedings of the 2018 International Symposium on Code Generation and Optimization
February 2018
377 pages
ISBN:9781450356176
DOI:10.1145/3179541
General Chairs:
Jens Knoop
Vienna University of Technology, Austria
,
Markus Schordan
Lawrence Livermore National Laboratory, USA
,
Program Chairs:
Teresa Johnson
Google, USA
,
Michael O'Boyle
University of Edinburgh, UK
Copyright © 2018 ACM
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 24 February 2018
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
linear algebra
compiler
matrix chain problem
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate 312 of 1,061 submissions, 29%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 3
  Total Citations
  View Citations
- 181
  Total Downloads
- Downloads (Last 12 months)38
- Downloads (Last 6 weeks)8
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

The generalized matrix chain algorithm

Save to Binder

CGO 2018: Proceedings of the 2018 International Symposium on Code Generation and Optimization

ABSTRACT

References

Cited By

Index Terms

The generalized matrix chain algorithm

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

The generalized matrix chain algorithm

Save to Binder

CGO 2018: Proceedings of the 2018 International Symposium on Code Generation and Optimization

ABSTRACT

References

Cited By

Index Terms

The generalized matrix chain algorithm

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media