DOI: 10.1145/3168804 · ACM CGO Conference Proceedings · Research article

The generalized matrix chain algorithm

Published: 24 February 2018

ABSTRACT

In this paper, we present a generalized version of the matrix chain algorithm to generate efficient code for linear algebra problems, a task for which human experts often invest days or even weeks of work. The standard matrix chain problem consists of finding the parenthesization of a matrix product M := A1 A2 ⋯ An that minimizes the number of scalar operations. In practical applications, however, one frequently encounters more complicated expressions, involving transposition, inversion, and matrix properties. Indeed, the computation of such expressions relies on a set of computational kernels that offer functionality well beyond the simple matrix product. The challenge then shifts from finding an optimal parenthesization to finding an optimal mapping of the input expression to the available kernels. Furthermore, a solution that minimizes the number of scalar operations is often not the optimal solution in terms of execution time. In our experiments, the generated code outperforms other libraries and languages on average by a factor of about 9. The motivation for this work comes from the fact that, despite great advances in the development of compilers, the task of mapping linear algebra problems to optimized kernels is still done manually. In order to relieve the user of this complex task, new techniques for the compilation of linear algebra expressions have to be developed.
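To make the starting point concrete, the *standard* matrix chain problem described above can be solved with the textbook dynamic program. The sketch below is illustrative only: it minimizes scalar multiplications for a plain product A1 ⋯ An and does not model the paper's generalizations (transposition, inversion, matrix properties, or mapping to kernels); all function names are our own.

```python
def matrix_chain_order(dims):
    """Classic matrix-chain dynamic program.

    dims has length n+1; matrix Ai has shape dims[i-1] x dims[i].
    Returns (cost, split), where cost[i][j] is the minimal number of
    scalar multiplications for the product Ai..Aj (0-based), and
    split[i][j] is the optimal split point.
    """
    n = len(dims) - 1
    cost = [[0] * n for _ in range(n)]
    split = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):          # length of the subchain
        for i in range(n - length + 1):
            j = i + length - 1
            cost[i][j] = float("inf")
            for k in range(i, j):           # try every split point
                q = (cost[i][k] + cost[k + 1][j]
                     + dims[i] * dims[k + 1] * dims[j + 1])
                if q < cost[i][j]:
                    cost[i][j] = q
                    split[i][j] = k
    return cost, split

def parenthesization(split, i, j):
    """Recover the optimal parenthesization as a string."""
    if i == j:
        return f"A{i + 1}"
    k = split[i][j]
    return f"({parenthesization(split, i, k)} {parenthesization(split, k + 1, j)})"

# Worked example: six matrices with dimensions
# 30x35, 35x15, 15x5, 5x10, 10x20, 20x25.
cost, split = matrix_chain_order([30, 35, 15, 5, 10, 20, 25])
print(cost[0][5])                     # 15125 scalar multiplications
print(parenthesization(split, 0, 5))  # ((A1 (A2 A3)) ((A4 A5) A6))
```

The O(n^3) dynamic program above only chooses where to place parentheses; the paper's contribution is to replace this cost model and search space so that each subproblem is matched against available kernels (e.g. BLAS/LAPACK routines) and scored by predicted execution time rather than by scalar-operation counts.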


Published in
CGO 2018: Proceedings of the 2018 International Symposium on Code Generation and Optimization
February 2018, 377 pages
ISBN: 9781450356176
DOI: 10.1145/3179541
Copyright © 2018 ACM
Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance rate: 312 of 1,061 submissions (29%)
