ABSTRACT
Performance models are well-known instruments to understand the scaling behavior of parallel applications. They express how performance changes as key execution parameters, such as the number of processes or the size of the input problem, vary. Besides reasoning about program behavior, such models can also be automatically derived from performance data. This is called empirical performance modeling. While this sounds simple at the first glance, this approach faces several serious interrelated challenges, including expensive performance measurements, inaccuracies inflicted by noisy benchmark data, and overall complex experiment design, starting with the selection of the right parameters. The more parameters one considers, the more experiments are needed and the stronger the impact of noise. In this paper, we show how taint analysis, a technique borrowed from the domain of computer security, can substantially improve the modeling process, lowering its cost, improving model quality, and help validate performance models and experimental setups.
- 2018. Extra-P 3.0. http://www.scalasca.org/software/extra-p/download.html.Google Scholar
- Sriram Aananthakrishnan, Greg Bronevetsky, and Ganesh Gopalakrishnan. 2013. Hybrid Approach for Data-flow Analysis of MPI Programs. In Proceedings of the 27th International ACM Conference on International Conference on Supercomputing (Eugene, Oregon, USA) (ICS '13). ACM, New York, NY, USA, 455--456.Google ScholarDigital Library
- Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. 1986. Compilers: Principles, Techniques, and Tools. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA.Google ScholarDigital Library
- D. an Mey, S. Biersdorff, C. Bischof, K. Diethelm, D. Eschweiler, M. Gerndt, A. Knüpfer, D. Lorenz, A. D. Malony, W. E. Nagel, Y. Oleynik, C. Rössel, P. Saviankou, D. Schmidl, S. S. Shende, M. Wagner, B. Wesarg, and F. Wolf. 2012. Score-P: A Unified Performance Measurement System for Petascale Applications. In Proc. of the CiHPC: Competence in High Performance Computing, HPC Status Konferenz der Gauß-Allianz e.V., Schwetzingen, Germany, June 2010. Gauß-Allianz, Springer, 85--97. Google ScholarCross Ref
- G. Bauer, S. Gottlieb, and T. Hoefler. 2012. Performance Modeling and Comparative Analysis of the MILC Lattice QCD Application su3 rmd. In Proc. of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid) (Ottawa, Canada). IEEE Computer Society, 652--659.Google Scholar
- Claude Bernard, Michael C. Ogilvie, Thomas A. Degrand, Carleton E. Detar, Steven A. Gottlieb, A. Krasnitz, R.L. Sugar, and D. Toussaint. 1991. Studying Quarks and Gluons On Mimd Parallel Computers. Int. J. High Perform. Comput. Appl. 5, 4 (Dec. 1991), 61--70.Google Scholar
- A. Bhattacharyya, G. Kwasniewski, and T. Hoefler. 2015. Using Compiler Techniques to Improve Automatic Performance Modeling. In 2015 International Conference on Parallel Architecture and Compilation (PACT). 468--479.Google Scholar
- Alexandru Calotoiu, David Beckingsale, Christopher W. Earl, Torsten Hoefler, Ian Karlin, Martin Schulz, and Felix Wolf. 2016. Fast Multi-Parameter Performance Modeling. In Proc. of the 2016 IEEE International Conference on Cluster Computing (CLUSTER), Taipei, Taiwan. IEEE Computer Society, 172--181.Google ScholarCross Ref
- Alexandru Calotoiu, Alexander Graf, Torsten Hoefler, Daniel Lorenz, Sebastian Rinke, and Felix Wolf. 2018. Lightweight Requirements Engineering for Exascale Co-design. IEEE. To appear in IEEE International Conference on Cluster Computing (Cluster'18).Google Scholar
- Alexandru Calotoiu, Alexander Graf, Torsten Hoefler, Daniel Lorenz, Sebastian Rinke, and Felix Wolf. 2018. Lightweight Requirements Engineering for Exascale Co-design. In Proc. of the 2018 IEEE International Conference on Cluster Computing (CLUSTER), Belfast, UK. IEEE, 1--11.Google ScholarCross Ref
- A. Calotoiu, T. Hoefler, M. Poke, and F. Wolf. 2013. Using Automated Performance Modeling to Find Scalability Bugs in Complex Codes. IEEE/ACM International Conference on High Performance Computing, Networking, Storage and Analysis (SC13).Google Scholar
- Alexandru Calotoiu, Torsten Hoefler, Marius Poke, and Felix Wolf. 2013. Using Automated Performance Modeling to Find Scalability Bugs in Complex Codes. In Proc. of the ACM/IEEE Conference on Supercomputing (SC13), Denver, CO, USA. 1--12.Google ScholarDigital Library
- Alonzo Church. 1936. An unsolvable problem of elementary number theory. American journal of mathematics 58, 2 (1936), 345--363.Google Scholar
- James Clause, Wanchun Li, and Alessandro Orso. 2007. Dytan: a generic dynamic taint analysis framework. In Proceedings of the 2007 international symposium on Software testing and analysis. ACM, 196--206.Google ScholarDigital Library
- James Clause, Wanchun Li, and Alessandro Orso. 2007. Dytan: A Generic Dynamic Taint Analysis Framework. In Proceedings of the 2007 International Symposium on Software Testing and Analysis (London, United Kingdom) (ISSTA '07). Association for Computing Machinery, New York, NY, USA, 196--206. Google ScholarDigital Library
- dfsan 2019. Clang 9 Documentation - DataFlowSanitizer. https://clang.llvm.org/docs/DataFlowSanitizer.html.Google Scholar
- X. Fu, Z. Chen, Y. Zhang, C. Huang, W. Dong, and J. Wang. 2015. MPISE: Symbolic Execution of MPI Programs. In 2015 IEEE 16th International Symposium on High Assurance Systems Engineering. 181--188.Google Scholar
- Dan Gohman. 2009. ScalarEvolution and Loop Optimization. Talk at LLVM Developer's Meeting.Google Scholar
- S. F. Goldsmith, A. S. Aiken, and D. S. Wilkerson. 2007. Measuring Empirical Computational Complexity. In Proc. of the the 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering (Dubrovnik, Croatia) (ESEC-FSE '07). ACM, New York, NY, USA, 395--404.Google Scholar
- P. Gschwandtner, A. Hirsch, S. Benedict, and T. Fahringer. 2018. Towards Automatic Compiler-assisted Performance and Energy Modeling for Message Passing Parallel Programs. In ARCS Workshop 2018; 31th International Conference on Architecture of Computing Systems. 1--8.Google Scholar
- J. Hammer, G. Hager, J. Eitzinger, and G. Wellein. 2015. Automatic Loop Kernel Analysis and Performance Modeling With Kerncraft. CoRR abs/1509.03778 (2015). http://arxiv.org/abs/1509.03778Google Scholar
- Torsten Hoefler, William Gropp, William Kramer, and Marc Snir. 2011. Performance Modeling for Systematic Performance Tuning. In State of the Practice Reports (Seattle, Washington) (SC '11). ACM, New York, NY, USA, Article 6, 12 pages.Google Scholar
- T. Hoefler and D. Moor. 2014. Energy, Memory, and Runtime Tradeoffs for Implementing Collective Communication Operations. Journal of Supercomputing Frontiers and Innovations 1, 2 (Oct. 2014), 58--75.Google Scholar
- T. Hoefler, T. Schneider, and A. Lumsdaine. 2010. Characterizing the Influence of System Noise on Large-Scale Applications by Simulation. In International Conference for High Performance Computing, Networking, Storage and Analysis (SC'10).Google Scholar
- Kashif Ilyas, Alexandru Calotoiu, and Felix Wolf. 2017. Off-Road Performance Modeling - How to Deal with Segmented Data. In Proc. of the 23rd Euro-Par Conference, Santiago de Compostela, Spain (Lecture Notes in Computer Science, Vol. 10417). Springer, 36--48. Google ScholarCross Ref
- Engin Ipek, Bronis R. de Supinski, Martin Schulz, and Sally A. McKee. 2005. An Approach to Performance Prediction for Parallel Applications. In Proceedings of the 11th International Euro-Par Conference on Parallel Processing (Lisbon, Portugal) (Euro-Par'05). Springer-Verlag, Berlin, Heidelberg, 196--205.Google Scholar
- A. Jayakumar, P. Murali, and S. Vadhiyar. 2015. Matching Application Signatures for Performance Predictions Using a Single Execution. In Proc. of the 29th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2015). 1161--1170.Google Scholar
- Min Gyung Kang, Stephen McCamant, Pongsin Poosankam, and Dawn Song. 2011. DTA++: Dynamic Taint Analysis with Targeted Control-Flow Propagation. In Proceedings of the Network and Distributed System Security Symposium, NDSS 2011, San Diego, California, USA, 6th February - 9th February 2011. The Internet Society.Google Scholar
- Ian Karlin, Jeff Keasler, and Rob Neely. 2013. LULESH 2.0 Updates and Changes. Technical Report LLNL-TR-641973. 1--9 pages.Google Scholar
- Vasileios P. Kemerlis, Georgios Portokalidis, Kangkook Jee, and Angelos D. Keromytis. 2012. Libdft: Practical Dynamic Data Flow Tracking for Commodity Systems. In Proceedings of the 8th ACM SIGPLAN/SIGOPS Conference on Virtual Execution Environments (London, England, UK) (VEE '12). Association for Computing Machinery, New York, NY, USA, 121--132. Google ScholarDigital Library
- D. J. Kerbyson, H. J. Alme, A. Hoisie, F. Petrini, H. J. Wasserman, and M. Gittings. 2001. Predictive Performance and Scalability Modeling of a Large-Scale Application. In SC '01: Proceedings of the 2001 ACM/IEEE Conference on Supercomputing. 39--39.Google Scholar
- M. Kuhnemann, T. Rauber, and G. Runger. 2004. A source code analyzer for performance prediction. In 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings. 262--.Google Scholar
- C. Lattner and V. Adve. 2004. LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation. In Proc. of the International Symposium on Code Generation and Optimization: Feedback-directed and Runtime Optimization (Palo Alto, California) (CGO '04). IEEE Computer Society, Washington, DC, USA.Google Scholar
- Benjamin C. Lee, David M. Brooks, Bronis R. de Supinski, Martin Schulz, Karan Singh, and Sally A. McKee. 2007. Methods of inference and learning for performance modeling of parallel applications. In Proc. of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (San Jose, California, USA) ((PPoPP '07)). ACM, 249--258.Google Scholar
- Seyong Lee, Jeremy S. Meredith, and Jeffrey S. Vetter. 2015. COMPASS: A Framework for Automated Performance Modeling and Prediction. In Proceedings of the 29th ACM on International Conference on Supercomputing (Newport Beach, California, USA) (ICS '15). ACM, New York, NY, USA, 405--414.Google Scholar
- Y. J. Lo, S. Williams, B. Van Straalen, T. J. Ligocki, M. J. Cordery, N. J. Wright, M. W. Hall, and L. Oliker. 2014. Roofline Model Toolkit: A practical tool for architectural and program analysis. In High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation. Springer, 129--148.Google Scholar
- G. Marin and J. Mellor-Crummey. 2004. Cross-architecture performance predictions for scientific applications using parameterized models. SIGMETRICS Performance Eval. Review 32, 1 (June 2004), 2--13.Google Scholar
- K. Meng and B. Norris. 2017. Mira: A Framework for Static Performance Analysis. In 2017 IEEE International Conference on Cluster Computing (CLUSTER). 103--113.Google Scholar
- M. R. Meswani, L. Carrington, D. Unat, A. Snavely, S. Baden, and S. Poole. 2013. Modeling and Predicting Performance of High Performance Computing Applications on Hardware Accelerators. Int. J. High Perform. Comput. Appl. 27, 2 (May 2013), 89--108. Google ScholarDigital Library
- Henry Gordon Rice. 1953. Classes of recursively enumerable sets and their decision problems. Trans. Amer. Math. Soc. 74, 2 (1953), 358--366.Google ScholarCross Ref
- Marcus Ritter, Alexandru Calotoiu, Thorsten Reimann, Torsten Hoefler, and Felix Wolf. 2020. Performance Modeling at a Discount. IEEE. Accepted at the 34th IEEE International Parallel & Distributed Processing Symposium (IPDPS'20).Google Scholar
- Marcus Ritter, Alexandru Calotoiu, Sebastian Rinke, Thorsten Reimann, Torsten Hoefler, and Felix Wolf. 2020. Learning Cost-Effective Sampling Strategies for Empirical Performance Modeling. In Proc. of the 34th IEEE International Parallel and Distributed Processing Symposium (IPDPS), New Orleans, LA, USA. IEEE Computer Society. (to appear).Google ScholarCross Ref
- Philip C. Roth and Jeremy S. Meredith. 2014. Value Influence Analysis for Message Passing Applications. In Proceedings of the 28th ACM International Conference on Supercomputing (Munich, Germany) (ICS '14). ACM, New York, NY, USA, 145--154.Google Scholar
- Marc Shapiro and Susan Horwitz. 1997. The effects of the precision of pointer analysis. In International Static Analysis Symposium. Springer, 16--34.Google ScholarCross Ref
- Dongdong She, Yizheng Chen, Abhishek Shah, Baishakhi Ray, and Suman Jana. 2019. Neutaint: Efficient Dynamic Taint Analysis with Neural Networks. arXiv:1907.03756 [cs.CR]Google Scholar
- Sergei Shudler, Alexandru Calotoiu, Torsten Hoefler, Alexandre Strube, and Felix Wolf. 2015. Exascaling Your Library: Will Your Implementation Meet Your Expectations?. In Proc. of the International Conference on Supercomputing (ICS), Newport Beach, CA, USA. 1--11.Google ScholarDigital Library
- Norbert Siegmund, Alexander Grebhahn, Sven Apel, and Christian Kästner. 2015. Performance-influence Models for Highly Configurable Systems. In Proc. of the 2015 10th Joint Meeting on Foundations of Software Engineering (Bergamo, Italy) (ESEC/FSE 2015). ACM, New York, NY, USA, 284--294.Google ScholarDigital Library
- Vítor Sousa, Daniel de Oliveira, Patrick Valduriez, and Marta Mattoso. 2018. DfAnalyzer: Runtime Dataflow Analysis of Scientific Applications using Provenance. Proceedings of the VLDB Endowment 11 (08 2018).Google Scholar
- K. L. Spafford and J. S. Vetter. 2012. Aspen: A Domain Specific Language for Performance Modeling. In Proc. of the International Conference on High Performance Computing, Networking, Storage and Analysis (Salt Lake City, Utah) (SC '12). IEEE Computer Society Press, Los Alamitos, CA, USA, Article 84, 11 pages.Google Scholar
- Yulei Sui and Jingling Xue. 2016. SVF: interprocedural static value-flow analysis in LLVM. In Proceedings of the 25th international conference on compiler construction. ACM, 265--266.Google ScholarDigital Library
- N. R. Tallent and A. Hoisie. 2014. Palm: Easing the Burden of Analytical Performance Modeling. In Proc. of the 28th ACM International Conference on Supercomputing (Munich, Germany) (ICS '14). ACM, New York, NY, USA, 221--230. Google ScholarDigital Library
- Rajeev Thakur, Rolf Rabenseifner, and William Gropp. 2005. Optimization of Collective Communication Operations in MPICH. Int. J. High Perform. Comput. Appl. 19, 1 (Feb. 2005), 49--66.Google ScholarDigital Library
- Sebastian Unger and Frank Mueller. 2002. Handling irreducible loops: optimized node splitting versus DJ-graphs. ACM Transactions on Programming Languages and Systems (TOPLAS) 24, 4 (2002), 299--333.Google ScholarDigital Library
- R. Vuduc, J. W. Demmel, and J. A. Bilmes. 2004. Statistical Models for Empirical Search-Based Performance Tuning. Int. J. High Perform. Comput. Appl. 18, 1 (Feb. 2004), 65--94.Google ScholarDigital Library
- Xing Wu and Frank Müller. 2012. ScalaExtrap: Trace-Based Communication Extrapolation for SPMD Programs. ACM Transactions on Programming Languages and Systems 34, 1 (April 2012).Google ScholarDigital Library
- Hengbiao Yu. 2018. Combining Symbolic Execution and Model Checking to Verify MPI Programs. In Proceedings of the 40th International Conference on Software Engineering: Companion Proceeedings (Gothenburg, Sweden) (ICSE '18). Association for Computing Machinery, New York, NY, USA, 527--530. Google ScholarDigital Library
- D. Zaparanuks and M. Hauswirth. 2012. Algorithmic Profiling. SIGPLAN Not. 47, 6 (June 2012), 67--76.Google ScholarDigital Library
- Angeliki Zavou, Georgios Portokalidis, and Angelos D. Keromytis. 2011. Taint-Exchange: A Generic System for Cross-Process and Cross-Host Taint Tracking. In Advances in Information and Computer Security, Tetsu Iwata and Masakatsu Nishigaki (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 113--128.Google Scholar
- Jidong Zhai, Wenguang Chen, and Weimin Zheng. 2010. PHANTOM: predicting performance of parallel applications on large-scale parallel machines using a single node. SIGPLAN Notices 45, 5 (January 2010), 305--314.Google ScholarDigital Library
Index Terms
Extracting clean performance models from tainted programs
Comments