DOI: 10.1145/3582016.3582070 (ASPLOS Conference Proceedings)

APEX: A Framework for Automated Processing Element Design Space Exploration using Frequent Subgraph Analysis

Published: 25 March 2023

ABSTRACT

The architecture of a coarse-grained reconfigurable array (CGRA) processing element (PE) has a significant effect on the performance and energy-efficiency of an application running on the CGRA. This paper presents APEX, an automated approach for generating specialized PE architectures for an application or an application domain. APEX first analyzes application domain benchmarks using frequent subgraph mining to extract commonly occurring computational subgraphs. APEX then generates specialized PEs by merging subgraphs using a datapath graph merging algorithm. The merged datapath graphs are translated into a PE specification from which we automatically generate the PE hardware description in Verilog along with a compiler that maps applications to the PE. The PE hardware and compiler are inserted into a flexible CGRA generation and compilation toolchain that allows for agile evaluation of CGRAs. We evaluate APEX for two domains, machine learning and image processing. For image processing applications, our automatically generated CGRAs with specialized PEs achieve from 5% to 30% less area and from 22% to 46% less energy compared to a general-purpose CGRA. For machine learning applications, our automatically generated CGRAs consume 16% to 59% less energy and 22% to 39% less area than a general-purpose CGRA. This work paves the way for creation of application domain-driven design-space exploration frameworks that automatically generate efficient programmable accelerators, with a much lower design effort for both hardware and compiler generation.
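As a rough illustration of the first step described in the abstract (extracting commonly occurring computational subgraphs), the sketch below counts two-node producer/consumer patterns across toy dataflow graphs and keeps those that meet a support threshold. This is not APEX's implementation: the graph encoding, the function names `count_edge_patterns` and `frequent_patterns`, the `min_support` parameter, and the example kernels are all assumptions made for illustration. A real frequent subgraph miner also handles larger patterns and subgraph isomorphism.

```python
from collections import Counter

# Hypothetical encoding (an assumption, not from the paper):
#   ops   maps node id -> operation name
#   edges lists (producer id, consumer id) pairs


def count_edge_patterns(ops, edges):
    """Count each (producer op, consumer op) pair in one dataflow graph."""
    return Counter((ops[src], ops[dst]) for src, dst in edges)


def frequent_patterns(graphs, min_support):
    """Return the two-node patterns that occur at least min_support times
    across all graphs (a simplified occurrence-count notion of support)."""
    total = Counter()
    for ops, edges in graphs:
        total += count_edge_patterns(ops, edges)
    return {pattern: n for pattern, n in total.items() if n >= min_support}


# Toy kernel 1: (a*w0 + b*w1) + c*w2, a 3-tap convolution-like expression.
conv_ops = {0: "mul", 1: "mul", 2: "mul", 3: "add", 4: "add"}
conv_edges = [(0, 3), (1, 3), (3, 4), (2, 4)]

# Toy kernel 2: x*y + z, a single multiply-accumulate.
mac_ops = {0: "mul", 1: "add"}
mac_edges = [(0, 1)]

frequent = frequent_patterns(
    [(conv_ops, conv_edges), (mac_ops, mac_edges)], min_support=2
)
print(frequent)  # {('mul', 'add'): 4}
```

In this toy input the multiply-feeding-add pattern dominates, which is the kind of observation that would motivate a PE with a fused multiply-add datapath.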

