APEX: A Framework for Automated Processing Element Design Space Exploration using Frequent Subgraph Analysis

Authors:
Jackson Melchert, Stanford University, USA
Kathleen Feng, Stanford University, USA
Caleb Donovick, Stanford University, USA
Ross Daly, Stanford University, USA
Ritvik Sharma, Stanford University, USA
Clark Barrett, Stanford University, USA
Mark A. Horowitz, Stanford University, USA
Pat Hanrahan, Stanford University, USA
Priyanka Raina, Stanford University, USA

ASPLOS 2023: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, March 2023, Pages 33–45. https://doi.org/10.1145/3582016.3582070

Published: 25 March 2023

ABSTRACT

The architecture of a coarse-grained reconfigurable array (CGRA) processing element (PE) has a significant effect on the performance and energy efficiency of an application running on the CGRA. This paper presents APEX, an automated approach for generating specialized PE architectures for an application or an application domain. APEX first analyzes application domain benchmarks using frequent subgraph mining to extract commonly occurring computational subgraphs. APEX then generates specialized PEs by merging subgraphs using a datapath graph merging algorithm. The merged datapath graphs are translated into a PE specification from which we automatically generate the PE hardware description in Verilog along with a compiler that maps applications to the PE. The PE hardware and compiler are inserted into a flexible CGRA generation and compilation toolchain that allows for agile evaluation of CGRAs. We evaluate APEX for two domains, machine learning and image processing. For image processing applications, our automatically generated CGRAs with specialized PEs use 5% to 30% less area and 22% to 46% less energy than a general-purpose CGRA. For machine learning applications, our automatically generated CGRAs consume 16% to 59% less energy and 22% to 39% less area than a general-purpose CGRA. This work paves the way for the creation of application domain-driven design-space exploration frameworks that automatically generate efficient programmable accelerators, with much lower design effort for both hardware and compiler generation.
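
As a rough illustration of the first step the abstract describes, the sketch below counts how often a producer-to-consumer operation pair occurs across a set of dataflow graphs and keeps the pairs that meet a support threshold. It is a deliberately simplified stand-in, not APEX's implementation: it considers only two-node patterns, whereas frequent subgraph mining in general handles larger subgraphs and must test for graph isomorphism. The function name `frequent_op_pairs`, the graph encoding, and the two toy kernels are assumptions made for this example.

```python
from collections import Counter


def frequent_op_pairs(graphs, min_support):
    """Count producer->consumer operation pairs across dataflow graphs.

    graphs: list of (ops, edges), where ops maps node id -> operation name
            and edges is a list of (src_id, dst_id) pairs.
    Returns the pairs whose occurrence count is at least min_support,
    sorted by descending count and then by operation names.
    """
    counts = Counter()
    for ops, edges in graphs:
        for src, dst in edges:
            counts[(ops[src], ops[dst])] += 1
    frequent = [(pair, n) for pair, n in counts.items() if n >= min_support]
    return sorted(frequent, key=lambda item: (-item[1], item[0]))


# Two toy kernels (hypothetical; not taken from the paper's benchmarks).
conv = ({0: "mul", 1: "mul", 2: "add", 3: "add"},
        [(0, 2), (1, 2), (2, 3)])
blur = ({0: "mul", 1: "add", 2: "shr"},
        [(0, 1), (1, 2)])

patterns = frequent_op_pairs([conv, blur], min_support=2)
print(patterns)
```

In this toy input only the multiply-then-add pair meets the threshold, which is the kind of pattern a specialized PE might fuse into a single operation.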

Index Terms

1. Hardware
  1. Integrated circuits
    1. Reconfigurable logic and FPGAs

Published in
ASPLOS 2023: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3
March 2023
820 pages
ISBN: 9781450399180
DOI: 10.1145/3582016
General Chair: Tor M. Aamodt, University of British Columbia, Canada
Program Chairs: Natalie Enright Jerger, University of Toronto, Canada; Michael Swift, University of Wisconsin-Madison, USA
Copyright © 2023 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
processing elements
reconfigurable accelerators
subgraph
CGRA
graph analysis
design space exploration
hardware-software co-design
domain-specific accelerators
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate 535 of 2,713 submissions, 20%