ABSTRACT
The architecture of a coarse-grained reconfigurable array (CGRA) processing element (PE) has a significant effect on the performance and energy-efficiency of an application running on the CGRA. This paper presents APEX, an automated approach for generating specialized PE architectures for an application or an application domain. APEX first analyzes application domain benchmarks using frequent subgraph mining to extract commonly occurring computational subgraphs. APEX then generates specialized PEs by merging subgraphs using a datapath graph merging algorithm. The merged datapath graphs are translated into a PE specification from which we automatically generate the PE hardware description in Verilog along with a compiler that maps applications to the PE. The PE hardware and compiler are inserted into a flexible CGRA generation and compilation toolchain that allows for agile evaluation of CGRAs. We evaluate APEX for two domains, machine learning and image processing. For image processing applications, our automatically generated CGRAs with specialized PEs achieve from 5% to 30% less area and from 22% to 46% less energy compared to a general-purpose CGRA. For machine learning applications, our automatically generated CGRAs consume 16% to 59% less energy and 22% to 39% less area than a general-purpose CGRA. This work paves the way for creation of application domain-driven design-space exploration frameworks that automatically generate efficient programmable accelerators, with a much lower design effort for both hardware and compiler generation.
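The two analysis steps named in the abstract, mining frequent subgraphs and merging them into one datapath, can be illustrated with a deliberately simplified sketch. It is not APEX's implementation: the toy dataflow graphs (`conv`, `blur`), the function names, and the restriction to two-node producer/consumer patterns are all assumptions made for illustration. Real frequent subgraph mining handles larger patterns and graph isomorphism, and real datapath merging also shares interconnect, not just functional units.

```python
from collections import Counter

# Hypothetical toy dataflow graphs: node id -> (operation, [input node ids]).
# Inputs that are not keys of the graph are primary inputs or constants.
conv = {
    "m0": ("mul", ["in0", "w0"]),
    "m1": ("mul", ["in1", "w1"]),
    "a0": ("add", ["m0", "m1"]),
    "a1": ("add", ["a0", "bias"]),
}
blur = {
    "m0": ("mul", ["in0", "k0"]),
    "a0": ("add", ["m0", "in1"]),
    "s0": ("shr", ["a0", "two"]),
}


def count_op_pairs(graph):
    """Count producer -> consumer operation pairs (two-node subgraphs)."""
    counts = Counter()
    for op, inputs in graph.values():
        for src in inputs:
            if src in graph:  # skip primary inputs, which have no producing op
                counts[(graph[src][0], op)] += 1
    return counts


def frequent_pairs(graphs, min_support):
    """Keep the pairs that occur at least min_support times across all graphs."""
    total = Counter()
    for g in graphs:
        total += count_op_pairs(g)
    return {pair: n for pair, n in total.items() if n >= min_support}


def merged_unit_counts(patterns):
    """Toy merge: only one pattern is active at a time, so the merged
    datapath needs, per operation, the largest count any one pattern uses."""
    merged = Counter()
    for pattern in patterns:
        for op, n in Counter(pattern).items():
            merged[op] = max(merged[op], n)
    return dict(merged)


frequent = frequent_pairs([conv, blur], min_support=2)
units = merged_unit_counts([["mul", "add"], ["add", "add"], ["add", "shr"]])
```

In this sketch the multiply-then-add pair is the only pattern that meets the support threshold, and merging three two-operation patterns yields a datapath with fewer functional units than the six the patterns would need if implemented separately.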
APEX: A Framework for Automated Processing Element Design Space Exploration using Frequent Subgraph Analysis