AHA: An Agile Approach to the Design of Coarse-Grained Reconfigurable Accelerators and Compilers

Authors (all affiliated with Stanford University, Stanford, California, USA):

Kalhan Koul (ORCID 0000-0001-6123-9064)
Jackson Melchert (ORCID 0000-0002-8232-1603)
Kavya Sreedhar (ORCID 0000-0002-8456-6313)
Leonard Truong (ORCID 0000-0001-7583-9730)
Gedeon Nyengele (ORCID 0000-0002-5028-7252)
Keyi Zhang (ORCID 0000-0002-8902-2518)
Qiaoyi Liu (ORCID 0000-0003-1083-9953)
Jeff Setter (ORCID 0000-0002-2327-646X)
Po-Han Chen (ORCID 0000-0001-9760-9565)
Yuchen Mei (ORCID 0000-0002-9459-5994)
Maxwell Strange (ORCID 0000-0001-5945-1349)
Ross Daly (ORCID 0000-0002-4938-5250)
Caleb Donovick (ORCID 0000-0001-9336-1267)
Alex Carsello (ORCID 0000-0003-2549-9525)
Taeyoung Kong (ORCID 0000-0001-6224-4690)
Kathleen Feng (ORCID 0000-0001-9860-4942)
Dillon Huff (ORCID 0000-0001-9055-3490)
Ankita Nayak (ORCID 0000-0001-7821-0460)
Rajsekhar Setaluri (ORCID 0000-0003-2078-0991)
James Thomas (ORCID 0000-0001-6823-3685)
Nikhil Bhagdikar (ORCID 0000-0002-5141-3441)
David Durst (ORCID 0000-0002-4960-0336)
Zachary Myers (ORCID 0000-0002-4807-9550)
Nestan Tsiskaridze (ORCID 0000-0002-4729-9770)
Stephen Richardson (ORCID 0000-0003-4359-3638)
Rick Bahr (ORCID 0000-0003-3323-7752)
Kayvon Fatahalian (ORCID 0000-0001-8754-0429)
Pat Hanrahan (ORCID 0000-0002-3474-9752)
Clark Barrett (ORCID 0000-0002-9522-3084)
Mark Horowitz (ORCID 0000-0003-3245-7542)
Christopher Torng (ORCID 0000-0002-2385-619X)
Fredrik Kjolstad (ORCID 0000-0002-2267-903X)
Priyanka Raina (ORCID 0000-0002-8834-8663)

ACM Transactions on Embedded Computing Systems, Volume 22, Issue 2, March 2023, Article No. 35, pp. 1–34. https://doi.org/10.1145/3534933

Published: 24 January 2023

Abstract

With the slowing of Moore’s law, computer architects have turned to domain-specific hardware specialization to continue improving the performance and efficiency of computing systems. However, specialization typically entails significant modifications to the software stack to properly leverage the updated hardware. The lack of a structured approach for updating the compiler and the accelerator in tandem has impeded many attempts to systematize this procedure. We propose a new approach to enable flexible and evolvable domain-specific hardware specialization based on coarse-grained reconfigurable arrays (CGRAs). Our agile methodology employs a combination of new programming languages and formal methods to automatically generate the accelerator hardware and its compiler from a single source of truth. This enables the creation of design-space exploration frameworks that automatically generate accelerator architectures that approach the efficiencies of hand-designed accelerators, with a significantly lower design effort for both hardware and compiler generation. Our current system accelerates dense linear algebra applications but is modular and can be extended to support other domains. Our methodology has the potential to significantly improve the productivity of hardware-software engineering teams and enable quicker customization and deployment of complex accelerator-rich computing systems.


Published in
ACM Transactions on Embedded Computing Systems Volume 22, Issue 2
March 2023
560 pages
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/3572826
Editor: Tulika Mitra, National University of Singapore, Singapore
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 24 January 2023
- Online AM: 7 July 2022
- Accepted: 30 April 2022
- Revised: 12 March 2022
- Received: 18 October 2021
Author Tags
coarse-grained reconfigurable arrays
Hardware accelerators
image processing
domain-specific languages
Qualifiers
- research-article
- Refereed
Article Metrics
- Total Citations: 2
- Total Downloads: 1,312
- Downloads (Last 12 months): 1,312
- Downloads (Last 6 weeks): 289