research-article

AHA: An Agile Approach to the Design of Coarse-Grained Reconfigurable Accelerators and Compilers

Published: 24 January 2023

Abstract

With the slowing of Moore’s law, computer architects have turned to domain-specific hardware specialization to continue improving the performance and efficiency of computing systems. However, specialization typically entails significant modifications to the software stack to properly leverage the updated hardware. The lack of a structured approach for updating the compiler and the accelerator in tandem has impeded many attempts to systematize this procedure. We propose a new approach to enable flexible and evolvable domain-specific hardware specialization based on coarse-grained reconfigurable arrays (CGRAs). Our agile methodology employs a combination of new programming languages and formal methods to automatically generate the accelerator hardware and its compiler from a single source of truth. This enables the creation of design-space exploration frameworks that automatically generate accelerator architectures that approach the efficiencies of hand-designed accelerators, with a significantly lower design effort for both hardware and compiler generation. Our current system accelerates dense linear algebra applications but is modular and can be extended to support other domains. Our methodology has the potential to significantly improve the productivity of hardware-software engineering teams and enable quicker customization and deployment of complex accelerator-rich computing systems.

Published in

ACM Transactions on Embedded Computing Systems, Volume 22, Issue 2
March 2023
560 pages
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/3572826
Editor: Tulika Mitra

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 24 January 2023
• Online AM: 7 July 2022
• Accepted: 30 April 2022
• Revised: 12 March 2022
• Received: 18 October 2021

Qualifiers

• research-article
• Refereed
