research-article

Improving Energy Efficiency of CGRAs with Low-Overhead Fine-Grained Power Domains

Published: 02 April 2023

Abstract

To effectively minimize static power for a wide range of applications, power domains for coarse-grained reconfigurable array (CGRA) architectures need to be more fine-grained than those found in a typical application-specific integrated circuit. However, the special isolation logic needed to ensure electrical protection between off and on domains makes fine-grained power domains area- and timing-inefficient. We propose a novel design of the CGRA routing fabric that reduces the area overhead of power domain boundary protection from around 9% to less than 1% without incurring any extra timing delay from the isolation cells. The conventional Unified Power Format (UPF)-based flow for power domain boundary protection does not support this design choice. Therefore, we create our own compiler-like passes that iteratively introduce the needed design changes, and we formally verify the transformations using methods based on satisfiability modulo theories (SMT). These passes also let us optimize how we handle test and debug signals through the off tiles in the CGRA. Using our framework, we add power domains to a CGRA that we designed and taped out. The CGRA has 32 × 16 processing element and memory tiles and a 4-MB secondary memory. We address the implementation challenges encountered due to the introduction of fine-grained power domains, including the addressing of the CGRA tiles, the power grid design, well substrate connections, and the distribution of global signals. Our CGRA achieves up to 83% reduction in leakage power and 26% reduction in total power versus an identical CGRA without multiple power domains, for a range of image processing and machine learning applications.
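As a rough illustration of what verifying such a transformation can mean, the sketch below checks that inserting boundary protection leaves a signal unchanged while its tile is powered on, and clamps it to a known value while the tile is off. It is not the authors' flow: exhaustive enumeration over a small bus stands in for a satisfiability modulo theories solver, and every name in it (`WIDTH`, `original`, `with_isolation`, the AND-type clamp) is a hypothetical chosen for the example.

```python
# Minimal sketch, NOT the paper's passes or verification flow.
# Assumption: a 4-bit boundary signal and an AND-type clamp, both hypothetical.
# Exhaustive enumeration is used here in place of an SMT solver.

WIDTH = 4  # hypothetical bus width, small enough to enumerate


def original(data: int) -> int:
    """Boundary signal before the transformation: passed through unchanged."""
    return data


def with_isolation(data: int, tile_on: bool) -> int:
    """Boundary signal after a hypothetical pass inserts an AND-type clamp."""
    mask = (1 << WIDTH) - 1 if tile_on else 0
    return data & mask


def find_counterexample():
    """Return an input where the designs differ while the tile is on, else None."""
    for data in range(1 << WIDTH):
        if with_isolation(data, True) != original(data):
            return data
    return None


def clamps_when_off() -> bool:
    """Check that every input is forced to 0 while the tile is off."""
    return all(with_isolation(d, False) == 0 for d in range(1 << WIDTH))


if __name__ == "__main__":
    print("counterexample while on:", find_counterexample())
    print("clamped while off:", clamps_when_off())
```

A solver-based check asks the same question symbolically: it searches for an input on which the two designs disagree, and the absence of such an input establishes equivalence without enumerating the input space.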


        • Published in

          ACM Transactions on Reconfigurable Technology and Systems  Volume 16, Issue 2
          June 2023
          318 pages
          ISSN:1936-7406
          EISSN:1936-7414
          DOI:10.1145/3587031

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 2 April 2023
          • Online AM: 27 August 2022
          • Accepted: 8 August 2022
          • Revised: 30 May 2022
          • Received: 13 December 2021
