Abstract
To effectively minimize static power for a wide range of applications, power domains for coarse-grained reconfigurable array (CGRA) architectures need to be more fine-grained than those found in a typical application-specific integrated circuit. However, the special isolation logic needed to ensure electrical protection between off and on domains makes fine-grained power domains area- and timing-inefficient. We propose a novel design of the CGRA routing fabric that reduces the area overhead of power domain boundary protection from around 9% to less than 1% without incurring any extra timing delay from the isolation cells. The conventional Unified Power Format (UPF)-based flow for power domain boundary protection does not support this design choice. Therefore, we create our own compiler-like passes that iteratively introduce the needed design changes, and we formally verify the transformations using methods based on satisfiability modulo theories (SMT). These passes also let us optimize how we handle test and debug signals through the off tiles in the CGRA. Using our framework, we add power domains to a CGRA that we designed and taped out. The CGRA has 32 × 16 processing element and memory tiles and a 4-MB secondary memory. We address the implementation challenges encountered due to the introduction of fine-grained power domains, including the addressing of the CGRA tiles, the power grid design, well substrate connections, and the distribution of global signals. Our CGRA achieves up to 83% reduction in leakage power and 26% reduction in total power versus an identical CGRA without multiple power domains, for a range of image processing and machine learning applications.
Improving Energy Efficiency of CGRAs with Low-Overhead Fine-Grained Power Domains