Abstract
We introduce an area- and energy-efficient precision-scalable neural network accelerator architecture. Previous precision-scalable hardware accelerators suffer from limitations such as under-utilization of multipliers for low bit-width operations and a large area overhead to support various bit precisions. To mitigate these problems, we first propose bitwise summation, which reduces the area overhead of bit-width scaling. In addition, we present a channel-wise aligning scheme (CAS) to efficiently fetch inputs and weights from on-chip SRAM buffers and a channel-first and pixel-last tiling (CFPL) scheme to maximize the utilization of multipliers across various kernel sizes. A test chip was implemented in 28-nm CMOS technology, and the experimental results show that the throughput and energy efficiency of our chip are up to 7.7× and 1.64× higher, respectively, than those of state-of-the-art designs. Moreover, an additional 1.5-3.4× throughput gain can be achieved using the CFPL method compared to the CAS.
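The abstract does not spell out how bitwise summation works, so the following is only a rough software sketch of one plausible reading: a wide multiplication is decomposed into narrow partial products, and partial products that share the same shift amount across several multiply lanes are summed first and shifted once per group. The 2-bit chunk width, the unsigned operands, and all function names are assumptions made for illustration, not details taken from the paper.

```python
CHUNK_BITS = 2  # assumed width of each narrow multiplier operand (hypothetical)


def split_chunks(value, bits):
    """Split an unsigned `bits`-wide integer into CHUNK_BITS-wide pieces, LSB first."""
    mask = (1 << CHUNK_BITS) - 1
    return [(value >> s) & mask for s in range(0, bits, CHUNK_BITS)]


def dot_shift_per_product(xs, ws, bits):
    """Baseline: each partial product is shifted on its own before accumulation.

    Returns (dot product, number of shift operations performed).
    """
    total, shifts = 0, 0
    for x, w in zip(xs, ws):
        for i, xc in enumerate(split_chunks(x, bits)):
            for j, wc in enumerate(split_chunks(w, bits)):
                total += (xc * wc) << (CHUNK_BITS * (i + j))
                shifts += 1
    return total, shifts


def dot_bitwise_summation(xs, ws, bits):
    """Sum partial products that share a chunk position across lanes, then shift once.

    Returns (dot product, number of shift operations performed).
    """
    groups = {}  # (i, j) chunk position -> unshifted sum over all lanes
    for x, w in zip(xs, ws):
        for i, xc in enumerate(split_chunks(x, bits)):
            for j, wc in enumerate(split_chunks(w, bits)):
                groups[(i, j)] = groups.get((i, j), 0) + xc * wc
    total = sum(s << (CHUNK_BITS * (i + j)) for (i, j), s in groups.items())
    return total, len(groups)


if __name__ == "__main__":
    xs = [173, 42, 255, 7]   # four 8-bit inputs (arbitrary example values)
    ws = [91, 200, 255, 13]  # four 8-bit weights (arbitrary example values)
    print(dot_shift_per_product(xs, ws, bits=8))
    print(dot_bitwise_summation(xs, ws, bits=8))
```

Both functions return the same dot product, but in this model the grouped version needs one shift per chunk position rather than one per partial product per lane. That is the kind of saving the abstract attributes to bitwise summation, although the real hardware organization may differ from this sketch.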
| Original language | English |
|---|---|
| Pages (from-to) | 1924-1935 |
| Number of pages | 12 |
| Journal | IEEE Journal of Solid-State Circuits |
| Volume | 57 |
| Issue number | 6 |
| DOIs | |
| State | Published - 1 Jun 2022 |
| Externally published | Yes |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 7 Affordable and Clean Energy
Keywords
- Bit-precision scaling
- bitwise summation
- channel-first and pixel-last tiling (CFPL)
- channel-wise aligning
- deep neural network
- hardware accelerator
- multiply-accumulate unit
Title
BitBlade: Energy-Efficient Variable Bit-Precision Hardware Accelerator for Quantized Neural Networks