TY - JOUR
T1 - Monolithically Integrated RRAM- And CMOS-Based In-Memory Computing Optimizations for Efficient Deep Learning
AU - Yin, Shihui
AU - Seo, Jae Sun
AU - Kim, Yulhwa
AU - Han, Xu
AU - Barnaby, Hugh
AU - Yu, Shimeng
AU - Luo, Yandong
AU - He, Wangxin
AU - Sun, Xiaoyu
AU - Kim, Jae Joon
N1 - Publisher Copyright:
© 1981-2012 IEEE.
PY - 2019/11/1
Y1 - 2019/11/1
N2 - Resistive RAM (RRAM) has been presented as a promising memory technology toward deep neural network (DNN) hardware design, with nonvolatility, high density, high ON/OFF ratio, and compatibility with logic process. However, prior RRAM works for DNNs have shown limitations on parallelism for in-memory computing, array efficiency with large peripheral circuits, multilevel analog operation, and demonstration of monolithic integration. In this article, we propose circuit-/device-level optimizations to improve the energy and density of RRAM-based in-memory computing architectures. We report experimental results based on prototype chip design of 128 × 64 RRAM arrays and CMOS peripheral circuits, where RRAM devices are monolithically integrated in a commercial 90-nm CMOS technology. We demonstrate the CMOS peripheral circuit optimization using input-splitting scheme and investigate the implication of higher low resistance state on energy efficiency and robustness. Employing the proposed techniques, we demonstrate RRAM-based in-memory computing with up to 116.0 TOPS/W energy efficiency and 84.2% CIFAR-10 accuracy. Furthermore, we investigate four-level programming with single RRAM device, and report the system-level performance and DNN accuracy results using circuit-level benchmark simulator NeuroSim.
AB - Resistive RAM (RRAM) has been presented as a promising memory technology toward deep neural network (DNN) hardware design, with nonvolatility, high density, high ON/OFF ratio, and compatibility with logic process. However, prior RRAM works for DNNs have shown limitations on parallelism for in-memory computing, array efficiency with large peripheral circuits, multilevel analog operation, and demonstration of monolithic integration. In this article, we propose circuit-/device-level optimizations to improve the energy and density of RRAM-based in-memory computing architectures. We report experimental results based on prototype chip design of 128 × 64 RRAM arrays and CMOS peripheral circuits, where RRAM devices are monolithically integrated in a commercial 90-nm CMOS technology. We demonstrate the CMOS peripheral circuit optimization using input-splitting scheme and investigate the implication of higher low resistance state on energy efficiency and robustness. Employing the proposed techniques, we demonstrate RRAM-based in-memory computing with up to 116.0 TOPS/W energy efficiency and 84.2% CIFAR-10 accuracy. Furthermore, we investigate four-level programming with single RRAM device, and report the system-level performance and DNN accuracy results using circuit-level benchmark simulator NeuroSim.
UR - https://www.scopus.com/pages/publications/85075013149
U2 - 10.1109/MM.2019.2943047
DO - 10.1109/MM.2019.2943047
M3 - Article
AN - SCOPUS:85075013149
SN - 0272-1732
VL - 39
SP - 54
EP - 63
JO - IEEE Micro
JF - IEEE Micro
IS - 6
M1 - 8894568
ER -