TY - GEN
T1 - Accelerating Deep Neural Networks Using FPGAs and ZYNQ
AU - Lee, Han Sung
AU - Wook Jeon, Jae
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021/8/23
Y1 - 2021/8/23
N2 - This article implements a Deep Neural Network (DNN) on Field Programmable Gate Arrays (FPGAs) for real-time deep learning inference in embedded systems. DNNs are now widely used where high accuracy is required; however, due to their structural complexity, deep learning models are highly computationally intensive. To improve system performance, optimization techniques such as weight quantization and pruning are commonly adopted. Another approach is to use heterogeneous architectures. Processor-plus-Graphics Processing Unit (GPU) architectures are commonly used to accelerate deep learning training and inference, but GPUs are expensive and consume considerable power, so they are not an ideal solution for embedded systems. In this paper, we implement a deep neural network on a Zynq SoC, a heterogeneous system that integrates an ARM processor and an FPGA. We trained the model on the MNIST database, quantized the model's 32-bit floating-point weights and biases to integers, and implemented the model for inference on the FPGA. As a result, we deployed the network on an embedded system while maintaining inference accuracy and accelerated system performance using fewer resources.
AB - This article implements a Deep Neural Network (DNN) on Field Programmable Gate Arrays (FPGAs) for real-time deep learning inference in embedded systems. DNNs are now widely used where high accuracy is required; however, due to their structural complexity, deep learning models are highly computationally intensive. To improve system performance, optimization techniques such as weight quantization and pruning are commonly adopted. Another approach is to use heterogeneous architectures. Processor-plus-Graphics Processing Unit (GPU) architectures are commonly used to accelerate deep learning training and inference, but GPUs are expensive and consume considerable power, so they are not an ideal solution for embedded systems. In this paper, we implement a deep neural network on a Zynq SoC, a heterogeneous system that integrates an ARM processor and an FPGA. We trained the model on the MNIST database, quantized the model's 32-bit floating-point weights and biases to integers, and implemented the model for inference on the FPGA. As a result, we deployed the network on an embedded system while maintaining inference accuracy and accelerated system performance using fewer resources.
KW - AI
KW - Deep Learning
KW - Deep Neural Networks
KW - FPGA
KW - Quantization
KW - ZYNQ
UR - https://www.scopus.com/pages/publications/85117503621
U2 - 10.1109/TENSYMP52854.2021.9550853
DO - 10.1109/TENSYMP52854.2021.9550853
M3 - Conference contribution
AN - SCOPUS:85117503621
T3 - TENSYMP 2021 - 2021 IEEE Region 10 Symposium
BT - TENSYMP 2021 - 2021 IEEE Region 10 Symposium
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 IEEE Region 10 Symposium, TENSYMP 2021
Y2 - 23 August 2021 through 25 August 2021
ER -