TY - GEN
T1 - A Weight-Sharing Autoencoder with Dynamic Quantization for Efficient Feature Compression
AU - Choi, Ji Sub
AU - Kim, Jungrae
AU - Ko, Jong Hwan
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - Collaborative inference (CI) enhances the inference efficiency of deep neural networks (DNNs) by partitioning the computational workload between an edge device and a cloud platform. Efficient inference using CI requires searching for the optimal partition layer that minimizes the end-to-end inference latency. In addition, the intermediate feature at the partition layer should be effectively compressed. However, recent DNN-based feature compression methods require independent models dedicated to each partition point, resulting in significant storage overhead. In this paper, we propose a novel method that efficiently compresses the features from varying partition layers using a single autoencoder. The proposed method incorporates a weight-sharing technique that shares the weights across the autoencoders compressing each partition layer. In addition, dynamic-bitwidth quantization is supported for flexibility in the compression ratio. The experimental results show that the proposed method reduced the required parameter size by 4× compared to the existing independent-model-based method, while keeping the accuracy loss within 0.5%.
AB - Collaborative inference (CI) enhances the inference efficiency of deep neural networks (DNNs) by partitioning the computational workload between an edge device and a cloud platform. Efficient inference using CI requires searching for the optimal partition layer that minimizes the end-to-end inference latency. In addition, the intermediate feature at the partition layer should be effectively compressed. However, recent DNN-based feature compression methods require independent models dedicated to each partition point, resulting in significant storage overhead. In this paper, we propose a novel method that efficiently compresses the features from varying partition layers using a single autoencoder. The proposed method incorporates a weight-sharing technique that shares the weights across the autoencoders compressing each partition layer. In addition, dynamic-bitwidth quantization is supported for flexibility in the compression ratio. The experimental results show that the proposed method reduced the required parameter size by 4× compared to the existing independent-model-based method, while keeping the accuracy loss within 0.5%.
KW - Autoencoder
KW - Collaborative Inference
KW - Dynamic Quantization
KW - Feature Compression
UR - https://www.scopus.com/pages/publications/85122921654
U2 - 10.1109/ICTC52510.2021.9620912
DO - 10.1109/ICTC52510.2021.9620912
M3 - Conference contribution
AN - SCOPUS:85122921654
T3 - International Conference on ICT Convergence
SP - 1111
EP - 1113
BT - ICTC 2021 - 12th International Conference on ICT Convergence
PB - IEEE Computer Society
T2 - 12th International Conference on Information and Communication Technology Convergence, ICTC 2021
Y2 - 20 October 2021 through 22 October 2021
ER -