Facto-CNN: Memory-Efficient CNN Training with Low-rank Tensor Factorization and Lossy Tensor Compression

Research output: Contribution to journal, Conference article, peer-review

3 Scopus citations

Abstract

Convolutional neural networks (CNNs) are becoming deeper and wider to achieve higher accuracy and lower loss, which significantly increases the computational resources they require. In particular, training CNN models consumes a large amount of memory, mainly because the intermediate feature maps generated during forward propagation must be stored to calculate gradients during back-propagation. The memory usage of CNN training grows with the batch size and the complexity of the model. A lightweight training method is therefore essential, especially when computational resources are limited. In this paper, we propose a CNN training mechanism called Facto-CNN, which leverages low-rank tensor factorization and lossy tensor compression to reduce the memory required to train CNN models. Facto-CNN factorizes the weight tensors of convolutional and fully-connected layers and then updates only one of the factorized tensors for each layer, dramatically reducing the size of the feature maps stored in memory. To further reduce memory consumption, Facto-CNN compresses the feature maps with a simple lossy compression technique that exploits the value similarity in the feature maps. Our experimental evaluation demonstrates that Facto-CNN reduces the memory used for storing feature maps by 68-93% with only trivial accuracy degradation when training CNN models.
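The two ideas in the abstract can be illustrated with a minimal NumPy sketch for a single fully-connected layer. This is not the authors' implementation: the truncated-SVD factorization, the choice of which factor stays frozen, and the block-mean compression scheme are all assumptions made for illustration, and every function name below is hypothetical. The sketch shows why updating only one factor can shrink what must be kept for back-propagation: the gradient of the trainable factor depends only on a rank-sized activation rather than on the full input.

```python
import numpy as np


def factorize(W, rank):
    """Truncated SVD (an assumed choice): W (d_in, d_out) ~= U (d_in, rank) @ V (rank, d_out)."""
    P, s, Qt = np.linalg.svd(W, full_matrices=False)
    U = P[:, :rank] * s[:rank]   # fold singular values into the left factor
    V = Qt[:rank, :]
    return U, V


def forward(x, U, V):
    """Forward pass through the factorized layer.

    U is treated as frozen and V as trainable (an assumption), so only the
    low-dimensional activation h has to be kept for the backward pass.
    """
    h = x @ U                    # (batch, rank)
    y = h @ V                    # (batch, d_out)
    return y, h


def grad_V(h, dy):
    """Gradient of the loss w.r.t. V, given the upstream gradient dy = dL/dy."""
    return h.T @ dy              # (rank, d_out); the full input x is not needed


def compress(h, block=4):
    """Hypothetical lossy scheme: keep one mean per block of adjacent values."""
    flat = h.reshape(-1)
    pad = (-flat.size) % block   # pad so the length is a multiple of block
    padded = np.pad(flat, (0, pad), mode="edge")
    return padded.reshape(-1, block).mean(axis=1), h.shape


def decompress(means, shape, block=4):
    """Expand each stored mean back over its block and restore the shape."""
    n = int(np.prod(shape))
    return np.repeat(means, block)[:n].reshape(shape)


# Small demonstration with made-up sizes.
rng = np.random.default_rng(0)
x = rng.standard_normal((32, 512))       # batch of 32 inputs with 512 features
W = rng.standard_normal((512, 256))      # dense layer weights
U, V = factorize(W, rank=16)
y, h = forward(x, U, V)

means, shape = compress(h, block=4)      # stored instead of h
h_approx = decompress(means, shape, block=4)
dV = grad_V(h_approx, np.ones_like(y))   # approximate gradient for V
```

In this toy setting the stored activation `h` has 16 values per sample instead of the 512 in `x`, and the block-mean step stores a quarter of those. These ratios follow from the made-up sizes above and say nothing about the 68-93% reduction reported in the abstract.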

Original language: English
Pages (from-to): 662-677
Number of pages: 16
Journal: Proceedings of Machine Learning Research
Volume: 222
State: Published - 2023
Event: 15th Asian Conference on Machine Learning, ACML 2023 - Istanbul, Turkey
Duration: 11 Nov 2023 to 14 Nov 2023

Keywords

  • CNN
  • Compression
  • Tensor Factorization
  • Training Optimization
