Work-in-progress: Flexible group-level pruning of deep neural networks for fast inference on mobile GPUs

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Network pruning is a promising compression technique for reducing the computation and memory access cost of deep neural networks. In this paper, we propose a novel group-level pruning method to accelerate deep neural networks on mobile GPUs, in which several adjacent weights are pruned as a group while high accuracy is preserved. Although several group-level pruning techniques have been proposed, the previous techniques cannot achieve the desired accuracy at high sparsity. To address this, we propose an unaligned approach to improve the accuracy of the compressed model.
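To make the idea concrete, the sketch below illustrates plain aligned group-level magnitude pruning in NumPy: adjacent weights are scored and zeroed in fixed-size groups. This is only an illustration under assumed settings (the function name, group size, and sparsity level are hypothetical), not the authors' algorithm; the paper's unaligned approach additionally relaxes the fixed group boundaries.

    import numpy as np

    def group_prune(weights, group_size=4, sparsity=0.7):
        # Illustrative aligned group-level magnitude pruning (hypothetical sketch,
        # not the paper's unaligned method): zero out the groups of `group_size`
        # adjacent weights whose L2 norm is smallest.
        w = weights.reshape(-1)                      # flatten to 1-D
        pad = (-len(w)) % group_size                 # pad so the length divides evenly
        w_padded = np.concatenate([w, np.zeros(pad, dtype=w.dtype)])
        groups = w_padded.reshape(-1, group_size)    # rows of adjacent weights
        norms = np.linalg.norm(groups, axis=1)       # importance score per group
        k = int(sparsity * len(norms))               # number of groups to prune
        prune_idx = np.argsort(norms)[:k]            # least-important groups
        groups[prune_idx] = 0.0
        pruned = groups.reshape(-1)[:len(w)]
        return pruned.reshape(weights.shape)

    # Example: prune a random 64x64 weight matrix to roughly 70% group sparsity
    W = np.random.randn(64, 64).astype(np.float32)
    W_pruned = group_prune(W, group_size=4, sparsity=0.7)
    print("fraction of zero weights:", np.mean(W_pruned == 0.0))

Pruning whole groups of adjacent weights, rather than individual weights, is what makes the resulting sparsity pattern amenable to efficient execution on mobile GPUs, since each surviving group can be fetched and multiplied as a contiguous block.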

Original language: English
Title of host publication: Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems Companion, CASES 2019
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9781450369251
DOIs
State: Published - 13 Oct 2019
Externally published: Yes
Event: 2019 International Conference on Compilers, Architectures and Synthesis for Embedded Systems, CASES 2019 - New York, United States
Duration: 13 Oct 2019 - 18 Oct 2019

Publication series

Name: Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems Companion, CASES 2019

Conference

Conference: 2019 International Conference on Compilers, Architectures and Synthesis for Embedded Systems, CASES 2019
Country/Territory: United States
City: New York
Period: 13/10/19 - 18/10/19
