S-ViT: Sparse Vision Transformer for Accurate Face Recognition

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Most existing face recognition applications built on deep learning use CNN-based architectures as the feature extractor. However, recent studies have shown that vision transformer-based models often outperform CNN-based models on computer vision tasks. In this work, we therefore propose the Sparse Vision Transformer (S-ViT), a model based on the Vision Transformer (ViT) architecture, to improve face recognition performance. After training, S-ViT tends to have a sparser weight distribution than ViT, which is where its name comes from. Unlike the conventional ViT, S-ViT adopts the image Relative Positional Encoding (iRPE) method for positional encoding. S-ViT is also modified so that all token embeddings, not just the class token, participate in the decoding process. Through extensive experiments, we show that S-ViT outperforms the other baseline models in the closed-set setting and also outperforms the baseline ViT-based models. For example, when ArcFace is used as the loss function under the identification protocol, S-ViT achieves up to 3.27% higher accuracy than ResNet50. We also show that the ArcFace loss yields larger performance gains for S-ViT than for the baseline models. In addition, S-ViT offers a better cost-performance trade-off because it tends to be more robust to pruning than its underlying model, ViT. As a result, S-ViT can be deployed more flexibly on target devices with limited resources.
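The abstract does not say how the token embeddings are combined before the head. The minimal Python sketch below contrasts the conventional ViT readout, which passes only the class token on, with one plausible alternative in which every token contributes: mean pooling over the class token and all patch tokens. The function names and the choice of mean pooling are assumptions made for illustration, not the S-ViT design.

```python
from typing import List

Vector = List[float]


def class_token_readout(tokens: List[Vector]) -> Vector:
    """Conventional ViT readout: only the class token (index 0) reaches the head."""
    return list(tokens[0])


def all_token_readout(tokens: List[Vector]) -> Vector:
    """Hypothetical all-token readout: mean over the class token and every patch token.

    Mean pooling is an assumed aggregation, used here only to show every
    token contributing to the face embedding.
    """
    if not tokens:
        raise ValueError("need at least one token")
    dim = len(tokens[0])
    return [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]


# Three tokens of dimension 2: [CLS, patch_1, patch_2]
tokens = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
print(class_token_readout(tokens))  # [1.0, 0.0]
print(all_token_readout(tokens))    # [3.0, 2.0]
```

With the class-token readout the patch tokens only influence the embedding indirectly through attention; with an all-token readout they feed the head directly.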
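ArcFace (additive angular margin loss) replaces the target-class logit s·cos θ_y with s·cos(θ_y + m), where θ_y is the angle between the L2-normalized embedding and the normalized weight vector of the true identity. The sketch below assumes the scale s = 64 and margin m = 0.5 commonly used with ArcFace; the hyperparameters used for S-ViT are not stated here, and the function names are hypothetical. It also omits the fallback that production implementations apply when θ_y + m exceeds π.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def arcface_logits(embedding: Vector, class_weights: List[Vector], label: int,
                   s: float = 64.0, m: float = 0.5) -> List[float]:
    """Additive angular margin logits: s*cos(theta_j), but s*cos(theta_y + m) for the target.

    Simplification: no special handling when theta_y + m exceeds pi.
    """
    e = l2_normalize(embedding)
    logits = []
    for j, w in enumerate(class_weights):
        w_hat = l2_normalize(w)
        # Clamp so acos never receives a rounding error outside [-1, 1]
        cos_theta = max(-1.0, min(1.0, sum(a * b for a, b in zip(e, w_hat))))
        if j == label:
            logits.append(s * math.cos(math.acos(cos_theta) + m))
        else:
            logits.append(s * cos_theta)
    return logits


def cross_entropy(logits: List[float], label: int) -> float:
    """Softmax cross-entropy computed with the log-sum-exp trick for stability."""
    top = max(logits)
    log_z = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_z - logits[label]


weights = [[1.0, 0.0], [0.0, 1.0]]  # two identities, 2-D embeddings
embedding = [0.8, 0.6]              # closer in angle to identity 0
with_margin = cross_entropy(arcface_logits(embedding, weights, 0), 0)
no_margin = cross_entropy(arcface_logits(embedding, weights, 0, m=0.0), 0)
print(with_margin > no_margin)  # True
```

Because the margin lowers the target logit, the loss stays high until an embedding sits well inside its identity's angular region, which is what pushes same-identity faces closer together in angle.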
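The abstract does not specify which pruning technique was used. As an assumption, the sketch below applies unstructured magnitude pruning to a flat list of weights to illustrate why a sparser weight distribution can make a model more robust to pruning: when more weights already sit near zero, removing the smallest-magnitude fraction discards less of the layer. The helper names and weight values are made up for illustration and do not come from the paper.

```python
import math
from typing import List


def magnitude_prune(weights: List[float], ratio: float) -> List[float]:
    """Unstructured magnitude pruning: zero the `ratio` fraction of smallest-|w| weights."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    k = int(len(weights) * ratio)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_idx = set(order[:k])
    return [0.0 if i in pruned_idx else w for i, w in enumerate(weights)]


def near_zero_fraction(weights: List[float], eps: float = 1e-2) -> float:
    """Share of weights with |w| < eps, a crude sparsity measure."""
    return sum(1 for w in weights if abs(w) < eps) / len(weights)


def pruning_error(weights: List[float], pruned: List[float]) -> float:
    """L2 norm of what pruning removed; smaller means the layer changed less."""
    return math.sqrt(sum((w - p) ** 2 for w, p in zip(weights, pruned)))


# Made-up weights: one sparse-looking layer and one dense-looking layer
sparse = [0.0, 0.001, -0.002, 0.9, -0.8, 0.0005]
dense = [0.3, -0.4, 0.5, 0.6, -0.35, 0.45]
for name, w in (("sparse", sparse), ("dense", dense)):
    p = magnitude_prune(w, 0.5)
    print(name, near_zero_fraction(w), round(pruning_error(w, p), 4))
```

Pruning half of each list removes about 0.0011 of L2 norm from the sparse layer but about 0.61 from the dense one. Actual pruning robustness is judged by accuracy after pruning, which this toy example does not model.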

Original language: English
Title of host publication: Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing, SAC 2023
Publisher: Association for Computing Machinery
Pages: 1130-1138
Number of pages: 9
ISBN (Electronic): 9781450395175
State: Published - 27 Mar 2023
Event: 38th Annual ACM Symposium on Applied Computing, SAC 2023 - Tallinn, Estonia
Duration: 27 Mar 2023 to 31 Mar 2023

Publication series

Name: Proceedings of the ACM Symposium on Applied Computing

Conference

Conference: 38th Annual ACM Symposium on Applied Computing, SAC 2023
Country/Territory: Estonia
City: Tallinn
Period: 27/03/23 to 31/03/23

Keywords

  • deep learning model compression
  • face recognition
  • neural networks
  • pruning
  • vision transformer
