SVD-softmax: Fast softmax approximation on large vocabulary neural networks

  • Kyuhong Shim
  • Minjae Lee
  • Iksoo Choi
  • Yoonho Boo
  • Wonyong Sung

Research output: Contribution to journal › Conference article › peer-review

Abstract

We propose a fast approximation method of a softmax function with a very large vocabulary using singular value decomposition (SVD). SVD-softmax targets fast and accurate probability estimation of the topmost probable words during inference of neural network language models. The proposed method transforms the weight matrix used in the calculation of the output vector by using SVD. The approximate probability of each word can be estimated with only a small part of the weight matrix by using a few large singular values and the corresponding elements for most of the words. We applied the technique to language modeling and neural machine translation and present a guideline for good approximation. The algorithm requires only approximately 20% of arithmetic operations for an 800K vocabulary case and shows more than a three-fold speedup on a GPU.
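The idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the two-stage structure (a cheap preview over all words, followed by exact recomputation for the most promising candidates) and the names `svd_softmax_setup`, `svd_softmax`, `window`, and `top_n` are assumptions made for illustration. The weight matrix is assumed to have shape (vocabulary × hidden).

```python
import numpy as np


def svd_softmax_setup(weight):
    """Factor the output weight matrix once, offline (hypothetical helper).

    weight has shape (vocab, hidden). Returns B = U * diag(s) and Vt such
    that weight @ h == B @ (Vt @ h), with the columns of B ordered by
    decreasing singular value.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    return u * s, vt


def svd_softmax(b, vt, bias, h, window, top_n):
    """Approximate softmax(weight @ h + bias) (illustrative sketch).

    window: number of leading singular directions used for the preview
            (assumed parameter name).
    top_n:  number of candidate words whose logits are recomputed exactly
            (assumed parameter name).
    """
    h_rot = vt @ h                                     # rotate the hidden vector
    logits = b[:, :window] @ h_rot[:window] + bias     # cheap preview for every word
    top = np.argpartition(-logits, top_n - 1)[:top_n]  # indices of the likely top words
    logits[top] = b[top] @ h_rot + bias[top]           # exact logits for the candidates
    logits -= logits.max()                             # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()


# Usage on a small random problem (synthetic sizes, for illustration only).
rng = np.random.default_rng(0)
weight = rng.standard_normal((50, 8))
bias = rng.standard_normal(50)
h = rng.standard_normal(8)
b, vt = svd_softmax_setup(weight)
probs = svd_softmax(b, vt, bias, h, window=2, top_n=5)
```

In this sketch the per-query cost is roughly hidden² + vocab × window + top_n × hidden multiply-adds, compared with vocab × hidden for the full matrix product, so the saving depends on how small `window` and `top_n` can be kept while preserving accuracy for the top words.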

Original language: English
Pages (from-to): 5464-5474
Number of pages: 11
Journal: Advances in Neural Information Processing Systems
Volume: 2017-December
State: Published - 2017
Externally published: Yes
Event: 31st Annual Conference on Neural Information Processing Systems, NIPS 2017 - Long Beach, United States
Duration: 4 Dec 2017 → 9 Dec 2017
