Rescoring teacher outputs with decoded utterances for knowledge distillation in automatic speech recognition

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Automatic speech recognition requires large amounts of transcribed training data to achieve good results. Because hand-labeling speech is both slow and expensive, the use of untranscribed speech has been explored for the task. This generally involves training a teacher model on the available transcribed speech, and then training a student model either directly on the latent representations of the teacher or on its decoded output. We propose to combine these two approaches by rescoring the predictions of the teacher based on the decoded output. The rescoring depends on the probability of a decoded sentence and on how well it corresponds with the probability distribution output by the teacher. When training our student model and evaluating with our proposed method, we find it gives up to 8.6% relative improvement in character error rate and 5.4% relative improvement in word error rate over our strongest baseline.
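
The figures quoted in the abstract are relative improvements, meaning the reduction in error rate expressed as a fraction of the baseline error rate rather than a difference in percentage points. A minimal sketch of that calculation follows; the function name and the example error rates are hypothetical and are not taken from the paper.

```python
def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Return the error reduction as a fraction of the baseline error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Hypothetical error rates in percent, chosen only to illustrate the arithmetic.
baseline_cer = 10.0
student_cer = 9.14
print(f"{relative_improvement(baseline_cer, student_cer):.1%}")  # prints 8.6%
```

Under these assumed numbers, an absolute drop of 0.86 percentage points in character error rate corresponds to an 8.6% relative improvement.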

Original language: English
Title of host publication: 2020 Joint 11th International Conference on Soft Computing and Intelligent Systems and 21st International Symposium on Advanced Intelligent Systems, SCIS-ISIS 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728197326
State: Published - 5 Dec 2020
Event: Joint 11th International Conference on Soft Computing and Intelligent Systems and 21st International Symposium on Advanced Intelligent Systems, SCIS-ISIS 2020 - Virtual, Tokyo, Japan
Duration: 5 Dec 2020 to 8 Dec 2020

Publication series

Name: 2020 Joint 11th International Conference on Soft Computing and Intelligent Systems and 21st International Symposium on Advanced Intelligent Systems, SCIS-ISIS 2020

Conference

Conference: Joint 11th International Conference on Soft Computing and Intelligent Systems and 21st International Symposium on Advanced Intelligent Systems, SCIS-ISIS 2020
Country/Territory: Japan
City: Virtual, Tokyo
Period: 5/12/20 to 8/12/20
