TY - GEN
T1 - Rescoring teacher outputs with decoded utterances for knowledge distillation in automatic speech recognition
AU - Holen, Henning M.
AU - Lee, Jee Hyong
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/12/5
Y1 - 2020/12/5
N2 - Automatic speech recognition requires large amounts of transcribed training data to achieve good results. As hand-labeling this data is both slow and expensive, the use of untranscribed speech has been explored for the task. This generally involves training a teacher model on the available transcribed speech, and training a student model either directly on the latent representations of the teacher or on its decoded output. We propose to combine these two approaches by rescoring the predictions of the teacher based on the decoded output. The rescoring depends on the probability of a decoded sentence and on how it corresponds with the probability distribution output by the teacher. When training and evaluating our student model with the proposed method, we find it gives up to 8.6% relative improvement in character error rate and 5.4% relative improvement in word error rate over our strongest baseline.
UR - https://www.scopus.com/pages/publications/85100387249
U2 - 10.1109/SCISISIS50064.2020.9322742
DO - 10.1109/SCISISIS50064.2020.9322742
M3 - Conference contribution
AN - SCOPUS:85100387249
T3 - 2020 Joint 11th International Conference on Soft Computing and Intelligent Systems and 21st International Symposium on Advanced Intelligent Systems, SCIS-ISIS 2020
BT - 2020 Joint 11th International Conference on Soft Computing and Intelligent Systems and 21st International Symposium on Advanced Intelligent Systems, SCIS-ISIS 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - Joint 11th International Conference on Soft Computing and Intelligent Systems and 21st International Symposium on Advanced Intelligent Systems, SCIS-ISIS 2020
Y2 - 5 December 2020 through 8 December 2020
ER -