TY - JOUR
T1 - Improving Small Footprint Few-shot Keyword Spotting with Supervision on Auxiliary Data
AU - Yang, Seunghan
AU - Kim, Byeonggeun
AU - Shim, Kyuhong
AU - Chang, Simyung
N1 - Publisher Copyright:
© 2023 International Speech Communication Association. All rights reserved.
PY - 2023
Y1 - 2023
N2 - Few-shot keyword spotting (FS-KWS) models usually require large-scale annotated datasets to generalize to unseen target keywords. However, existing KWS datasets are limited in scale and gathering keyword-like labeled data is costly undertaking. To mitigate this issue, we propose a framework that uses easily collectible, unlabeled reading speech data as an auxiliary source. Self-supervised learning has been widely adopted for learning representations from unlabeled data; however, it is known to be suitable for large models with enough capacity and is not practical for training a small footprint FS-KWS model. Instead, we automatically annotate and filter the data to construct a keywords-like dataset, LibriWord, enabling supervision on auxiliary data. We then adopt multi-task learning that helps the model to enhance the representation power from out-of-domain auxiliary data. Our method notably improves the performance over competitive methods in the FS-KWS benchmark.
AB - Few-shot keyword spotting (FS-KWS) models usually require large-scale annotated datasets to generalize to unseen target keywords. However, existing KWS datasets are limited in scale and gathering keyword-like labeled data is costly undertaking. To mitigate this issue, we propose a framework that uses easily collectible, unlabeled reading speech data as an auxiliary source. Self-supervised learning has been widely adopted for learning representations from unlabeled data; however, it is known to be suitable for large models with enough capacity and is not practical for training a small footprint FS-KWS model. Instead, we automatically annotate and filter the data to construct a keywords-like dataset, LibriWord, enabling supervision on auxiliary data. We then adopt multi-task learning that helps the model to enhance the representation power from out-of-domain auxiliary data. Our method notably improves the performance over competitive methods in the FS-KWS benchmark.
KW - few-shot learning
KW - keyword spotting
UR - https://www.scopus.com/pages/publications/85171527604
U2 - 10.21437/Interspeech.2023-689
DO - 10.21437/Interspeech.2023-689
M3 - Conference article
AN - SCOPUS:85171527604
SN - 2308-457X
VL - 2023-August
SP - 1633
EP - 1637
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 24th International Speech Communication Association, Interspeech 2023
Y2 - 20 August 2023 through 24 August 2023
ER -