Improving Small Footprint Few-shot Keyword Spotting with Supervision on Auxiliary Data

Research output: Contribution to journal › Conference article › peer-review

Abstract

Few-shot keyword spotting (FS-KWS) models usually require large-scale annotated datasets to generalize to unseen target keywords. However, existing KWS datasets are limited in scale, and gathering keyword-like labeled data is a costly undertaking. To mitigate this issue, we propose a framework that uses easily collectible, unlabeled reading speech data as an auxiliary source. Self-supervised learning has been widely adopted for learning representations from unlabeled data; however, it is known to suit large models with sufficient capacity and is not practical for training a small-footprint FS-KWS model. Instead, we automatically annotate and filter the data to construct a keyword-like dataset, LibriWord, enabling supervision on auxiliary data. We then adopt multi-task learning, which helps the model enhance its representation power using out-of-domain auxiliary data. Our method notably improves performance over competitive methods on the FS-KWS benchmark.
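
As a rough illustration of the annotate-and-filter idea in the abstract, the Python sketch below groups word-level speech segments by label and keeps only those that look keyword-like. The abstract does not state the actual criteria used to build LibriWord, so everything here is an assumption for illustration: the `WordSegment` type, the function name `build_keyword_like_set`, the duration, word-length, and minimum-count thresholds, and the premise that word boundaries come from some forced aligner.

```python
from collections import defaultdict
from typing import NamedTuple


class WordSegment(NamedTuple):
    word: str     # word label, assumed to come from a forced aligner
    start: float  # segment start time in seconds
    end: float    # segment end time in seconds


def build_keyword_like_set(segments, min_dur=0.3, max_dur=1.0,
                           min_chars=3, min_count=2):
    """Group word segments by label, keeping only keyword-like ones.

    All thresholds are illustrative assumptions, not values from the paper.
    """
    by_word = defaultdict(list)
    for seg in segments:
        duration = seg.end - seg.start
        # Drop segments that are too short or too long, and very short words.
        if min_dur <= duration <= max_dur and len(seg.word) >= min_chars:
            by_word[seg.word.lower()].append(seg)
    # Drop words with too few examples to form a usable class.
    return {w: segs for w, segs in by_word.items() if len(segs) >= min_count}


if __name__ == "__main__":
    demo = [
        WordSegment("Hello", 0.0, 0.5),
        WordSegment("hello", 1.0, 1.6),
        WordSegment("a", 2.0, 2.1),
        WordSegment("world", 3.0, 3.4),
    ]
    for word, segs in build_keyword_like_set(demo).items():
        print(word, len(segs))
```

Each surviving word label can then serve as a class for supervised training on the auxiliary data, which is the role the abstract assigns to LibriWord.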

Original language: English
Pages (from-to): 1633-1637
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2023-August
State: Published - 2023
Externally published: Yes
Event: 24th International Speech Communication Association, Interspeech 2023 - Dublin, Ireland
Duration: 20 Aug 2023 - 24 Aug 2023

Keywords

  • few-shot learning
  • keyword spotting
