Abstract
Beamforming is an essential tool for speaker selection and rejection of environmental noise in automatic speech recognition. This work harnesses the efficiency of delay-and-sum (DAS) beamforming by combining it with constant-directivity beamforming (CDB) and frequency-domain feature extraction. CDB facilitates DAS by restricting the bandwidth for different microphone configurations. An array of sigma-delta modulators (SDMs) digitizes eight microphone inputs. The design takes advantage of bitstream processing of the modulator outputs for beamforming and for extracting 60 Mel-spectrum power features. The prototype device is fabricated in 40-nm CMOS and occupies 1.1 mm². Each SDM consumes 91 µW and has a measured signal-to-noise-and-distortion ratio (SNDR) of 84 dB over an 8-kHz bandwidth. The beamformer and feature extractor consume dynamic power of 76 and 122 µW, respectively. The total power consumption of the prototype is 3.95 mW, including leakage power. When the Mel-spectrum outputs are processed with a deep neural network (DNN), keyword-spotting accuracy in the presence of noise improves from 74% without beamforming to 93% with beamforming.
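As a rough illustration of the delay-and-sum principle named in the abstract, the sketch below aligns the channels of a uniform linear array with integer sample delays and averages them. It is not the paper's bitstream implementation: the array geometry, the speed-of-sound constant, and the function names `arrival_delays`, `alignment_delays`, and `delay_and_sum` are all assumptions made for this example, and it operates on ordinary multi-bit samples rather than sigma-delta bitstreams.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def arrival_delays(num_mics, spacing_m, angle_deg, sample_rate_hz):
    """Arrival delay (in whole samples) of a plane wave at each microphone
    of a hypothetical uniform linear array; angle is measured from broadside."""
    tau = spacing_m * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND
    return [round(m * tau * sample_rate_hz) for m in range(num_mics)]


def alignment_delays(arrivals):
    """Compensating delays: the latest-arriving channel gets zero delay,
    earlier channels are held back until they line up with it."""
    latest = max(arrivals)
    return [latest - a for a in arrivals]


def delay_and_sum(channels, delays):
    """Delay each channel by its integer delay, then average the channels."""
    length = len(channels[0])
    out = [0.0] * length
    for samples, delay in zip(channels, delays):
        for n in range(delay, length):
            out[n] += samples[n - delay]
    return [v / len(channels) for v in out]


# Toy input: an impulse that reaches microphone m two samples later than
# microphone m - 1 (eight channels, matching the microphone count in the text).
arrivals = [2 * m for m in range(8)]
channels = [[1.0 if n == 10 + a else 0.0 for n in range(64)] for a in arrivals]

steered = delay_and_sum(channels, alignment_delays(arrivals))
unsteered = delay_and_sum(channels, [0] * 8)
```

With the compensating delays applied, the eight impulses add coherently into a single full-amplitude peak, whereas the unsteered average spreads them across eight separate samples at one eighth of the amplitude. Sources arriving from other directions stay misaligned and are attenuated in the same way, which is the basis of the noise rejection the abstract describes.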
| Original language | English |
|---|---|
| Pages (from-to) | 1812-1823 |
| Number of pages | 12 |
| Journal | IEEE Journal of Solid-State Circuits |
| Volume | 57 |
| Issue number | 6 |
| State | Published - 1 Jun 2022 |
| Externally published | Yes |
Keywords
- Acoustic beamforming
- beamforming
- constant-directivity beamforming (CDB)
- keyword spotting
- sigma-delta beamforming
- speech recognition