Enhancing Speech Recognition: Vowel Feature Extraction and Its Influence on Conformer Model Efficacy

Hyunsu Jang, Jaekwang Kim

Research output: Contribution to journal › Conference article › peer-review

Abstract

Most deep-learning models for automatic speech recognition are complex and require large amounts of data to improve their performance. The XXL version of the Conformer-based model, for example, has a large parameter space (1 billion parameters): its performance is limited by the amount of available data, and training it requires significant computational resources. Hardware and data are often inadequate for training such a model, so it is necessary to find ways of improving performance with limited resources. To this end, we propose an improved preprocessing method that extracts the features of the input data more effectively. The method characterizes the input voice data and strengthens the frequencies of the vowel region. It was evaluated on the compact version of the Conformer model with 10 million parameters, the smallest among the existing Conformer models. On the test-clean evaluation set, character error rates decreased by approximately 0.3% for the LibriSpeech 100-h dataset and 4.6% for the LibriSpeech 960-h dataset. When the classification criterion was changed to sub-words, an improvement of 1.2% was obtained for the LibriSpeech 100-h dataset and 0.6% for the LibriSpeech 960-h dataset. These results show that input data preprocessing improves the performance of speech recognition models. Owing to hardware limitations, results are reported only for the small Conformer model, which in turn limits the achievable performance improvement. Even with these limitations, the results strongly suggest that the input data preprocessing improved the model's performance.
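For readers unfamiliar with the metric, character error rate (CER) is conventionally the character-level edit distance between the recognized text and the reference transcript, divided by the number of reference characters. The following is a minimal sketch of that standard definition only. The function names are illustrative assumptions, and the abstract does not say which scoring tool the authors used or whether the reported reductions are absolute or relative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(references, hypotheses):
    """Corpus-level CER: total character edits / total reference characters.

    Assumption: errors are pooled over all utterances before dividing,
    which is a common convention but not confirmed by the abstract.
    """
    total_edits = 0
    total_chars = 0
    for ref, hyp in zip(references, hypotheses):
        total_edits += edit_distance(ref, hyp)
        total_chars += len(ref)
    if total_chars == 0:
        raise ValueError("reference text is empty")
    return total_edits / total_chars
```

Replacing the character sequences with lists of sub-word tokens gives the analogous error rate for a sub-word model, since `edit_distance` works on any pair of sequences.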

Original language: English
Article number: 012005
Journal: Journal of Physics: Conference Series
Volume: 3022
Issue number: 1
State: Published - 2025
Event: 9th International Conference on Artificial Intelligence, Automation and Control Technologies, AIACT 2025 - Hybrid, Sapporo, Japan
Duration: 17 Feb 2025 to 21 Feb 2025
