Abstract
Most deep-learning models for automatic speech recognition are complex and require large amounts of data to improve performance. In the case of the XXL version of the Conformer-based model, performance is limited by the amount of available data and by a large parameter space (1 billion parameters), which demands significant computational resources. Hardware and data are often inadequate for training such a model; it is therefore necessary to identify avenues for performance improvement despite limited resources. To this end, we propose an improved preprocessing method that allows the features of the input data to be extracted effectively. Specifically, we present a method that strengthens the vowel-region frequencies based on the characteristics of the input voice data. The method was evaluated on the compact version of the Conformer model with 10 million parameters, the smallest of the existing Conformer models. On the test-clean evaluation set, the character error rate decreased by approximately 0.3% for the 100-h LibriSpeech dataset and by 4.6% for the 960-h LibriSpeech dataset. In addition, for the 100-h LibriSpeech dataset, an improvement of 1.2% was obtained for the model in which the classification criterion was changed to sub-words, while an improvement of 0.6% was obtained with sub-words on the 960-h LibriSpeech dataset. These results show that input data preprocessing improves the performance of speech recognition models. Owing to hardware limitations, results are reported only for the small Conformer model, which in turn limits the achievable improvement. Even so, the results strongly suggest that the model's performance improved owing to the input data preprocessing.
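The abstract does not describe the preprocessing in detail, but one plausible form of "strengthening the vowel-region frequencies" is applying a gain to spectrogram bins in the band where vowel formants typically lie. The sketch below is a minimal illustration under that assumption; the function name `emphasize_vowel_band`, the 300–3000 Hz band, and the gain of 1.5 are all hypothetical choices, not the authors' method.

```python
import numpy as np


def emphasize_vowel_band(spec, freqs, low_hz=300.0, high_hz=3000.0, gain=1.5):
    """Scale spectrogram bins inside an assumed vowel formant band.

    spec  : magnitude spectrogram, shape (n_freq_bins, n_frames)
    freqs : center frequency in Hz of each bin, shape (n_freq_bins,)
    The band limits and gain are illustrative assumptions.
    """
    out = spec.copy()
    band = (freqs >= low_hz) & (freqs <= high_hz)  # boolean mask over bins
    out[band] *= gain  # boost only the in-band rows
    return out


# Toy example: 5 bins at 100/500/1000/4000/8000 Hz, 2 time frames of ones.
freqs = np.array([100.0, 500.0, 1000.0, 4000.0, 8000.0])
spec = np.ones((5, 2))
boosted = emphasize_vowel_band(spec, freqs)
# Bins at 500 Hz and 1000 Hz are scaled by 1.5; the others are unchanged.
```

In a real ASR pipeline such a step would sit before feature normalization, so that the relative emphasis of the vowel band survives into the model input.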
| Original language | English |
|---|---|
| Article number | 012005 |
| Journal | Journal of Physics: Conference Series |
| Volume | 3022 |
| Issue number | 1 |
| DOIs | |
| State | Published - 2025 |
| Event | 9th International Conference on Artificial Intelligence, Automation and Control Technologies, AIACT 2025 - Hybrid, Sapporo, Japan |
| Duration | 17 Feb 2025 → 21 Feb 2025 |