SER-Fuse: An Emotion Recognition Application Utilizing Multi-Modal, Multi-Lingual, and Multi-Feature Fusion

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

5 Scopus citations

Abstract

Speech emotion recognition (SER) is a crucial aspect of affective computing and human-computer interaction, yet reliably identifying emotions across different speakers and languages remains challenging. This paper introduces SER-Fuse, a multi-modal SER application designed to handle multiple speakers and languages. Our approach leverages diverse audio/speech embeddings and text embeddings to extract optimal features for multi-modal SER. We then employ multi-feature fusion to integrate embedding features across modalities and languages. Experimental results obtained on the English-Chinese emotional speech (ECES) dataset show that SER-Fuse achieves competitive performance in the multi-lingual setting compared with the single-lingual settings. Furthermore, we provide the implementation of SER-Fuse for download at https://github.com/nhattruongpham/SER-Fuse to support reproducibility and local deployment.
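The abstract does not state which fusion operator SER-Fuse uses. As an illustration only, the sketch below assumes a simple feature-level (early) fusion: each modality's embedding is L2-normalized and the results are concatenated in a fixed order, then passed to a linear classifier with softmax. The function names (`l2_normalize`, `fuse_features`, `predict_emotion`), the modality keys `"audio"` and `"text"`, and the toy weights are hypothetical. In practice, the embeddings would come from pretrained audio/speech and text encoders; for the authors' actual method, see the linked repository.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def fuse_features(embeddings: Dict[str, Sequence[float]], order: Sequence[str]) -> List[float]:
    """Hypothetical early fusion: normalize each modality's embedding, then concatenate.

    A fixed `order` keeps the fused layout identical for every utterance, so that
    downstream classifier weights line up with the same features each time.
    """
    fused: List[float] = []
    for name in order:
        fused.extend(l2_normalize(embeddings[name]))
    return fused


def predict_emotion(
    fused: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
    labels: Sequence[str],
) -> Tuple[str, List[float]]:
    """Hypothetical linear classifier: one weight row per emotion label, then softmax."""
    scores = [sum(w * x for w, x in zip(row, fused)) + b for row, b in zip(weights, bias)]
    peak = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs


if __name__ == "__main__":
    # Toy 2-D "embeddings" standing in for real encoder outputs (illustrative values).
    fused = fuse_features({"audio": [3.0, 4.0], "text": [0.0, 2.0]}, order=["audio", "text"])
    print(fused)  # [0.6, 0.8, 0.0, 1.0]
    label, probs = predict_emotion(
        fused,
        weights=[[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]],
        bias=[0.0, 0.0],
        labels=["angry", "neutral"],
    )
    print(label)  # neutral
```

Normalizing each embedding before concatenation keeps one modality with a larger raw scale from dominating the fused vector. Other fusion choices, such as weighted sums or attention over modalities, are equally plausible readings of "multi-feature fusion".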

Original language: English
Title of host publication: SOICT 2023 - 12th International Symposium on Information and Communication Technology
Publisher: Association for Computing Machinery
Pages: 870-877
Number of pages: 8
ISBN (Electronic): 9798400708916
State: Published - 7 Dec 2023
Event: 12th International Symposium on Information and Communication Technology, SOICT 2023 - Ho Chi Minh City, Viet Nam
Duration: 7 Dec 2023 to 8 Dec 2023

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 12th International Symposium on Information and Communication Technology, SOICT 2023
Country/Territory: Viet Nam
City: Ho Chi Minh City
Period: 7/12/23 to 8/12/23

Keywords

  • Affective computing
  • human-computer interaction
  • multi-feature fusion
  • multi-lingual analysis
  • multi-modal analysis
  • speech emotion recognition
