TY - GEN
T1 - SER-Fuse
T2 - 12th International Symposium on Information and Communication Technology, SOICT 2023
AU - Pham, Nhat Truong
AU - Phan, Le Thi
AU - Dang, Duc Ngoc Minh
AU - Manavalan, Balachandran
N1 - Publisher Copyright: © 2023 ACM.
PY - 2023/12/7
Y1 - 2023/12/7
N2 - Speech emotion recognition (SER) is a crucial aspect of affective computing and human-computer interaction, yet effectively identifying emotions in different speakers and languages remains challenging. This paper introduces SER-Fuse, a multi-modal SER application that is designed to address the complexities of multiple speakers and languages. Our approach leverages diverse audio/speech embeddings and text embeddings to extract optimal features for multi-modal SER. We subsequently employ multi-feature fusion to integrate embedding features across modalities and languages. Experimental results achieved on the English-Chinese emotional speech (ECES) dataset reveal that SER-Fuse attains competitive performance in the multi-lingual approach compared to the single-lingual approaches. Furthermore, we provide the implementation of SER-Fuse for download at https://github.com/nhattruongpham/SER-Fuse to support reproducibility and local deployment.
AB - Speech emotion recognition (SER) is a crucial aspect of affective computing and human-computer interaction, yet effectively identifying emotions in different speakers and languages remains challenging. This paper introduces SER-Fuse, a multi-modal SER application that is designed to address the complexities of multiple speakers and languages. Our approach leverages diverse audio/speech embeddings and text embeddings to extract optimal features for multi-modal SER. We subsequently employ multi-feature fusion to integrate embedding features across modalities and languages. Experimental results achieved on the English-Chinese emotional speech (ECES) dataset reveal that SER-Fuse attains competitive performance in the multi-lingual approach compared to the single-lingual approaches. Furthermore, we provide the implementation of SER-Fuse for download at https://github.com/nhattruongpham/SER-Fuse to support reproducibility and local deployment.
KW - Affective computing
KW - human-computer interaction
KW - multi-feature fusion
KW - multi-lingual analysis
KW - multi-modal analysis
KW - speech emotion recognition
UR - https://www.scopus.com/pages/publications/85180552794
U2 - 10.1145/3628797.3628887
DO - 10.1145/3628797.3628887
M3 - Conference contribution
AN - SCOPUS:85180552794
T3 - ACM International Conference Proceeding Series
SP - 870
EP - 877
BT - SOICT 2023 - 12th International Symposium on Information and Communication Technology
PB - Association for Computing Machinery
Y2 - 7 December 2023 through 8 December 2023
ER -