TY - GEN
T1 - D-vlog
T2 - 36th AAAI Conference on Artificial Intelligence, AAAI 2022
AU - Yoon, Jeewoo
AU - Kang, Chaewon
AU - Kim, Seungbae
AU - Han, Jinyoung
N1 - Publisher Copyright:
Copyright © 2022, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2022/6/30
Y1 - 2022/6/30
N2 - Detecting depression based on non-verbal behaviors has received great attention. However, most prior work on detecting depression mainly focused on detecting depressed individuals in laboratory settings, which are difficult to be generalized in practice. In addition, little attention has been paid to analyzing the non-verbal behaviors of depressed individuals in the wild. Therefore, in this paper, we present a multimodal depression dataset, D-Vlog, which consists of 961 vlogs (i.e., around 160 hours) collected from YouTube, which can be utilized in developing depression detection models based on the non-verbal behavior of individuals in real-world scenario. We develop a multimodal deep learning model that uses acoustic and visual features extracted from collected data to detect depression. Our proposed model employs the cross-attention mechanism to effectively capture the relationship across acoustic and visual features, and generates useful multimodal representations for depression detection. The extensive experimental results demonstrate that the proposed model significantly outperforms other baseline models. We believe our dataset and the proposed model are useful for analyzing and detecting depressed individuals based on nonverbal behavior.
AB - Detecting depression based on non-verbal behaviors has received great attention. However, most prior work on detecting depression mainly focused on detecting depressed individuals in laboratory settings, which are difficult to be generalized in practice. In addition, little attention has been paid to analyzing the non-verbal behaviors of depressed individuals in the wild. Therefore, in this paper, we present a multimodal depression dataset, D-Vlog, which consists of 961 vlogs (i.e., around 160 hours) collected from YouTube, which can be utilized in developing depression detection models based on the non-verbal behavior of individuals in real-world scenario. We develop a multimodal deep learning model that uses acoustic and visual features extracted from collected data to detect depression. Our proposed model employs the cross-attention mechanism to effectively capture the relationship across acoustic and visual features, and generates useful multimodal representations for depression detection. The extensive experimental results demonstrate that the proposed model significantly outperforms other baseline models. We believe our dataset and the proposed model are useful for analyzing and detecting depressed individuals based on nonverbal behavior.
UR - https://www.scopus.com/pages/publications/85147606320
U2 - 10.1609/aaai.v36i11.21483
DO - 10.1609/aaai.v36i11.21483
M3 - Conference contribution
AN - SCOPUS:85147606320
T3 - Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022
SP - 12226
EP - 12234
BT - IAAI-22, EAAI-22, AAAI-22 Special Programs and Special Track, Student Papers and Demonstrations
PB - Association for the Advancement of Artificial Intelligence
Y2 - 22 February 2022 through 1 March 2022
ER -