Modeling Multimodal Social Interactions: New Challenges and Baselines with Densely Aligned Representations

  • Sangmin Lee
  • Bolin Lai
  • Fiona Ryan
  • Bikram Boote
  • James M. Rehg

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

6 Scopus citations

Abstract

Understanding social interactions involving both verbal and non-verbal cues is essential for effectively interpreting social situations. However, most prior works on multimodal social cues focus predominantly on single-person behaviors or rely on holistic visual representations that are not aligned to utterances in multi-party environments. Consequently, they are limited in modeling the intricate dynamics of multi-party interactions. In this paper, we introduce three new challenging tasks to model the fine-grained dynamics between multiple people: speaking target identification, pronoun coreference resolution, and mentioned player prediction. We contribute extensive data annotations to curate these new challenges in social deduction game settings. Furthermore, we propose a novel multimodal baseline that leverages densely aligned language-visual representations by synchronizing visual features with their corresponding utterances. This facilitates concurrently capturing verbal and non-verbal cues pertinent to social reasoning. Experiments demonstrate the effectiveness of the proposed approach with densely aligned multimodal representations in modeling fine-grained social interactions. Project website: https://sangmin-git.github.io/projects/MMSI.

Original language: English
Title of host publication: Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
Publisher: IEEE Computer Society
Pages: 14585-14595
Number of pages: 11
ISBN (Electronic): 9798350353006
DOIs
State: Published - 2024
Externally published: Yes
Event: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024 - Seattle, United States
Duration: 16 Jun 2024 → 22 Jun 2024

Publication series

Name: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
ISSN (Print): 1063-6919

Conference

Conference: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
Country/Territory: United States
City: Seattle
Period: 16/06/24 → 22/06/24
