Perceive before Respond: Improving Sticker Response Selection by Emotion Distillation and Hard Mining

  • Wuyou Xia
  • Shengzhe Liu
  • Qin Rong
  • Guoli Jia
  • Eunil Park
  • Jufeng Yang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

6 Scopus citations

Abstract

In online chatting, people increasingly prefer using stickers to supplement or replace text in replies, as sticker images can express vivid and varied emotions. The Sticker Response Selection (SRS) task aims to predict the sticker image that is most relevant to the dialogue history. Previous research explores the semantic similarity between context and stickers, overlooking both unimodal and cross-modal emotional information. In this paper, we propose a 'Perceive before Respond' (PBR) training paradigm. PBR perceives sticker emotions through a knowledge distillation module. Diverse representations of each emotion category are acquired from a large-scale sticker emotion recognition dataset and distilled into our model to enhance emotion comprehension. We further distinguish stickers with similar subject elements under the same topic by performing contrastive learning at both inter- and intra-topic levels to obtain discriminative and diverse sticker representations. In addition, we improve the hard negative sampling method for image-text matching based on cross-modal sentiment association, mining hard samples by both semantic similarity and sentiment polarity similarity in the sticker-to-dialogue and dialogue-to-sticker directions. Extensive experiments verify the effectiveness of each proposed component. Ablation experiments on different backbone networks demonstrate the generality of our approach. Our code is available at https://github.com/wuyou-xia/Perceive-before-Respond.
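
The hard negative mining idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that). It assumes each dialogue and sticker already has an embedding vector and a scalar sentiment polarity in [-1, 1], and it scores non-matching candidates by a weighted sum of cosine similarity and polarity similarity. The function names, the weight `alpha`, and the toy data are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def polarity_similarity(p, q):
    """Map two polarities in [-1, 1] to a similarity in [0, 1] (assumed form)."""
    return 1.0 - abs(p - q) / 2.0


def mine_hard_negatives(queries, candidates, query_pol, cand_pol, alpha=0.5):
    """For each query i (whose positive is candidate i), return the index of
    the non-matching candidate with the highest combined score.

    alpha is a hypothetical weight balancing semantic and polarity similarity.
    """
    hard = []
    for i, q in enumerate(queries):
        best_j, best_score = None, -math.inf
        for j, c in enumerate(candidates):
            if j == i:  # skip the ground-truth pair
                continue
            score = (alpha * cosine(q, c)
                     + (1 - alpha) * polarity_similarity(query_pol[i], cand_pol[j]))
            if score > best_score:
                best_j, best_score = j, score
        hard.append(best_j)
    return hard


# Toy data (illustrative only): three dialogue/sticker pairs.
dialogues = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
stickers = [[1.0, 0.1], [0.1, 1.0], [0.9, 1.0]]
dialogue_pol = [0.8, -0.6, 0.7]
sticker_pol = [0.9, -0.5, 0.6]

# Mine in both directions, as the abstract describes.
d2s = mine_hard_negatives(dialogues, stickers, dialogue_pol, sticker_pol)
s2d = mine_hard_negatives(stickers, dialogues, sticker_pol, dialogue_pol)
print("dialogue-to-sticker hard negatives:", d2s)
print("sticker-to-dialogue hard negatives:", s2d)
```

The same function serves both directions by swapping the query and candidate roles. In a real training loop the selected negatives would feed a matching or contrastive loss, which this sketch leaves out.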

Original language: English
Title of host publication: MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia
Publisher: Association for Computing Machinery, Inc
Pages: 9631-9640
Number of pages: 10
ISBN (Electronic): 9798400706868
DOIs
State: Published - 28 Oct 2024
Event: 32nd ACM International Conference on Multimedia, MM 2024 - Melbourne, Australia
Duration: 28 Oct 2024 → 1 Nov 2024

Publication series

Name: MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia

Conference

Conference: 32nd ACM International Conference on Multimedia, MM 2024
Country/Territory: Australia
City: Melbourne
Period: 28/10/24 → 1/11/24

Keywords

  • multimodal learning
  • sticker response selection
