Abstract
The Situated Interactive MultiModal Conversations (SIMMC2.1) Challenge 2022 is hosted by the Eleventh Dialog System Technology Challenge (DSTC11). The task of SIMMC is to create a shopping assistant agent that can communicate with customers in a virtual store. This requires processing store scenes and product catalogs along with the customer’s request, a process that can be decomposed into four steps, each of which becomes a subtask. In this work, we investigate monolithic transformers, fusion transformers, and language transformers as three distinct multimodal modeling approaches and evaluate the potential of each. We also devise a retrieval-based method for acquiring the metadata of each object, which significantly improves the accuracy of predicted object characteristics. Furthermore, we identify a discrepancy in using pretrained language models for dialog tasks and propose a simple domain-adaptation method. Our model placed third in the object coreferencing, dialog state tracking, and response generation tasks.
| Original language | English |
|---|---|
| Pages | 25-30 |
| Number of pages | 6 |
| State | Published - 2023 |
| Event | 11th Dialog System Technology Challenge, DSTC 2023 - Prague, Czech Republic |
| Duration | 11 Sep 2023 → … |
Conference
| Conference | 11th Dialog System Technology Challenge, DSTC 2023 |
|---|---|
| Country/Territory | Czech Republic |
| City | Prague |
| Period | 11/09/23 → … |