SpaDE: Improving Sparse Representations using a Dual Document Encoder for First-stage Retrieval

  • Eunseong Choi
  • , Sunkyung Lee
  • , Minijn Choi
  • , Hyeseon Ko
  • , Young In Song
  • , Jongwuk Lee

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

16 Scopus citations

Abstract

Sparse document representations have been widely used to retrieve relevant documents via exact lexical matching. Owing to the pre-computed inverted index, it supports fast ad-hoc search but incurs the vocabulary mismatch problem. Although recent neural ranking models using pre-trained language models can address this problem, they usually require expensive query inference costs, implying the trade-off between effectiveness and efficiency. Tackling the trade-off, we propose a novel uni-encoder ranking model, Sparse retriever using a Dual document Encoder (SpaDE), learning document representation via the dual encoder. Each encoder plays a central role in (i) adjusting the importance of terms to improve lexical matching and (ii) expanding additional terms to support semantic matching. Furthermore, our co-training strategy trains the dual encoder effectively and avoids unnecessary intervention in training each other. Experimental results on several benchmarks show that SpaDE outperforms existing uni-encoder ranking models.

Original languageEnglish
Title of host publicationCIKM 2022 - Proceedings of the 31st ACM International Conference on Information and Knowledge Management
PublisherAssociation for Computing Machinery
Pages272-282
Number of pages11
ISBN (Electronic)9781450392365
DOIs
StatePublished - 17 Oct 2022
Event31st ACM International Conference on Information and Knowledge Management, CIKM 2022 - Atlanta, United States
Duration: 17 Oct 202221 Oct 2022

Publication series

NameInternational Conference on Information and Knowledge Management, Proceedings
ISSN (Print)2155-0751

Conference

Conference31st ACM International Conference on Information and Knowledge Management, CIKM 2022
Country/TerritoryUnited States
CityAtlanta
Period17/10/2221/10/22

Keywords

  • neural ranking
  • pre-trained language model
  • sparse representations

Fingerprint

Dive into the research topics of 'SpaDE: Improving Sparse Representations using a Dual Document Encoder for First-stage Retrieval'. Together they form a unique fingerprint.

Cite this