Self-supervised Fine-tuning for Efficient Passage Re-ranking

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review


Abstract

Passage retrievers based on neural language models have recently achieved significant performance improvements in ranking tasks. Such ranking models capture the contextual features of queries and documents better than traditional keyword-based methods. However, these deep learning-based models are limited by the large amounts of training data they require. We propose a new fine-tuning method based on the masked language model (MLM) objective typically used to pre-train language models. Our model improves ranking performance using the MLM while efficiently utilizing less training data via data augmentation. The proposed approach applies self-supervised learning to information retrieval without needing additional expensive labeled data. In addition, because masking important terms during the fine-tuning stage can undermine ranking performance, the importance of each term and sentence in a passage is calculated using the BM25 scheme and applied to the fine-tuning task so that more important terms are masked less often. Our model is trained on the MS MARCO passage re-ranking dataset and achieves state-of-the-art MRR@10 on the leaderboard among non-ensemble methods.
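The importance-aware masking described above can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' code: the tokenization, BM25 parameters (`k1`, `b`), and the specific down-scaling rule for masking probability are assumptions for the sketch; the paper's key idea is only that a term's masking rate decreases with its BM25 importance.

```python
import math
import random
import re

def bm25_term_weights(passage, corpus, k1=1.2, b=0.75):
    """Score each term in `passage` with a BM25-style weight over `corpus`.
    Hypothetical helper: a simple whitespace/word tokenizer is assumed."""
    docs = [re.findall(r"\w+", d.lower()) for d in corpus]
    avgdl = sum(len(d) for d in docs) / len(docs)
    n_docs = len(docs)
    terms = re.findall(r"\w+", passage.lower())
    tf = {t: terms.count(t) for t in set(terms)}
    weights = {}
    for t, f in tf.items():
        df = sum(1 for d in docs if t in d)
        # +1 inside the log keeps the IDF strictly positive
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
        weights[t] = idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(terms) / avgdl))
    return weights

def importance_aware_mask(passage, corpus, base_rate=0.15, seed=0):
    """Mask tokens with probability inversely related to their BM25 weight,
    so the most important terms are masked least often (sketch)."""
    rng = random.Random(seed)
    weights = bm25_term_weights(passage, corpus)
    max_w = max(weights.values())
    tokens = re.findall(r"\w+", passage.lower())
    masked = []
    for tok in tokens:
        # scale the base masking rate down as importance rises
        p = base_rate * (1.0 - weights[tok] / max_w)
        masked.append("[MASK]" if rng.random() < p else tok)
    return masked
```

A frequent, low-IDF term such as a stopword would receive a low weight and thus be masked close to the base 15% rate, while a rare query-relevant term would be masked rarely, preserving the ranking signal during MLM fine-tuning.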

Original language: English
Title of host publication: CIKM 2021 - Proceedings of the 30th ACM International Conference on Information and Knowledge Management
Publisher: Association for Computing Machinery
Pages: 3142-3146
Number of pages: 5
ISBN (Electronic): 9781450384469
DOIs
State: Published - 30 Oct 2021
Event: 30th ACM International Conference on Information and Knowledge Management, CIKM 2021 - Virtual, Online, Australia
Duration: 1 Nov 2021 – 5 Nov 2021

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings
ISSN (Print): 2155-0751

Conference

Conference: 30th ACM International Conference on Information and Knowledge Management, CIKM 2021
Country/Territory: Australia
City: Virtual, Online
Period: 1/11/21 – 5/11/21

Keywords

  • passage ranking
  • pre-trained language model
  • self-supervised learning
