TY - GEN
T1 - Self-supervised Fine-tuning for Efficient Passage Re-ranking
AU - Kim, Meoungjun
AU - Ko, Youngjoong
N1 - Publisher Copyright:
© 2021 ACM.
PY - 2021/10/30
Y1 - 2021/10/30
N2 - Passage retrievers based on neural language models have recently achieved significant performance improvements in ranking tasks. Such ranking models have the advantage of capturing the contextual features of queries and documents better than traditional keyword-based methods. However, these deep learning-based models are limited by the large amounts of training data they require. We propose a new fine-tuning method based on the masked language model (MLM) objective that is typically used in pre-trained language models. Our model improves ranking performance using the MLM while efficiently utilizing less training data via data augmentation. The proposed approach applies self-supervised learning to information retrieval without needing additional expensive labeled data. In addition, because masking important terms during the fine-tuning stage can undermine ranking performance, the importance values of each term and sentence in a passage are calculated using the BM25 scheme and applied to the fine-tuning task such that the more important terms are masked less often. Our model is trained with the dataset from the MS MARCO re-ranking leaderboard and achieves state-of-the-art MRR@10 performance on the leaderboard, excluding ensemble-based methods.
AB - Passage retrievers based on neural language models have recently achieved significant performance improvements in ranking tasks. Such ranking models have the advantage of capturing the contextual features of queries and documents better than traditional keyword-based methods. However, these deep learning-based models are limited by the large amounts of training data they require. We propose a new fine-tuning method based on the masked language model (MLM) objective that is typically used in pre-trained language models. Our model improves ranking performance using the MLM while efficiently utilizing less training data via data augmentation. The proposed approach applies self-supervised learning to information retrieval without needing additional expensive labeled data. In addition, because masking important terms during the fine-tuning stage can undermine ranking performance, the importance values of each term and sentence in a passage are calculated using the BM25 scheme and applied to the fine-tuning task such that the more important terms are masked less often. Our model is trained with the dataset from the MS MARCO re-ranking leaderboard and achieves state-of-the-art MRR@10 performance on the leaderboard, excluding ensemble-based methods.
KW - passage ranking
KW - pre-trained language model
KW - self-supervised learning
UR - https://www.scopus.com/pages/publications/85119173052
U2 - 10.1145/3459637.3482179
DO - 10.1145/3459637.3482179
M3 - Conference contribution
AN - SCOPUS:85119173052
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 3142
EP - 3146
BT - CIKM 2021 - Proceedings of the 30th ACM International Conference on Information and Knowledge Management
PB - Association for Computing Machinery
T2 - 30th ACM International Conference on Information and Knowledge Management, CIKM 2021
Y2 - 1 November 2021 through 5 November 2021
ER -