TY - GEN
T1 - Collaborative distillation for top-N recommendation
AU - Lee, Jae Woong
AU - Choi, Minjin
AU - Lee, Jongwuk
AU - Shim, Hyunjung
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/11
Y1 - 2019/11
N2 - Knowledge distillation (KD) is a well-known method for reducing inference latency by compressing a cumbersome teacher model into a small student model. Despite the success of KD in the classification task, applying KD to recommender models is challenging because of the sparsity of positive feedback, the ambiguity of missing feedback, and the ranking problem associated with top-N recommendation. To address these issues, we propose a new KD model for the collaborative filtering approach, namely collaborative distillation (CD). Specifically, (1) we reformulate a loss function to deal with the ambiguity of missing feedback. (2) We exploit probabilistic rank-aware sampling for top-N recommendation. (3) To train the proposed model effectively, we develop two training strategies for the student model, called the teacher-guided and the student-guided training methods, which select the most useful feedback from the teacher model. Experimental results demonstrate that the proposed model outperforms the state-of-the-art method by 5.5-29.7% and 4.8-27.8% in hit rate (HR) and normalized discounted cumulative gain (NDCG), respectively. Moreover, the proposed model achieves performance comparable to that of the teacher model.
AB - Knowledge distillation (KD) is a well-known method for reducing inference latency by compressing a cumbersome teacher model into a small student model. Despite the success of KD in the classification task, applying KD to recommender models is challenging because of the sparsity of positive feedback, the ambiguity of missing feedback, and the ranking problem associated with top-N recommendation. To address these issues, we propose a new KD model for the collaborative filtering approach, namely collaborative distillation (CD). Specifically, (1) we reformulate a loss function to deal with the ambiguity of missing feedback. (2) We exploit probabilistic rank-aware sampling for top-N recommendation. (3) To train the proposed model effectively, we develop two training strategies for the student model, called the teacher-guided and the student-guided training methods, which select the most useful feedback from the teacher model. Experimental results demonstrate that the proposed model outperforms the state-of-the-art method by 5.5-29.7% and 4.8-27.8% in hit rate (HR) and normalized discounted cumulative gain (NDCG), respectively. Moreover, the proposed model achieves performance comparable to that of the teacher model.
KW - Collaborative filtering
KW - Knowledge distillation
KW - Recommender systems
UR - https://www.scopus.com/pages/publications/85078912282
U2 - 10.1109/ICDM.2019.00047
DO - 10.1109/ICDM.2019.00047
M3 - Conference contribution
AN - SCOPUS:85078912282
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 369
EP - 378
BT - Proceedings - 19th IEEE International Conference on Data Mining, ICDM 2019
A2 - Wang, Jianyong
A2 - Shim, Kyuseok
A2 - Wu, Xindong
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 19th IEEE International Conference on Data Mining, ICDM 2019
Y2 - 8 November 2019 through 11 November 2019
ER -