TY - JOUR
T1 - Knowledge distillation with insufficient training data for regression
AU - Kang, Myeonginn
AU - Kang, Seokho
N1 - Publisher Copyright:
© 2024 Elsevier Ltd
PY - 2024/6/1
Y1 - 2024/6/1
N2 - Knowledge distillation has been widely used to compress a large teacher network into a smaller student network. Conventional approaches require access to the training dataset that was used to train the teacher network. However, in many real-world situations, the original training dataset is not fully reusable owing to practical constraints such as data security, privacy, and storage limits. In this study, we present a teacher–student matching method that improves knowledge distillation for regression problems under data insufficiency. Given an existing knowledge distillation method as the base, we introduce three additional learning objectives that help the student better emulate the prediction capability of the teacher: perturbation-based matching (PM), adversarial belief matching (ABM), and gradient matching (GM). PM matches the predictions of the teacher and student on synthetic data points created by perturbing original data points. ABM matches their predictions on data points on which the two networks make different predictions. GM matches the gradients of the teacher and student on the original and synthetic data points. We demonstrate that the proposed method improves the prediction performance of the student network, particularly when only a small portion of the original training dataset is available. When 10% of the original training dataset is used for knowledge distillation, the root mean squared error of the student network is reduced by 43.91% on average compared with existing knowledge distillation methods.
KW - Data insufficiency
KW - Knowledge distillation
KW - Neural network
KW - Regression
UR - https://www.scopus.com/pages/publications/85184073577
U2 - 10.1016/j.engappai.2024.108001
DO - 10.1016/j.engappai.2024.108001
M3 - Article
AN - SCOPUS:85184073577
SN - 0952-1976
VL - 132
JO - Engineering Applications of Artificial Intelligence
JF - Engineering Applications of Artificial Intelligence
M1 - 108001
ER -