TY - JOUR
T1 - Robust fine-tuning for low-resource NLP
T2 - Combining adversarial and metric-based learning to mitigate overfitting
AU - Choi, Kyuri
AU - Ko, Youngjoong
N1 - Publisher Copyright:
© 2025 Elsevier Ltd
PY - 2025/9/1
Y1 - 2025/9/1
N2 - Learning from low-resource training samples is challenging because the model can memorize features that are irrelevant to the given task, a phenomenon commonly known as overfitting. As neural networks grow in size due to their enhanced effectiveness, they increasingly face the risk of overfitting, especially when trained on limited data. This situation, in which larger models are both more capable and more prone to overfitting, necessitates novel strategies to maintain generalization. Existing solutions often rely on preserving pre-trained model weights to prevent overfitting while harnessing rich information from the pre-training phase. However, these approaches encounter performance trade-offs between in-distribution and out-of-distribution settings and lack analysis of how pre-trained language models (PLMs) overfit training data. In this work, we analyze the tendency of PLMs to overfit salient features within the constrained data distribution, especially domain-specific features. Motivated by this observation, we propose a training method that reduces the influence of domain information in the embedding space to prevent overfitting on specific features when working with low-resource samples. Our approach demonstrates promising improvements in diverse out-of-distribution settings while maintaining comparable performance on in-distribution test sets.
AB - Learning from low-resource training samples is challenging because the model can memorize features that are irrelevant to the given task, a phenomenon commonly known as overfitting. As neural networks grow in size due to their enhanced effectiveness, they increasingly face the risk of overfitting, especially when trained on limited data. This situation, in which larger models are both more capable and more prone to overfitting, necessitates novel strategies to maintain generalization. Existing solutions often rely on preserving pre-trained model weights to prevent overfitting while harnessing rich information from the pre-training phase. However, these approaches encounter performance trade-offs between in-distribution and out-of-distribution settings and lack analysis of how pre-trained language models (PLMs) overfit training data. In this work, we analyze the tendency of PLMs to overfit salient features within the constrained data distribution, especially domain-specific features. Motivated by this observation, we propose a training method that reduces the influence of domain information in the embedding space to prevent overfitting on specific features when working with low-resource samples. Our approach demonstrates promising improvements in diverse out-of-distribution settings while maintaining comparable performance on in-distribution test sets.
KW - Adversarial learning
KW - Low-resource classification
KW - Metric-based learning
KW - Overfitting
KW - Pre-trained language models
UR - https://www.scopus.com/pages/publications/105006679877
U2 - 10.1016/j.eswa.2025.127737
DO - 10.1016/j.eswa.2025.127737
M3 - Article
AN - SCOPUS:105006679877
SN - 0957-4174
VL - 288
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 127737
ER -