TY - GEN
T1 - An empirical study for class imbalance in extreme multi-label text classification
AU - Han, Sangwoo
AU - Lim, Chan
AU - Cha, Bonggeon
AU - Lee, Jongwuk
N1 - Publisher Copyright: © 2021 IEEE.
PY - 2021/1
Y1 - 2021/1
N2 - Extreme multi-label text classification (XMTC) is the problem of finding the most relevant labels for each text in a corpus with millions of labels. A key challenge in XMTC is class imbalance: most labels appear only a few times. To overcome the class imbalance problem, existing studies have suggested various methods using different loss functions (e.g., focal loss) and data augmentation (e.g., mix-up). In this paper, we investigate the effectiveness of these two approaches on RNN-based and transformer-based deep XMTC models. Our experiments show that applying focal loss and mix-up to deep XMTC models yields some improvement on various datasets.
AB - Extreme multi-label text classification (XMTC) is the problem of finding the most relevant labels for each text in a corpus with millions of labels. A key challenge in XMTC is class imbalance: most labels appear only a few times. To overcome the class imbalance problem, existing studies have suggested various methods using different loss functions (e.g., focal loss) and data augmentation (e.g., mix-up). In this paper, we investigate the effectiveness of these two approaches on RNN-based and transformer-based deep XMTC models. Our experiments show that applying focal loss and mix-up to deep XMTC models yields some improvement on various datasets.
UR - https://www.scopus.com/pages/publications/85102968222
U2 - 10.1109/BigComp51126.2021.00073
DO - 10.1109/BigComp51126.2021.00073
M3 - Conference contribution
AN - SCOPUS:85102968222
T3 - Proceedings - 2021 IEEE International Conference on Big Data and Smart Computing, BigComp 2021
SP - 338
EP - 341
BT - Proceedings - 2021 IEEE International Conference on Big Data and Smart Computing, BigComp 2021
A2 - Unger, Herwig
A2 - Kim, Jinho
A2 - Kang, U
A2 - So-In, Chakchai
A2 - Du, Junping
A2 - Saad, Walid
A2 - Ha, Young-guk
A2 - Wagner, Christian
A2 - Bourgeois, Julien
A2 - Sathitwiriyawong, Chanboon
A2 - Kwon, Hyuk-Yoon
A2 - Leung, Carson
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 IEEE International Conference on Big Data and Smart Computing, BigComp 2021
Y2 - 17 January 2021 through 20 January 2021
ER -