TY - GEN
T1 - Feature assortment for deep learning-based bug localization with a program graph
AU - Kim, Youngkyoung
AU - Kim, Misoo
AU - Lee, Eunseok
N1 - Publisher Copyright: © 2022 ACM.
PY - 2022/4/25
Y1 - 2022/4/25
N2 - Bug localization can effectively reduce software maintenance costs. Recently, deep learning-based bug localization (DLBL) has demonstrated its effectiveness in bridging the lexical gaps between bug reports (BRs) and source code files (SFs). Deliberate feature selection that considers the unique characteristics of SFs can boost DLBL. Using the various features of SFs, we propose the following three methods to identify the features that can improve DLBL: 1) text token restriction, 2) program graph construction, and 3) projection. First, the text token information of SFs is used to avoid selecting a text token that can become a noise feature. Second, we propose five rules to construct a program graph that can supplement the textual features. Our program graph can highlight the difference between buggy and non-buggy SFs while preserving the individual characteristics of each SF and the interleaved relationships of SFs. We treat the entire program of the project as a knowledge graph, whose subgraphs are SFs. Third, even if the features of the SFs are represented well by existing approaches, these approaches have a limitation in that they choose parts irrelevant to the bug as features, because the same features represent the SF for all of the different input BRs. Therefore, we propose projecting the SF feature vectors onto the BR feature vectors to highlight the BR-relative features of the SF for different BRs. We evaluated our proposed method on widely used open-source Java projects. The experimental results on 1,928 BRs from 10 Java projects showed the effectiveness of the proposed method. The proposed method can improve bug localization accuracy by an average of 34%.
AB - Bug localization can effectively reduce software maintenance costs. Recently, deep learning-based bug localization (DLBL) has demonstrated its effectiveness in bridging the lexical gaps between bug reports (BRs) and source code files (SFs). Deliberate feature selection that considers the unique characteristics of SFs can boost DLBL. Using the various features of SFs, we propose the following three methods to identify the features that can improve DLBL: 1) text token restriction, 2) program graph construction, and 3) projection. First, the text token information of SFs is used to avoid selecting a text token that can become a noise feature. Second, we propose five rules to construct a program graph that can supplement the textual features. Our program graph can highlight the difference between buggy and non-buggy SFs while preserving the individual characteristics of each SF and the interleaved relationships of SFs. We treat the entire program of the project as a knowledge graph, whose subgraphs are SFs. Third, even if the features of the SFs are represented well by existing approaches, these approaches have a limitation in that they choose parts irrelevant to the bug as features, because the same features represent the SF for all of the different input BRs. Therefore, we propose projecting the SF feature vectors onto the BR feature vectors to highlight the BR-relative features of the SF for different BRs. We evaluated our proposed method on widely used open-source Java projects. The experimental results on 1,928 BRs from 10 Java projects showed the effectiveness of the proposed method. The proposed method can improve bug localization accuracy by an average of 34%.
KW - abstract syntax tree
KW - bug localization
KW - deep learning-based bug localization
KW - feature selection
KW - knowledge graph
UR - https://www.scopus.com/pages/publications/85130329663
U2 - 10.1145/3477314.3507063
DO - 10.1145/3477314.3507063
M3 - Conference contribution
AN - SCOPUS:85130329663
T3 - Proceedings of the ACM Symposium on Applied Computing
SP - 1536
EP - 1544
BT - Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing, SAC 2022
PB - Association for Computing Machinery
T2 - 37th ACM/SIGAPP Symposium on Applied Computing, SAC 2022
Y2 - 25 April 2022 through 29 April 2022
ER -