TY - GEN
T1 - Feature combination to alleviate hubness problem of source code representation for bug localization
AU - Kim, Youngkyoung
AU - Kim, Misoo
AU - Lee, Eunseok
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/12
Y1 - 2020/12
AB - Deep learning-based bug localization (DLBL) can effectively reduce software maintenance costs. However, the inherent hubness problem of the high-dimensional vector of the source code file used in DLBL leads to inaccurate bug localization. To solve this problem, we analyzed 10,359 defects and found that the call graph and flow of the program can distinguish buggy files from non-buggy files and provide functional semantic information for bug localization. Based on our observations, we propose a feature combination to alleviate the hubness problem of the source file representation by using functional semantic information. Our proposed method models the functional semantics with the call graph and program flow based on the raw abstract syntax tree. We evaluated the effectiveness of the proposed approach on 19 widely used projects and conducted an ablation study. The experimental results show that the proposed method can improve the current approaches by 12% to 45% by differentiating buggy files from non-buggy files. In our ablation study, functional information shows its significance, as the absence of functional semantics deteriorates performance by 8.5%.
KW - Abstract Syntax Tree
KW - Bug Localization
KW - Code Representation
KW - Functional Semantics
UR - https://www.scopus.com/pages/publications/85102374991
U2 - 10.1109/APSEC51365.2020.00068
DO - 10.1109/APSEC51365.2020.00068
M3 - Conference contribution
AN - SCOPUS:85102374991
T3 - Proceedings - Asia-Pacific Software Engineering Conference, APSEC
SP - 511
EP - 512
BT - Proceedings - 2020 27th Asia-Pacific Software Engineering Conference, APSEC 2020
PB - IEEE Computer Society
T2 - 27th Asia-Pacific Software Engineering Conference, APSEC 2020
Y2 - 1 December 2020 through 4 December 2020
ER -