TY - GEN
T1 - Impact of Defect Instances for Successful Deep Learning-based Automatic Program Repair
AU - Kim, Misoo
AU - Kim, Youngkyoung
AU - Heo, Jinseok
AU - Jeong, Hohyeon
AU - Kim, Sungoh
AU - Lee, Eunseok
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Deep learning-based automatic program repair (DL-APR) returns a patch code when given a defect code. Recent studies on DL-APR techniques have focused on the training phase to generate more accurate patches; however, a trained model cannot always generate an accurate patch for every new defect code, as the training dataset does not completely represent the new defects to be input in the future. DL-APR researchers should study a method to elicit the best performance on new inputs from the trained and deployed model. A new defect instance (i.e., defect codes and their context codes) is one of the crucial input data that determine the accuracy of the DL-APR, which can be changed and improved. We improve the quality of new input defect instances by focusing on the presence of noise tokens which compromise the defect instances' quality, thus impairing the accuracy of generated patches. This paper shows that 1) there are noise tokens which prevent correct patch generation (inference) in a new defect instance, and 2) it is necessary to mask these noise tokens to avoid their usage in inferencing patch codes. In order to validate these two assertions, we use a state-of-the-art DL-APR technique and a genetic algorithm to generate near-optimal defect instances which maximize the patch generation accuracy (i.e., the BLEU score) of 4,573 defect instances. Based on optimization results, we found that 1) noise tokens impair patch generation accuracy in approximately 49% of instances, and 2) if these tokens are precluded from inference by masking them, we can improve patch generation accuracy by 88%. The results suggest that future work is required to automatically remove noise tokens from new defect instances so that the trained patch generator generates better patches.
KW - Automatic program repair
KW - Deep learning
KW - Masking
KW - Noise token
KW - Optimization
UR - https://www.scopus.com/pages/publications/85146253135
U2 - 10.1109/ICSME55016.2022.00051
DO - 10.1109/ICSME55016.2022.00051
M3 - Conference contribution
AN - SCOPUS:85146253135
T3 - Proceedings - 2022 IEEE International Conference on Software Maintenance and Evolution, ICSME 2022
SP - 419
EP - 423
BT - Proceedings - 2022 IEEE International Conference on Software Maintenance and Evolution, ICSME 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 38th IEEE International Conference on Software Maintenance and Evolution, ICSME 2022
Y2 - 2 October 2022 through 7 October 2022
ER -