Impact of Defect Instances for Successful Deep Learning-based Automatic Program Repair

  • Misoo Kim
  • Youngkyoung Kim
  • Jinseok Heo
  • Hohyeon Jeong
  • Sungoh Kim
  • Eunseok Lee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations

Abstract

Deep learning-based automatic program repair (DL-APR) returns patch code when given defect code. Recent studies on DL-APR techniques have focused on the training phase to generate more accurate patches; however, a trained model cannot always generate an accurate patch for every new defect, because the training dataset does not fully represent the defects that will be input in the future. DL-APR researchers should therefore study how to elicit the best performance on new inputs from a trained and deployed model. A new defect instance (i.e., the defect code and its context code) is one of the crucial inputs that determine the accuracy of DL-APR, and it can be changed and improved. We improve the quality of new input defect instances by focusing on noise tokens, which compromise a defect instance's quality and thus impair the accuracy of generated patches. This paper shows that 1) a new defect instance contains noise tokens that prevent correct patch generation (inference), and 2) these noise tokens must be masked so that they are not used when inferring patch code. To validate these two assertions, we use a state-of-the-art DL-APR technique and a genetic algorithm to generate near-optimal defect instances that maximize the patch generation accuracy (i.e., the BLEU score) for 4,573 defect instances. Based on the optimization results, we found that 1) noise tokens impair patch generation accuracy in approximately 49% of instances, and 2) if these tokens are excluded from inference by masking them, patch generation accuracy can improve by 88%. The results suggest that future work is needed to automatically remove noise tokens from new defect instances so that the trained patch generator produces better patches.
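
The search described in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It runs a simple genetic algorithm over keep/mask bit vectors for one tokenized defect instance. Everything named here is an assumption made for illustration: the `<mask>` symbol, the function names, the selection, crossover, and mutation settings, and above all `toy_fitness`, which stands in for the real objective (the BLEU score of the patch that a trained DL-APR model generates from the masked instance).

```python
import random

MASK = "<mask>"  # hypothetical mask symbol; the paper's actual mask token is not stated


def apply_mask(tokens, keep):
    """Replace every token whose keep-bit is 0 with the mask symbol."""
    return [tok if bit else MASK for tok, bit in zip(tokens, keep)]


def evolve_mask(tokens, fitness, pop_size=20, generations=30,
                mutation_rate=0.1, seed=0):
    """Search for a keep/mask bit vector that maximizes fitness(masked tokens).

    Assumes len(tokens) >= 2 and pop_size >= 4.
    """
    rng = random.Random(seed)
    n = len(tokens)
    # Seed the population with the unmasked instance plus random candidates.
    population = [[1] * n] + [
        [rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)
    ]

    def score(keep):
        return fitness(apply_mask(tokens, keep))

    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection keeps the best
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation: each bit flips with probability mutation_rate.
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = parents + children

    best = max(population, key=score)
    return best, score(best)


def toy_fitness(masked_tokens):
    """Stand-in objective (assumption): reward visible useful tokens and
    penalize visible noise tokens. A real setup would instead score the
    model's generated patch against the reference patch with BLEU."""
    noise = {"tmp", "debug"}
    useful = sum(1 for t in masked_tokens if t != MASK and t not in noise)
    harmful = sum(1 for t in masked_tokens if t in noise)
    return useful - harmful


tokens = ["int", "tmp", "x", "=", "debug", "y", ";"]
best_keep, best_score = evolve_mask(tokens, toy_fitness)
print(apply_mask(tokens, best_keep), best_score)
```

Because the unmasked instance is part of the initial population and the best candidates always survive selection, the returned score can never fall below the score of the original, unmasked instance. In a real experiment each fitness evaluation requires one model inference, so the population size and generation count would dominate the cost.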

Original language: English
Title of host publication: Proceedings - 2022 IEEE International Conference on Software Maintenance and Evolution, ICSME 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 419-423
Number of pages: 5
ISBN (Electronic): 9781665479561
State: Published - 2022
Event: 39th IEEE International Conference on Software Maintenance and Evolution, ICSME 2022 - Limassol, Cyprus
Duration: 2 Oct 2022 to 7 Oct 2022

Publication series

Name: Proceedings - 2022 IEEE International Conference on Software Maintenance and Evolution, ICSME 2022

Conference

Conference: 39th IEEE International Conference on Software Maintenance and Evolution, ICSME 2022
Country/Territory: Cyprus
City: Limassol
Period: 2/10/22 to 7/10/22

Keywords

  • Automatic program repair
  • Deep learning
  • Masking
  • Noise token
  • Optimization
