TY - GEN
T1 - BLOCSUM
T2 - Findings of the Association for Computational Linguistics, ACL 2023
AU - Choi, Yun Seok
AU - Kim, Hyojun
AU - Lee, Jee Hyong
N1 - Publisher Copyright: © 2023 Association for Computational Linguistics.
PY - 2023
Y1 - 2023
AB - Code summarization, which aims to automatically generate natural language descriptions of the source code, has become an essential task in software development for better program understanding. Abstract Syntax Tree (AST), which represents the syntax structure of the source code, is helpful when utilized together with the sequence of code tokens to improve the quality of code summaries. Recent works on code summarization attempted to capture the sequential and structural information of the source code, but they considered less the property that source code consists of multiple code blocks. In this paper, we propose BLOCSUM, BLOck scope-based source Code SUMmarization via shared block representation that utilizes block-scope information by representing various structures of the code block. We propose a shared block position embedding to effectively represent the structure of code blocks and merge both code and AST. Furthermore, we develop variant ASTs to learn rich information such as block and global dependencies of the source code. To prove our approach, we perform experiments on two real-world datasets, the Java dataset and the Python dataset. We demonstrate the effectiveness of BLOCSUM through various experiments, including ablation studies and a human evaluation.
UR - https://www.scopus.com/pages/publications/85175461982
U2 - 10.18653/v1/2023.findings-acl.724
DO - 10.18653/v1/2023.findings-acl.724
M3 - Conference contribution
AN - SCOPUS:85175461982
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 11427
EP - 11441
BT - Findings of the Association for Computational Linguistics, ACL 2023
PB - Association for Computational Linguistics (ACL)
Y2 - 9 July 2023 through 14 July 2023
ER -