TY - GEN
T1 - Multi-document Summarization by Creating Synthetic Document Vector Based on Language Model
AU - Kim, Dahae
AU - Lee, Jee Hyong
N1 - Publisher Copyright:
© 2016 IEEE.
PY - 2016/12/28
Y1 - 2016/12/28
N2 - Multi-document summarization aims to create summaries covering the major information that multiple documents share in common. To this end, existing methods rely on hand-crafted word- and sentence-level features. However, it is difficult to identify the core content of each document with hand-crafted features because they capture only limited information from the given documents. Moreover, identifying the major information is harder still because documents with the same meaning are often paraphrased differently by different writers. Therefore, it is necessary to represent the semantic meaning of documents, as well as of sentences, through natural language understanding. In this paper, we propose a new multi-document summarization system that creates a synthetic document vector covering all of the documents, based on a language model, which is well known for learning semantic features from text. We experimented with the DUC 2004 dataset provided by the Document Understanding Conference (DUC), and the results show that our method effectively summarizes multiple documents based on their core content.
AB - Multi-document summarization aims to create summaries covering the major information that multiple documents share in common. To this end, existing methods rely on hand-crafted word- and sentence-level features. However, it is difficult to identify the core content of each document with hand-crafted features because they capture only limited information from the given documents. Moreover, identifying the major information is harder still because documents with the same meaning are often paraphrased differently by different writers. Therefore, it is necessary to represent the semantic meaning of documents, as well as of sentences, through natural language understanding. In this paper, we propose a new multi-document summarization system that creates a synthetic document vector covering all of the documents, based on a language model, which is well known for learning semantic features from text. We experimented with the DUC 2004 dataset provided by the Document Understanding Conference (DUC), and the results show that our method effectively summarizes multiple documents based on their core content.
KW - Core content
KW - Language model
KW - Major Information
KW - Multi-document summarization
KW - Synthetic document vector
UR - https://www.scopus.com/pages/publications/85010465120
U2 - 10.1109/SCIS-ISIS.2016.0132
DO - 10.1109/SCIS-ISIS.2016.0132
M3 - Conference contribution
AN - SCOPUS:85010465120
T3 - Proceedings - 2016 Joint 8th International Conference on Soft Computing and Intelligent Systems and 2016 17th International Symposium on Advanced Intelligent Systems, SCIS-ISIS 2016
SP - 605
EP - 609
BT - Proceedings - 2016 Joint 8th International Conference on Soft Computing and Intelligent Systems and 2016 17th International Symposium on Advanced Intelligent Systems, SCIS-ISIS 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 8th Joint International Conference on Soft Computing and Intelligent Systems and 17th International Symposium on Advanced Intelligent Systems, SCIS-ISIS 2016
Y2 - 25 August 2016 through 28 August 2016
ER -