TY - JOUR
T1 - Efficient question classification and retrieval using category information and word embedding on cQA services
AU - Bae, Kyoungman
AU - Ko, Youngjoong
N1 - Publisher Copyright: © 2019, Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2019/8/15
Y1 - 2019/8/15
N2 - Question classification, the task of automatically assigning unlabeled questions to predefined categories (or topics), and the effective retrieval of similar questions are crucial aspects of an effective cQA service. We first address the problems associated with estimating the distribution of words in each category and using it to compute word weights. We then apply to question classification an automatic expansion-word generation technique based on our proposed weighting method and pseudo-relevance feedback. Second, to address the lexical gap problem in question retrieval, we define the case frame of a sentence from its extracted components and derive a similarity measure, based on the case frame and word embeddings, for determining the similarity between two sentences. These similarities are then used to reorder the results of the first retrieval model. Consequently, the proposed methods significantly improve the performance of question classification and retrieval.
AB - Question classification, the task of automatically assigning unlabeled questions to predefined categories (or topics), and the effective retrieval of similar questions are crucial aspects of an effective cQA service. We first address the problems associated with estimating the distribution of words in each category and using it to compute word weights. We then apply to question classification an automatic expansion-word generation technique based on our proposed weighting method and pseudo-relevance feedback. Second, to address the lexical gap problem in question retrieval, we define the case frame of a sentence from its extracted components and derive a similarity measure, based on the case frame and word embeddings, for determining the similarity between two sentences. These similarities are then used to reorder the results of the first retrieval model. Consequently, the proposed methods significantly improve the performance of question classification and retrieval.
KW - Category information
KW - Pseudo-relevance feedback
KW - Question classification
KW - Question expansion
KW - Word weighting method
UR - https://www.scopus.com/pages/publications/85064267413
U2 - 10.1007/s10844-019-00556-x
DO - 10.1007/s10844-019-00556-x
M3 - Article
AN - SCOPUS:85064267413
SN - 0925-9902
VL - 53
SP - 27
EP - 49
JO - Journal of Intelligent Information Systems
JF - Journal of Intelligent Information Systems
IS - 1
ER -