A Method to Generate a Machine-Labeled Data for Biomedical Named Entity Recognition with Various Sub-Domains

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Biomedical named entity (NE) recognition is a core technique for many tasks in the biomedical domain. In previous studies, machine-learning algorithms have outperformed dictionary-based and rule-based approaches, because biomedical NEs have too many terminological variations and new biomedical NEs are constantly being coined. Achieving high performance with a machine-learning algorithm requires good-quality corpora. However, such corpora are difficult to obtain, because annotating a biomedical corpus for machine learning is extremely time-consuming and costly. In addition, most existing corpora are insufficient for high-level tasks because they do not cover a variety of domains. We therefore propose a method for generating a large amount of machine-labeled data that covers various domains. We first generate initial machine-labeled data using a chunker and MetaMap. The chunker, developed with manually annotated data, extracts only biomedical NEs; MetaMap annotates the category of each biomedical NE. We then apply a self-training approach to bootstrap the quality of the initial machine-labeled data. In our experiments, a biomedical NE recognition system trained on the proposed machine-labeled data achieves substantially higher performance. Specifically, it outperforms a biomedical NE recognition system that uses MetaMap only by 26.03 percentage points in F1-score.
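
The self-training step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the model, the confidence criterion, or the stopping rule, so everything below is an assumption made for illustration. A toy count-based tagger stands in for the real NE recognizer, the seed data stands in for the initial chunker-plus-MetaMap labels, and the names `features`, `train`, `predict`, `self_train`, and the `threshold` parameter are hypothetical.

```python
from collections import Counter, defaultdict


def features(token):
    """Toy features (an assumption): the lowercased word and its 3-character suffix."""
    t = token.lower()
    return [f"w={t}", f"suf={t[-3:]}"]


def train(labeled):
    """Count how often each feature co-occurs with each tag."""
    model = defaultdict(Counter)
    for tokens, tags in labeled:
        for tok, tag in zip(tokens, tags):
            for f in features(tok):
                model[f][tag] += 1
    return model


def predict(model, tokens):
    """Tag each token by feature votes; return the tags and the lowest token confidence."""
    tags, confs = [], []
    for tok in tokens:
        votes = Counter()
        for f in features(tok):
            if f in model:
                votes.update(model[f])
        if votes:
            tag, n = votes.most_common(1)[0]
            tags.append(tag)
            confs.append(n / sum(votes.values()))
        else:
            # No evidence for this token: fall back to "O" with zero confidence.
            tags.append("O")
            confs.append(0.0)
    return tags, (min(confs) if confs else 0.0)


def self_train(seed, unlabeled, threshold=0.8, rounds=3):
    """Retrain repeatedly, adding confidently machine-labeled sentences each round."""
    labeled = list(seed)
    pool = list(unlabeled)
    for _ in range(rounds):
        model = train(labeled)
        remaining, added = [], 0
        for tokens in pool:
            tags, conf = predict(model, tokens)
            if conf >= threshold:
                labeled.append((tokens, tags))
                added += 1
            else:
                remaining.append(tokens)
        pool = remaining
        if added == 0:  # nothing new was learned, so stop early
            break
    return train(labeled), labeled


# Made-up example data; the seed plays the role of the initial machine-labeled data.
seed = [
    (["aspirin", "reduces", "fever"], ["Chemical", "O", "Disease"]),
    (["ibuprofen", "treats", "headache"], ["Chemical", "O", "Disease"]),
]
unlabeled = [
    ["aspirin", "treats", "fever"],
    ["naproxen", "treats", "fever"],
]
model, labeled = self_train(seed, unlabeled)
```

In this toy run the first unlabeled sentence is added to the training set because every token is tagged with full confidence, while the second is held back because "naproxen" matches no known feature. Requiring the lowest token confidence to clear the threshold is one conservative choice among several; it limits the error reinforcement that self-training is prone to, at the cost of adding fewer sentences.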

Original language: English
Title of host publication: DDDSM 2017 - 1st International Workshop on Digital Disease Detection using Social Media, Proceedings of the Workshop
Publisher: Association for Computational Linguistics (ACL)
Pages: 47-51
Number of pages: 5
ISBN (Electronic): 9781948087070
State: Published - 2017
Externally published: Yes
Event: 1st International Workshop on Digital Disease Detection using Social Media, DDDSM 2017, co-located with the 8th International Joint Conference on Natural Language Processing, IJCNLP 2017 - Taipei, Taiwan, Province of China
Duration: 27 Nov 2017 → …

