Abstract
This paper proposes an effective set expansion system that automatically extracts named entities (NEs) from the Web to construct NE domain dictionaries. The purpose of a set expansion system is to expand a given partial set of objects into a more complete set. Google Sets is a representative set expansion system that uses the Web. The proposed system uses several seed words as initial information to collect Web pages that are likely to contain many NEs, and it extracts NE candidates from the collected pages. A mutual-importance measurement technique is developed to estimate the importance scores of the NE candidates, and these scores are then used to rank the candidates. Real NEs can then be extracted easily from the resulting ordered list of NE candidates. In experiments, the proposed method achieved a mean average precision (MAP) of 95.60% on 7 Korean NE domains and 99.98% on 8 English NE domains. In particular, the accuracy of the proposed system on the English domains is higher than that of Google Sets.
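The MAP figures quoted in the abstract can be illustrated with a minimal sketch of how mean average precision is commonly computed over ranked candidate lists. This is not the authors' evaluation code: the function names, the toy data, and the choice to normalize by the size of the gold set are all assumptions made for illustration, and the paper may use a different variant of the metric.

```python
from typing import Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average precision of one ranked candidate list against a gold set.

    Assumption: the sum of precision-at-hit values is divided by the
    size of the gold set, one common convention among several.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, candidate in enumerate(ranked, start=1):
        if candidate in relevant:
            hits += 1
            # Precision at this rank, counted only where a true NE appears.
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Iterable[Tuple[Sequence[str], Set[str]]]
) -> float:
    """Mean of the per-domain average precision values."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(r, g) for r, g in runs) / len(runs)


# Hypothetical toy domains: (ranked candidates, gold named entities).
toy_runs = [
    (["Seoul", "Busan", "Incheon"], {"Seoul", "Busan", "Incheon"}),
    (["Paris", "banana", "Rome"], {"Paris", "Rome"}),
]
map_score = mean_average_precision(toy_runs)
```

In this toy data the first domain is ranked perfectly, while the second has one non-entity placed above a true entity, which lowers its average precision and therefore the mean.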
| Original language | English |
|---|---|
| Pages (from-to) | 5029-5040 |
| Number of pages | 12 |
| Journal | Information |
| Volume | 15 |
| Issue number | 11 B |
| State | Published - Nov 2012 |
| Externally published | Yes |
Keywords
- Mutual importance measurement (MIM)
- Named entity
- Named entity recognition
- Set expansion
Fingerprint
Dive into the research topics of 'Automatic named-entity set expansion from the Web using a mutual importance measure'. Together they form a unique fingerprint.