Abstract
This paper proposes syllable distribution patterns as deep learning inputs for morphological analysis. Each pattern comprises two parts: a distributed syllable embedding vector and a syllable-level morpheme distribution pattern. For Korean part-of-speech tagging, we use a bidirectional long short-term memory network with a conditional random field layer (Bi-LSTM-CRF). After the Bi-LSTM-CRF generates syllable-level outputs, a morpheme restoration step recovers the morphemes using pre-analyzed dictionaries created automatically from the training corpus. In experiments, the proposed method achieves an F1-score of 98.65%.
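The restoration step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the B-/I- tag prefixes, the `+`-joined tags for contracted syllables, the function names `build_pre_analyzed_dict` and `restore_morphemes`, and the toy data are all assumptions made for illustration. The sketch assumes the tagger has already produced one tag per syllable, and that a dictionary built from training data maps a contracted syllable and its tag to its most frequent morpheme analysis.

```python
from collections import Counter, defaultdict


def build_pre_analyzed_dict(training_pairs):
    """Map each (syllable, pos) key to its most frequent analysis.

    training_pairs: iterable of ((syllable, pos), analysis), where analysis
    is a sequence of (morpheme, pos) pairs. This input format is hypothetical.
    """
    counts = defaultdict(Counter)
    for key, analysis in training_pairs:
        counts[key][tuple(analysis)] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def restore_morphemes(syllables, tags, pre_analyzed):
    """Turn syllable-level tags into a list of (morpheme, pos) pairs."""
    morphemes = []
    for syllable, tag in zip(syllables, tags):
        prefix, pos = tag.split("-", 1)  # assumed "B-POS" / "I-POS" scheme
        key = (syllable, pos)
        if key in pre_analyzed:
            # Contracted syllable: replace it with the stored analysis.
            morphemes.extend([form, p] for form, p in pre_analyzed[key])
        elif prefix == "I" and morphemes and morphemes[-1][1] == pos:
            # Continuation of the current morpheme.
            morphemes[-1][0] += syllable
        else:
            # Start of a new morpheme.
            morphemes.append([syllable, pos])
    return [tuple(m) for m in morphemes]


# Toy training observations for the contracted syllable "갔" (assumed data).
pairs = [
    (("갔", "VV+EP"), [("가", "VV"), ("았", "EP")]),
    (("갔", "VV+EP"), [("가", "VV"), ("았", "EP")]),
    (("갔", "VV+EP"), [("갈", "VV"), ("았", "EP")]),
]
pre_analyzed = build_pre_analyzed_dict(pairs)

result = restore_morphemes(
    ["학", "교", "에", "갔", "다"],
    ["B-NNG", "I-NNG", "B-JKB", "B-VV+EP", "B-EF"],
    pre_analyzed,
)
print(result)
```

Choosing the most frequent analysis per key is one simple way to resolve ambiguity; the paper may use a different criterion.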
| Original language | English |
|---|---|
| Pages (from-to) | 39-45 |
| Number of pages | 7 |
| Journal | Pattern Recognition Letters |
| Volume | 120 |
| State | Published - 1 Apr 2019 |
| Externally published | Yes |
Keywords
- Bi-LSTM-CRF
- Morpheme distribution
- Morphological analysis
- POS tagging
- Syllable distribution pattern
- Syllable embedding