Automatically extracting parallel sentences from Wikipedia using sequential matching of language resources

Juryong Cheon, Youngjoong Ko

Research output: Contribution to journal, Article, peer-review

2 Scopus citations

Abstract

In this paper, we propose a method for finding similar sentences based on language resources, with the goal of building an English-Korean parallel corpus from Wikipedia. Instead of a traditional machine-readable dictionary, we use a Wiki-dictionary consisting of document titles from Wikipedia, together with bilingual example sentence pairs from a Web dictionary. We calculate the similarity between sentences by sequentially matching these language resources, and we evaluate the extracted parallel sentences. In the experiments, the proposed parallel sentence extraction method achieves an F1-score of 65.4%.
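The abstract does not spell out the matching procedure, so the following is only a minimal sketch of one way sequential matching of language resources could work, not the authors' implementation. It assumes each resource is a plain mapping from an English token to candidate Korean translations, that resources are consulted in a fixed order, and that similarity is the fraction of English tokens with a translation found in the Korean sentence. The function name, the toy dictionaries, and the scoring rule are all hypothetical.

```python
# Hypothetical toy resources; the real Wiki-dictionary and example-sentence
# resources described in the abstract are not reproduced here.
wiki_dict = {"seoul": ["서울"], "korea": ["한국"]}
example_dict = {"capital": ["수도"]}


def sequential_match_score(en_tokens, ko_sentence, resources):
    """Return the fraction of English tokens matched in the Korean sentence.

    Resources are tried in order for each token; the search for a token
    stops at the first resource that yields a translation occurring in
    the Korean sentence (an assumed reading of "sequential matching").
    """
    if not en_tokens:
        return 0.0
    matched = 0
    for token in en_tokens:
        for resource in resources:
            translations = resource.get(token.lower(), ())
            if any(t in ko_sentence for t in translations):
                matched += 1
                break  # matched by this resource; skip the remaining ones
    return matched / len(en_tokens)


score = sequential_match_score(
    ["Seoul", "is", "the", "capital", "of", "Korea"],
    "서울은 한국의 수도이다",
    [wiki_dict, example_dict],
)
print(score)
```

Under these assumptions, a sentence pair would be kept as a parallel candidate when its score exceeds a chosen threshold; the threshold value is a tuning decision that the abstract does not report.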

Original language: English
Pages (from-to): 405-408
Number of pages: 4
Journal: IEICE Transactions on Information and Systems
Volume: E100D
Issue number: 2
State: Published - Feb 2017
Externally published: Yes

Keywords

  • Automatic parallel corpus construction
  • Language resources
  • Sentence similarity calculation
  • Wikipedia
