Frequency-driven late fusion-based word decomposition approach on the phrase-based statistical machine translation systems

2013
Tatlıcıoğlu, Mehmet
Machine translation is the automatic translation of text from one natural language to another by computer, based on linguistically motivated methods, statistical approaches, or a combination of the two. This study introduces a frequency-driven late fusion-based word decomposition approach to improve the translation quality of a phrase-based statistical machine translation system from Turkish to English. The late fusion-based approach is compared with standalone statistical and rule-based word decomposition approaches as the corpus size varies. The study differs from others in introducing this novel frequency-driven late fusion-based word decomposition method to raise the BLEU score: while the benchmark study in the literature reports a BLEU score of 25.22, the proposed late fusion-based system reaches a BLEU score of up to 26.22. The approach fuses the rule-based and stochastic word decomposition methods. Because Turkish is an agglutinative language, the results can be extended to other agglutinative languages as well.
Citation Formats
M. Tatlıcıoğlu, “Frequency-driven late fusion-based word decomposition approach on the phrase-based statistical machine translation systems,” M.S. - Master of Science, Middle East Technical University, 2013.