Frequency-driven late fusion-based word decomposition approach on the phrase-based statistical machine translation systems

2013
Tatlıcıoğlu, Mehmet
Machine translation is the automatic translation of text from one natural language to another by computer, based on linguistically motivated methods, statistical approaches, or a combination of the two. This study introduces a frequency-driven, late fusion-based word decomposition approach to improve the translation quality of a phrase-based statistical machine translation system from Turkish to English. The late fusion-based approach is compared with standalone statistical and rule-based word decomposition approaches as the corpus size changes. The novelty of the study is this frequency-driven late fusion-based word decomposition method, which fuses rule-based and stochastic word decomposition to raise the BLEU score. While the benchmark study in the literature reports a BLEU score of 25.22, the proposed late fusion-based system reaches a BLEU score of 26.22. Because Turkish is an agglutinative language, the results can be extended to other agglutinative languages as well.
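The abstract does not spell out the fusion rule, so the following is only a minimal sketch of one possible reading of "frequency-driven late fusion": both decomposers analyze each word independently, and a corpus frequency threshold decides whose output is kept. The function names, the threshold `min_count`, the toy suffix list, and the direction of the rule (rule-based output for frequent words, statistical output for rare ones) are all assumptions made for illustration and are not taken from the thesis.

```python
from collections import Counter
from typing import Callable, Iterable, List

Segmenter = Callable[[str], List[str]]

# Hypothetical toy stand-ins for the two decomposers. A real system would
# use a morphological analyzer and a learned (unsupervised) segmenter.
TOY_SUFFIXES = ("lar", "ler", "da", "de")


def toy_rule_based(word: str) -> List[str]:
    """Split off one known suffix if the word ends with it."""
    for suffix in TOY_SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            return [word[: -len(suffix)], "+" + suffix]
    return [word]


def toy_statistical(word: str) -> List[str]:
    """Split long words in the middle; a placeholder for a learned model."""
    if len(word) < 6:
        return [word]
    mid = len(word) // 2
    return [word[:mid], "+" + word[mid:]]


def make_fused_segmenter(
    corpus_tokens: Iterable[str],
    rule_based: Segmenter,
    statistical: Segmenter,
    min_count: int = 2,  # assumed threshold, not a value from the thesis
) -> Segmenter:
    """Build a segmenter that picks one decomposer's output per word."""
    counts = Counter(corpus_tokens)

    def segment(word: str) -> List[str]:
        # Late fusion: both decomposers run independently, and the choice
        # is made afterwards on their outputs.
        rule_output = rule_based(word)
        stat_output = statistical(word)
        # Assumed rule: trust the rule-based analysis for frequent words.
        return rule_output if counts[word] >= min_count else stat_output

    return segment


corpus = ["evler", "evler", "kitaplarda"]
fused = make_fused_segmenter(corpus, toy_rule_based, toy_statistical)
print(fused("evler"))       # frequent word: rule-based output is kept
print(fused("kitaplarda"))  # rare word: statistical output is kept
```

In a phrase-based pipeline, a segmenter of this kind would be applied to the Turkish side of the parallel corpus before alignment and phrase extraction, so that the decomposed morphemes, rather than full word forms, are what the translation model sees.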

Suggestions

Head finalization and morphological analysis in factored phrase-based statistical machine translation from English to Turkish
İmren, Haydar; Çakıcı, Ruket; Department of Computer Engineering (2015)
Machine Translation is a field of study which deals with translating text from one natural language to another automatically. Statistical Machine Translation generates the translations using statistical methods and bilingual text corpora. In this study, an approach for translating from English to Turkish is introduced. Turkish is an agglutinative language with a free constituent order, whereas English is not agglutinative and the constituent order is strict. Besides these differences, there is a lack of par...
Morphological processing in developing readers: a psycholinguistic study on Turkish primary school children
Uğuz, Enis; Kırkıcı, Bilal; Department of English Literature (2018)
The processing of morphologically complex words has been studied in many languages, leading to a variety of theoretical accounts. While dual-route models advocate two distinct mechanisms for word processing, single-route models suggest a single mechanism. Contrasting findings, as well as different interpretations of the same results, have kept the advocates of both accounts searching for a solid and indisputable justification for their views. This thesis investigated the early stages of morphological pro...
Morphological processing of inflected and derived words in L1 Turkish and L2 English
Şafak, Duygu Fatma; Kırkıcı, Bilal; Department of English Language Teaching (2015)
The present study aims at examining how inflected and derived words are processed during the early stages of visual word recognition in a native language (L1) and in a second language (L2). A second aim of the study is to find out whether or not the semantic and surface-form properties of morphologically complex words affect early word recognition processes. Two masked priming experiments were conducted to investigate morphological processing in L1 Turkish and in L2 English. In the first experiment, 40 L1 s...
An Experimental study on acquisition of prepositions in English as a third language
Çabuk, Sakine; Sağın Şimşek, Sultan Çiğdem; Gracanın Yüksek, Martına; Department of English Language Teaching (2016)
This study explores the role of cross-linguistic influence in the third language acquisition process by examining English adpositions. Comprehension, processing and production of English prepositions (in, on, at, behind, over, to) were examined through off-line and on-line data collection tasks to find out which of the two known languages (L1 or L2) is the major source of cross-linguistic influence on the acquisition of English (L3) adpositions, given that adpositions are morphologically and syntactical...
Language modeling for Turkish continuous speech recognition
Şahin, Serkan; Çiloğlu, Tolga; Department of Electrical and Electronics Engineering (2003)
This study aims to build a new language model for Turkish continuous speech recognition. Turkish is a very productive language in terms of word forms because of its agglutinative nature. For languages like Turkish, the vocabulary size is far from acceptable: from only one simple stem, thousands of new words can be generated using inflectional and derivational suffixes. In this work, words are parsed into their stems and endings. First of all, we consider endings as words and we obtained bigram probabi...
Citation Formats
M. Tatlıcıoğlu, “Frequency-driven late fusion-based word decomposition approach on the phrase-based statistical machine translation systems,” M.S. - Master of Science, Middle East Technical University, 2013.