Language modelling for Turkish as an agglutinative language

2004-04-30
Two types of language models have been considered for Turkish continuous speech recognition. In the first, words are separated into their stems and endings, and language models are calculated over this new set of units. In the second, words are kept whole, but language models are calculated with respect to the stems of the words. Both bigram and trigram formalisms are studied.
IEEE 12th Signal Processing and Communications Applications Conference
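The two unit choices described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the stem lexicon `STEMS`, the longest-prefix splitter `split_word`, the `+` marker on endings, and the toy corpus are all hypothetical, and reading the second model as a bigram over stem sequences is only one possible interpretation of the abstract. Plain maximum-likelihood bigram estimates are used, with no smoothing.

```python
from collections import Counter

# Hypothetical toy stem lexicon; a real system would use a morphological analyzer.
STEMS = {"ev", "okul", "git", "gel"}

def split_word(word):
    """Split a word into (stem, ending) using the longest known stem prefix."""
    for cut in range(len(word), 0, -1):
        if word[:cut] in STEMS:
            return word[:cut], word[cut:]
    return word, ""  # unknown word: treat the whole word as its own stem

def to_units(sentence):
    """First model: stems and endings become separate tokens ('+' marks an ending)."""
    units = []
    for word in sentence.split():
        stem, ending = split_word(word)
        units.append(stem)
        if ending:
            units.append("+" + ending)
    return units

def to_stems(sentence):
    """Second model (assumed reading): each whole word is represented by its stem."""
    return [split_word(word)[0] for word in sentence.split()]

def bigram_mle(sequences):
    """Maximum-likelihood bigram estimates P(b | a), with sentence boundary markers."""
    pair_counts, context_counts = Counter(), Counter()
    for seq in sequences:
        padded = ["<s>"] + seq + ["</s>"]
        for a, b in zip(padded, padded[1:]):
            pair_counts[(a, b)] += 1
            context_counts[a] += 1
    return {pair: n / context_counts[pair[0]] for pair, n in pair_counts.items()}

# Toy corpus, for illustration only.
corpus = ["evde geldi", "okulda gitti", "evden gitti"]
unit_model = bigram_mle([to_units(s) for s in corpus])
stem_model = bigram_mle([to_stems(s) for s in corpus])
```

Extending the sketch to trigrams would mean counting triples with a two-token context instead of pairs. A practical system would also need smoothing or back-off for unseen events.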

Suggestions

Language modeling for Turkish continuous speech recognition
Şahin, Serkan; Çiloğlu, Tolga; Department of Electrical and Electronics Engineering (2003)
This study aims to build a new language model for Turkish continuous speech recognition. Turkish is a very productive language in terms of word forms because of its agglutinative nature. For languages like Turkish, the vocabulary size is far from being acceptable: from only one simple stem, thousands of new words can be generated using inflectional and derivational suffixes. In this work, words are parsed into their stems and endings. First of all, we consider endings as words and we obtained bigram probabi...
Turkish large vocabulary continuous speech recognition by using limited audio corpus
Susman, Derya; Yazıcı, Adnan; Köprü, Selçuk; Department of Computer Engineering (2012)
Speech recognition in Turkish is a challenging problem from several perspectives. Most of the challenges are related to the morphological structure of the language. Since Turkish is an agglutinative language, it is possible to generate many words from a single stem by using suffixes. This characteristic of the language increases the number of out-of-vocabulary (OOV) words, which degrades the performance of a speech recognizer dramatically. Also, Turkish language allows words to be ordered in a free manner, whic...
On lexicon creation for Turkish LVCSR
Hacıoğlu, Kadri; Pellom, Bryan; Çiloğlu, Tolga; Öztürk, Özlem; Kurimo, Mikko; Creutz, Mathias (2003-09-14)
In this paper, we address the lexicon design problem in Turkish large vocabulary speech recognition. Although we focus only on Turkish, the methods described here are general enough that they can be considered for other agglutinative languages like Finnish, Korean etc. In an agglutinative language, several words can be created from a single root word using a rich collection of morphological rules. So, a virtually infinite size lexicon is required to cover the language if words are used as the basic units. T...
Head finalization and morphological analysis in factored phrase-based statistical machine translation from English to Turkish
İmren, Haydar; Çakıcı, Ruket; Department of Computer Engineering (2015)
Machine Translation is a field of study which deals with translating text from one natural language to another automatically. Statistical Machine Translation generates the translations using statistical methods and bilingual text corpora. In this study, an approach for translating from English to Turkish is introduced. Turkish is an agglutinative language with a free constituent order, whereas English is not agglutinative and the constituent order is strict. Besides these differences, there is a lack of par...
Frequency-driven late fusion-based word decomposition approach on the phrase-based statistical machine translation systems
Tatlıcıoğlu, Mehmet; Yazıcı, Adnan; Department of Computer Engineering (2013)
Machine translation is the process of translating texts from a natural language to another by computers based on linguistic motivations, statistical approaches, or the combination of them. In this study, the frequency-driven late fusion-based word decomposition approach is introduced to improve the translation quality of the phrase-based statistical machine translation system from Turkish to English. This late fusion-based approach is compared with the standalone statistical and rule-based word decompositio...
Citation Formats
T. Çiloğlu and S. Şahin, “Language modelling for Turkish as an agglutinative language,” presented at the IEEE 12th Signal Processing and Communications Applications Conference, Kuşadası, Turkey, 2004, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/39819.