A study on language modeling for Turkish large vocabulary continuous speech recognition

2005
Bayer, Ali Orkan
This study focuses on large vocabulary continuous speech recognition for Turkish. Continuous speech recognition for Turkish cannot be performed accurately because of the agglutinative nature of the language, which degrades the performance of the classical language models used in the field. In this thesis, acoustic models with different parameters are first constructed and tested. Then three types of n-gram language models are built: class-based models, stem-based models, and stem-end-based models. Two-pass recognition is performed with the Hidden Markov Model Toolkit (HTK), testing the system first with the bigram models and then with the trigram models. The study finds that trigram models over stems and endings give better results, since their coverage of the vocabulary is better.
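To illustrate why modeling stems and endings can improve vocabulary coverage, here is a minimal Python sketch. It is not the thesis's implementation: the stem lexicon `STEMS`, the longest-prefix splitting rule, the `+` marker on endings, and the toy sentences are all assumptions made for this example, and a real system would rely on a proper morphological parser and a toolkit such as HTK.

```python
from collections import Counter

# Hypothetical toy stem lexicon (assumption); a real system would use a
# morphological parser instead of a fixed set of stems.
STEMS = {"ev", "okul", "kitap"}


def split_stem_ending(word, stems=STEMS):
    """Split a word into [stem, '+ending'] using the longest matching stem prefix.

    Words with no matching stem are returned unchanged as a single unit.
    """
    for cut in range(len(word), 0, -1):
        if word[:cut] in stems:
            ending = word[cut:]
            # Endings carry a leading '+' so they stay distinct from stems.
            return [word[:cut]] + (["+" + ending] if ending else [])
    return [word]


def to_units(sentences):
    """Convert whitespace-tokenized sentences into stem/ending unit sequences."""
    return [[u for w in s.split() for u in split_stem_ending(w)] for s in sentences]


def bigram_counts(unit_sentences):
    """Count bigrams over unit sequences padded with sentence boundary markers."""
    counts = Counter()
    for units in unit_sentences:
        padded = ["<s>"] + units + ["</s>"]
        counts.update(zip(padded, padded[1:]))
    return counts


def coverage(train_tokens, test_tokens):
    """Fraction of test tokens that also occur in the training vocabulary."""
    vocab = set(train_tokens)
    return sum(t in vocab for t in test_tokens) / len(test_tokens)


# Toy data (assumption), chosen so that one test word is unseen as a whole word.
train = ["evde kitaplar", "okulda ev"]
test = ["okullar evde"]

train_words = [w for s in train for w in s.split()]
test_words = [w for s in test for w in s.split()]
train_units = [u for s in to_units(train) for u in s]
test_units = [u for s in to_units(test) for u in s]

word_cov = coverage(train_words, test_words)
unit_cov = coverage(train_units, test_units)
unit_bigrams = bigram_counts(to_units(train))

print(f"word coverage: {word_cov:.2f}, stem/ending coverage: {unit_cov:.2f}")
```

On this toy data the unseen word form "okullar" is out of vocabulary as a whole word, while both of its units ("okul" and "+lar") were seen in training, so the unit-level coverage is higher. This says nothing about the recognition accuracy reported in the thesis; it only shows the coverage effect the abstract refers to.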

Suggestions

Large vocabulary continuous speech recognition for Turkish Using HTK
Çömez, Murat Ali; Çiloğlu, Tolga; Department of Electrical and Electronics Engineering (2003)
This study aims to build a new language model that can be used in a Turkish large vocabulary continuous speech recognition system. Turkish is a very productive language in terms of word forms because of its agglutinative nature. For languages such as Turkish, the vocabulary size is far from being acceptable. From only one simple stem, thousands of new word forms can be generated using inflectional or derivational suffixes. In this thesis, words are parsed into their stems and endings. One ending includes ...
An Investigation of directive speech acts in L2 learners’ e-mails
Toraman, Mediha; Zeyrek Bozşahin, Deniz; Department of English Language Teaching (2014)
This study aims to find out and analyze the directive speech acts used by Turkish speakers of English while making a request or suggestion in their e-mails to their instructors. Data were collected by asking students to write e-mails to their Turkish instructors regarding certain exercises they had done. A group of these students were also contacted again regarding the e-mails they had sent to find out the reasons for their use of certain patterns of speech acts. A questionnaire which includes samples from ...
A comparison of subspace based face recognition methods
Gönder, Özkan; Halıcı, Uğur; Department of Electrical and Electronics Engineering (2004)
Different approaches to face recognition are studied in this thesis. These approaches are PCA (Eigenface), Kernel Eigenface and Fisher LDA. Principal component analysis extracts the most important information contained in the face to construct a computational model that best describes the face. In the Eigenface approach, variation between the face images is described by using a set of characteristic face images in order to find out the eigenvectors (Eigenfaces) of the covariance matrix of the distribution,...
Language modeling for Turkish continuous speech recognition
Şahin, Serkan; Çiloğlu, Tolga; Department of Electrical and Electronics Engineering (2003)
This study aims to build a new language model for Turkish continuous speech recognition. Turkish is a very productive language in terms of word forms because of its agglutinative nature. For languages such as Turkish, the vocabulary size is far from being acceptable: from only one simple stem, thousands of new words can be generated using inflectional and derivational suffixes. In this work, words are parsed into their stems and endings. First of all, we consider endings as words and we obtained bigram probabi...
Answer localization system using discourse evaluation
Sualp, Merter; Yöndem (Turhan), Meltem; Department of Computer Engineering (2004)
The words in a language not only help us to construct sentences but also contain other features, which we usually underestimate. Each word relates itself to the remaining ones in some way. In our daily lives, we use these relations extensively in many areas, question direction being one of them. In this work, it is investigated whether the relations between the words can be useful for question direction and an approach for question direction is presented. Besides, a tool is devised in the w...
Citation Formats
A. O. Bayer, “A study on language modeling for Turkish large vocabulary continuous speech recognition,” M.S. - Master of Science, Middle East Technical University, 2005.