A study on language modeling for Turkish large vocabulary continuous speech recognition

Date

2005

Author

Bayer, Ali Orkan

This study focuses on large vocabulary continuous speech recognition for Turkish. Continuous speech recognition for Turkish is difficult to perform accurately because the language is agglutinative: the large number of word forms this produces degrades the performance of the classical language models used in the field. In this thesis, acoustic models with different parameters are first constructed and tested. Then three types of n-gram language models are built: class-based, stem-based, and stem-ending-based models. Two-pass recognition is performed with the Hidden Markov Model Toolkit (HTK), testing the system first with bigram models and then with trigram models. The study finds that trigram models over stems and endings give better results, since they cover the vocabulary better.
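The coverage argument in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation, which relies on HTK. The stem lexicon `STEMS`, the longest-prefix splitting rule, and the toy training sentences are all assumptions made for illustration; a real system would use a morphological parser and smoothed n-gram estimates.

```python
from collections import Counter

# Hypothetical toy stem lexicon (assumption); stands in for a morphological parser.
STEMS = {"ev", "el"}

def split_stem_ending(word, stems=STEMS):
    """Split a word into [stem, +ending] using the longest known stem prefix."""
    for i in range(len(word), 0, -1):
        if word[:i] in stems:
            ending = word[i:]
            return [word[:i]] + (["+" + ending] if ending else [])
    return [word]  # no known stem: leave the word unsplit

def to_units(sentence):
    """Map a whitespace-tokenized sentence to a sequence of stems and endings."""
    units = []
    for word in sentence.split():
        units.extend(split_stem_ending(word))
    return units

def bigram_counts(sentences):
    """Count context units and unit bigrams, with sentence boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        units = ["<s>"] + to_units(sentence) + ["</s>"]
        unigrams.update(units[:-1])
        bigrams.update(zip(units[:-1], units[1:]))
    return unigrams, bigrams

def bigram_prob(prev, cur, unigrams, bigrams):
    """Unsmoothed maximum-likelihood estimate of P(cur | prev)."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, cur)] / unigrams[prev]

# Toy training data (assumption): "evler" (houses), "eller" (hands), "evde" (at home).
train = ["evler eller", "evde"]
word_vocab = {w for s in train for w in s.split()}
unigrams, bigrams = bigram_counts(train)

# "elde" (in hand) never occurs as a whole word in the training data,
# but both of its units do, so the stem-ending vocabulary still covers it.
unseen = "elde"
covered = all(u in unigrams for u in to_units(unseen))
```

In this sketch, `elde` is out of vocabulary for a word-level model, while a stem-ending model can still represent it as `el` followed by `+de`. This is the sense in which sub-word units can cover the vocabulary better.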

Subject Keywords

Electronic computers.

URI

http://etd.lib.metu.edu.tr/upload/2/12606612/index.pdf
https://hdl.handle.net/11511/15551

Collections

Graduate School of Natural and Applied Sciences, Thesis

Suggestions

Large vocabulary continuous speech recognition for Turkish using HTK. Çömez, Murat Ali; Çiloğlu, Tolga; Department of Electrical and Electronics Engineering (2003). This study aims to build a new language model that can be used in a Turkish large vocabulary continuous speech recognition system. Turkish is a very productive language in terms of word forms because of its agglutinative nature. For languages such as Turkish, the vocabulary size is far from being acceptable. From only one simple stem, thousands of new word forms can be generated using inflectional or derivational suffixes. In this thesis, words are parsed into their stems and endings. One ending includes ...
A comparison of subspace based face recognition methods. Gönder, Özkan; Halıcı, Uğur; Department of Electrical and Electronics Engineering (2004). Different approaches to face recognition are studied in this thesis. These approaches are PCA (Eigenface), Kernel Eigenface and Fisher LDA. Principal component analysis extracts the most important information contained in the face to construct a computational model that best describes the face. In the Eigenface approach, variation between the face images is described by using a set of characteristic face images in order to find the eigenvectors (Eigenfaces) of the covariance matrix of the distribution,...
An Investigation of directive speech acts in L2 learners’ e-mails. Toraman, Mediha; Zeyrek Bozşahin, Deniz; Department of English Language Teaching (2014). This study aims to find out and analyze the directive speech acts used by Turkish speakers of English while making a request or suggestion in their e-mails to their instructors. Data were collected by asking students to write e-mails to their Turkish instructors regarding certain exercises they had done. A group of these students was also contacted again regarding the e-mails they had sent to find out the reasons for their use of certain patterns of speech acts. A questionnaire which includes samples from ...
The Corpus of Turkish Youth Language (COTY): The compilation and interactional dynamics of a spoken corpus Efeoğlu Özcan, Esranur; Işık Güler, Hale; English Language Teaching (2022-9-2) This study examines the previously unattained research area of contemporary spoken Turkish used in dyadic and multi-party interaction among young speakers of Turkish. For this purpose, a specialized corpus called the Corpus of Turkish Youth Language (CoTY) was compiled as a source of data and as a tool of analysis. Designed to offer a maximally representative sample of Turkish youth talk, the CoTY contains naturally occurring and spontaneous interactional data among young people between the ages of 14-18 fr...
Language modeling for Turkish continuous speech recognition. Şahin, Serkan; Çiloğlu, Tolga; Department of Electrical and Electronics Engineering (2003). This study aims to build a new language model for Turkish continuous speech recognition. Turkish is a very productive language in terms of word forms because of its agglutinative nature. For languages such as Turkish, the vocabulary size is far from being acceptable: from only one simple stem, thousands of new words can be generated using inflectional and derivational suffixes. In this work, words are parsed into their stems and endings. First of all, we consider endings as words and we obtained bigram probabi...

Citation Formats

A. O. Bayer, “A study on language modeling for Turkish large vocabulary continuous speech recognition,” M.S. - Master of Science, Middle East Technical University, 2005.