Large vocabulary continuous speech recognition for Turkish Using HTK

Date

2003

Author

Çömez, Murat Ali

This study aims to build a new language model for use in a Turkish large vocabulary continuous speech recognition system. Turkish is highly productive in word forms because of its agglutinative nature: from a single stem, thousands of word forms can be generated using inflectional or derivational suffixes. For agglutinative languages such as Turkish, the vocabulary needed for word-based recognition therefore grows far beyond an acceptable size. In this thesis, words are parsed into stems and endings, where an ending comprises all the suffixes attached to the associated stem. A bigram-based search network is then constructed. Bigrams are obtained either from stems and endings or from stems only; the proposed language model is based on bigrams obtained from stems only. All work is done in the HTK (Hidden Markov Model Toolkit) environment, except parsing and network transformation. Besides offering a new language model for Turkish, this study presents a comprehensive examination of the concepts used in state-of-the-art speech recognition systems. To gain a good command of these concepts and processes, isolated word, connected word, and continuous speech recognition tasks are performed, and the experimental results for these tasks are also given.
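As a rough illustration of the stem-only bigram idea described in the abstract, the following Python sketch estimates unsmoothed maximum-likelihood bigram probabilities over stems. It is not the thesis implementation (which uses HTK for everything except parsing and network transformation); the toy corpus, its stem/ending segmentations, the function name `stem_bigram_probs`, and the `<s>`/`</s>` boundary markers are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical pre-parsed corpus: each word is a (stem, ending) pair.
# These segmentations are illustrative only, not output of the thesis parser.
corpus = [
    [("ev", "ler"), ("güzel", "")],
    [("ev", "de"), ("kitap", "lar")],
    [("ev", "ler"), ("kitap", "")],
]

def stem_bigram_probs(sentences):
    """Return unsmoothed MLE bigram probabilities P(next_stem | stem)."""
    history_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        # Discard endings and keep only stems, with sentence boundary markers.
        stems = ["<s>"] + [stem for stem, _ending in sentence] + ["</s>"]
        history_counts.update(stems[:-1])          # every token that has a successor
        bigram_counts.update(zip(stems, stems[1:]))
    return {
        pair: count / history_counts[pair[0]]
        for pair, count in bigram_counts.items()
    }

probs = stem_bigram_probs(corpus)
for (first, second), p in sorted(probs.items()):
    print(f"P({second} | {first}) = {p:.3f}")
```

Because endings are dropped before counting, different inflected forms of the same stem share one set of statistics, which is what keeps the vocabulary of the model small. A practical model would also need smoothing for unseen stem pairs, which this sketch omits.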

Subject Keywords

Parsing (Computer grammar)

URI

http://etd.lib.metu.edu.tr/upload/1205491/index.pdf
https://hdl.handle.net/11511/13617

Collections

Graduate School of Natural and Applied Sciences, Thesis

Suggestions

Language modeling for Turkish continuous speech recognition Şahin, Serkan; Çiloğlu, Tolga; Department of Electrical and Electronics Engineering (2003) This study aims to build a new language model for Turkish continuous speech recognition. Turkish is a very productive language in terms of word forms because of its agglutinative nature. For languages like Turkish, the vocabulary size is far from being acceptable: from only one simple stem, thousands of new words can be generated using inflectional and derivational suffixes. In this work, words are parsed into their stems and endings. First of all, we consider endings as words and we obtained bigram probabi...
A study on language modeling for Turkish large vocabulary continuous speech recognition Bayer, Ali Orkan; Turhan Yöndem, Meltem; Department of Computer Engineering (2005) This study focuses on large vocabulary Turkish continuous speech recognition. Continuous speech recognition for Turkish cannot be performed accurately because of the agglutinative nature of the language. The agglutinative nature decreases the performance of the classical language models that are used in the area. In this thesis, acoustic models using different parameters are first constructed and tested. Then, three types of n-gram language models are built. These involve class-based models, stem-based mo...
Morphological processing of inflected and derived words in L1 Turkish and L2 English Şafak, Duygu Fatma; Kırkıcı, Bilal; Department of English Language Teaching (2015) The present study aims at examining how inflected and derived words are processed during the early stages of visual word recognition in a native language (L1) and in a second language (L2). A second aim of the study is to find out whether or not the semantic and surface-form properties of morphologically complex words affect early word recognition processes. Two masked priming experiments were conducted to investigate morphological processing in L1 Turkish and in L2 English. In the first experiment, 40 L1 s...
Processing of conditional constructions in Turkish l2 speakers of English Evcen, Ebru; Özge, Duygu; Department of English Language Teaching (2019) This thesis aims to examine whether Turkish L2 learners of English process conditional constructions in an incremental and/or predictive manner. An offline grammaticality judgment (GJT) task was devised to test L2 learners’ sensitivity to grammatical violations and an online self-paced reading (SPR) task was designed to find out whether processing patterns of L2 learners would match existing L2 processing accounts. We manipulated the Connector Type (unless, unless…not, if…not) and Context Type (congruent, i...
The Second language processing of nominal compounds: a masked priming study Çelikkol Berk, Nurten; Kırkıcı, Bilal; Department of English Language Teaching (2018) The primary purpose of the present study was to understand the workings of the cognitive mechanisms underlying L2 morphological processing, and more particularly, to explore how noun-noun compounds in L2 English are processed by native speakers of Turkish in the earliest stages of word recognition. Furthermore, the study investigated the role of constituent morphemes in the processing of compound words and examined whether or not a compound word primes its first and second constituents equally. The final pu...

Citation Formats

M. A. Çömez, “Large vocabulary continuous speech recognition for Turkish Using HTK,” M.S. - Master of Science, Middle East Technical University, 2003.