On Developing New Text and Audio Corpora and Speech Recognition Tools for the Turkish Language

Salor Durna, Özgül
Pellom, Brian
Çiloğlu, Tolga
Hacıoğlu, Kadri
Demirekler, Mübeccel
This paper describes recent work towards development of new corpora and tools for Turkish speech research. This effort represents an ongoing collaboration between the Center for Spoken Language Research (CSLR) at the University of Colorado and the Department of Electrical Engineering at the Middle East Technical University (METU). A new text corpus developed from Turkish newspapers’ text is described. In addition, a 193-speaker audio corpus and a pronunciation lexicon for the Turkish language are developed. We then describe our initial work towards porting Sonic, the CSLR speech recognition system, to the Turkish language. Results are shown for phonetic alignment and phoneme recognition accuracy using the newly constructed corpus and speech tools. It is shown that 91.2% of the automatically labeled phoneme boundaries are placed within 20 msec of hand-labeled locations for the Turkish audio corpus. Finally, a phoneme recognition error rate of 29.3% is demonstrated.
7th International Conference on Spoken Language Processing, ICSLP (2002)
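
The abstract reports alignment quality as the share of automatically labeled phoneme boundaries that fall within 20 msec of the hand-labeled ones. As a minimal sketch of how such an agreement figure can be computed, the snippet below assumes the two boundary lists are already paired one-to-one and that the tolerance is inclusive. The function name `boundary_agreement` and the toy values are hypothetical and are not taken from the paper or from Sonic.

```python
def boundary_agreement(auto_ms, hand_ms, tolerance_ms=20.0):
    """Fraction of automatic boundaries within tolerance_ms of the hand labels.

    Assumption: both lists hold boundary times in milliseconds and are
    already paired one-to-one (same phoneme sequence, same order).
    """
    if len(auto_ms) != len(hand_ms):
        raise ValueError("boundary lists must have the same length")
    if not auto_ms:
        raise ValueError("no boundaries to compare")
    # Count boundaries whose absolute deviation is within the tolerance (inclusive).
    hits = sum(1 for a, h in zip(auto_ms, hand_ms) if abs(a - h) <= tolerance_ms)
    return hits / len(auto_ms)


# Toy values invented for illustration only (not data from the paper).
hand = [120.0, 245.0, 390.0, 510.0]
auto = [125.0, 270.0, 385.0, 512.0]
agreement = boundary_agreement(auto, hand)
print(f"{agreement:.1%} of boundaries within 20 ms")
```

With these toy values, three of the four boundaries deviate by 20 ms or less, so the reported share is 75%. The same function applied to a full corpus with real alignments would yield a figure comparable in kind to the 91.2% quoted above.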


On lexicon creation for Turkish LVCSR
Hacıoğlu, Kadri; Pellom, Bryan; Çiloğlu, Tolga; Öztürk, Özlem; Kurimo, Mikko; Creutz, Mathias (2003-09-14)
In this paper, we address the lexicon design problem in Turkish large vocabulary speech recognition. Although we focus only on Turkish, the methods described here are general enough that they can be considered for other agglutinative languages like Finnish, Korean etc. In an agglutinative language, several words can be created from a single root word using a rich collection of morphological rules. So, a virtually infinite size lexicon is required to cover the language if words are used as the basic units. T...
Three essays on education in Turkey
Bircan, Fatma; Tansel, Aysıt; Department of Economics (2005)
This thesis analyzes the pecuniary aspects of education in Turkey. It consists of three essays. The first essay deals with the demand for education, focusing on private tutoring expenditures of households. The study investigates the determinants of private tutoring expenditures of households using a Tobit model as the estimation method. It is found that wealthier households with higher levels of parental education are more likely to participate in private tutoring. The second essay concerns the wage inequal...
The Discourse structure of Turkish
Demirşahin, Işın; Zeyrek Bozşahin, Deniz; Department of Cognitive Sciences (2015)
This thesis investigates the structure of immediate discourse in Turkish. The first and foremost question is how discourse is built. Are there components of discourse that constitute a predicate-argument structure, or is discourse realized by underlying non-structural ties that are merely made explicit by these components? If there is structure in discourse, what is the nature of this structure, and what is its complexity? For this purpose, we analyze the relations annotated in the Turkish Discourse Bank,...
Analysis of Turkey's national innovation system
Çetinkaya, Umut Yılmaz; Çakmur, Barış; Department of Political Science and Public Administration (2004)
This thesis analyses the National Innovation System of Turkey. In order to achieve this purpose, on the one hand, ‘catching-up’, ‘forging ahead’, and ‘falling behind’ processes of the countries and their relationships with economic growth, long wave theories, and valid techno-economic paradigm have been studied; while on the other hand, the historical evolution of the science, technology, and innovation systems, are investigated together with foresight studies, which are considered as their guide. In conclu...
An Analysis of geographic and sectoral diversification using city and sector indices of Turkey
Ayan, Hamdi; Soytaş, Uğur; Oran, Adil; Department of Business Administration (2015)
This thesis examines the potential for geographic and sectoral diversification in Turkey. A causality-in-variance test suggested by Hafner and Herwartz is conducted in a pairwise fashion among city indices and among sector indices, separately, to explore the existence of diversification potential. All data are 5-day week daily time series and sourced from Borsa Istanbul. City index data covers the period between January 2, 2009 and November 24, 2014, whereas sectoral index data covers the period April 1, 2004 a...
Citation Formats
Ö. Salor Durna, B. Pellom, T. Çiloğlu, K. Hacıoğlu, and M. Demirekler, “On Developing New Text and Audio Corpora and Speech Recognition Tools for the Turkish Language,” Denver, CO USA, 2002, vol. 1, p. 349, Accessed: 00, 2021. [Online]. Available: https://hdl.handle.net/11511/83762.