Bimodal automatic speech segmentation and boundary refinement techniques

2010
Akdemir, Eren
Automatic segmentation of speech is essential for building the large speech databases used in speech processing applications. This study proposes a bimodal automatic speech segmentation system that combines auditory information with either articulator motion information (AMI) or visual information obtained by a camera. The presence of a visual modality has been shown to be very beneficial in speech recognition applications, improving both the performance and the noise robustness of those systems. In this dissertation, a significant increase in the performance of the automatic speech segmentation system is achieved by using a bimodal approach. Automatic speech segmentation systems face a tradeoff between precision and the resulting number of gross errors. Boundary refinement techniques are used to increase the precision of these systems without decreasing overall system performance. Two novel boundary refinement techniques are proposed in this thesis: a hidden Markov model (HMM) based fine tuning system and an inverse filtering based fine tuning system. The segment boundaries obtained by the bimodal speech segmentation system are improved further by using these techniques. To fulfill these goals, a complete two-stage automatic speech segmentation system is produced and tested on two different databases. A phonetically rich Turkish audiovisual speech database, containing acoustic data and camera recordings of 1600 Turkish sentences uttered by a male speaker, is built from scratch for use in the experiments. The visual features of the recordings are extracted, and the database is manually phonetically aligned to serve as ground truth for the performance tests of the automatic speech segmentation systems.
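
The tradeoff between precision and gross errors mentioned in the abstract can be made concrete with a small scoring sketch. This is an illustration only, not the evaluation code used in the thesis: the function names, the 20 ms tolerance, and the 50 ms gross-error threshold are assumptions chosen for the example, and the boundary times are made-up values. It compares automatically placed boundaries against a manual ground-truth alignment, both given in milliseconds and already paired one-to-one.

```python
from typing import Sequence


def boundary_errors_ms(reference: Sequence[float],
                       automatic: Sequence[float]) -> list[float]:
    """Absolute deviation (ms) between paired reference and automatic boundaries."""
    if len(reference) != len(automatic):
        raise ValueError("boundary lists must pair up one-to-one")
    return [abs(a - r) for r, a in zip(reference, automatic)]


def summarize(errors_ms: Sequence[float],
              tolerance_ms: float = 20.0,   # assumed tolerance, not from the thesis
              gross_ms: float = 50.0) -> dict[str, float]:  # assumed gross-error threshold
    """Mean error, share of boundaries within tolerance, and share of gross errors."""
    n = len(errors_ms)
    if n == 0:
        raise ValueError("no boundaries to score")
    within = sum(1 for e in errors_ms if e <= tolerance_ms)
    gross = sum(1 for e in errors_ms if e > gross_ms)
    return {
        "mean_error_ms": sum(errors_ms) / n,
        "within_tolerance": within / n,
        "gross_error_rate": gross / n,
    }


# Hypothetical boundary times in milliseconds.
manual = [100, 250, 400, 620]
auto = [105, 240, 470, 618]
print(summarize(boundary_errors_ms(manual, auto)))
```

Under these assumptions, a refinement stage would aim to raise the within-tolerance share without raising the gross-error rate.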

Suggestions

Cluster based user scheduling schemes to exploit multiuser diversity in wireless broadcast channels
Soydan, Yusuf; Candan, Çağatay; Department of Electrical and Electronics Engineering (2008)
Diversity methods are used to improve the reliability of the communication between transmitter and receiver. These methods use redundancy to reduce the errors in the communication link. Apart from the conventional diversity methods, multiuser diversity has an aim of maximizing the sum capacity of a multi-user system. To benefit from multiuser diversity, the opportunistic scheduling method grants the channel access to the user which has the best channel quality among all users. Therefore, the cumulative sum ...
Dynamic system modeling and state estimation for speech signal
Özbek, İbrahim Yücel; Demirekler, Mübeccel; Department of Electrical and Electronics Engineering (2010)
This thesis presents an all-inclusive framework on how the current formant tracking and audio (and/or visual)-to-articulatory inversion algorithms can be improved. The possible improvements are summarized as follows: The first part of the thesis investigates the problem of formant frequency estimation when the number of formants to be estimated is fixed or variable, respectively. The fixed number of formant tracking method is based on the assumption that the number of formant frequencies is fixed along the ...
HMM topology for boundary refinement in automatic speech segmentation
Akdemir, E.; Çiloğlu, Tolga (Institution of Engineering and Technology (IET), 2010-07-22)
A boundary refinement method using a new hidden Markov model (HMM) topology is proposed for automatic phonetic speech segmentation. The proposed method has the ability to work at high frame rates and the training and boundary refinement stages are easy and fast. The method is data driven and can be adapted to any speech segmentation problem provided that a training set is available. Given an initial segmentation obtained by forced alignment using an HMM based phone recogniser, 20% decrease in boundary error...
Modeling phoneme durations and fundamental frequency contours in Turkish speech
Öztürk, Özlem; Çiloğlu, Tolga; Department of Electrical and Electronics Engineering (2005)
The term prosody refers to characteristics of speech such as intonation, timing, loudness, and other acoustical properties imposed by physical, intentional and emotional state of the speaker. Phone durations and fundamental frequency contours are considered as two of the most prominent aspects of prosody. Modeling phone durations and fundamental frequency contours in Turkish speech are studied in this thesis. Various methods exist for building prosody models. State-of-the-art is dominated by corpus-based me...
HMIC miniaturization techniques and application on an FMCW range sensor transceiver
Korkmaz, Hakan; Demir, Şimşek; Department of Electrical and Electronics Engineering (2010)
This thesis includes the study of hybrid microwave integrated circuits (HMIC), miniaturization techniques applied on HMICs and its application on a frequency modulated continuous wave (FMCW) range sensor transceiver. In the scope of study, hybrid and monolithic microwave integrated circuits (HMIC and MMIC) are introduced, advantages and disadvantages of these two types are discussed. Large size of HMICs is the main disadvantage especially for military and civil applications requiring miniature volumes. This...
Citation Formats
E. Akdemir, “Bimodal automatic speech segmentation and boundary refinement techniques,” Ph.D. - Doctoral Program, Middle East Technical University, 2010.