Dynamic system modeling and state estimation for speech signal

2010
Özbek, İbrahim Yücel
This thesis presents a comprehensive framework for improving current formant tracking and audio (and/or visual)-to-articulatory inversion algorithms. The possible improvements are summarized as follows. The first part of the thesis investigates the problem of formant frequency estimation when the number of formants to be estimated is either fixed or variable. The fixed-number formant tracking method is based on the assumption that the number of formant frequencies is fixed along the speech utterance. The proposed algorithm is based on the combination of a dynamic programming algorithm and Kalman filtering/smoothing. In this method, the speech signal is divided into voiced and unvoiced segments, and the formant candidates are associated via a dynamic programming algorithm for each voiced and unvoiced part separately. Individual adaptive Kalman filtering/smoothing is used to perform the formant frequency estimation. The performance of the proposed algorithm is compared with that of algorithms from the literature. The variable-number formant tracking method considers only those formant frequencies that are visible in the spectrogram. Therefore, the number of formant frequencies is not fixed and can change along the speech waveform. In that case, it is also necessary to estimate the number of formants to track. For this purpose, the proposed algorithm uses additional logic (a formant track start/end decision unit). The measurement update of each individual formant trajectory is handled via a Kalman filter. The performance of the proposed algorithm is illustrated by examples. The second part of this thesis is concerned with improving audiovisual-to-articulatory inversion performance. The related studies can be examined in two parts: Gaussian mixture model (GMM) regression based inversion and Jump Markov Linear System (JMLS) based inversion.
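As a rough illustration of the Kalman filtering/smoothing step used for formant frequency estimation, the sketch below runs a scalar Kalman filter followed by a Rauch-Tung-Striebel smoother over one formant track. It is not the thesis's method: the random-walk state model, the fixed (non-adaptive) noise variances `q` and `r`, the diffuse prior `p0`, and the function name `kalman_rts_smooth` are all assumptions made for the example, and the dynamic programming association of formant candidates is taken as already done.

```python
import numpy as np

def kalman_rts_smooth(z, q=50.0**2, r=200.0**2, p0=1e6):
    """Filter and smooth one formant track (Hz) under a random-walk model.

    z  : formant candidates already associated to this track, one per frame
    q  : assumed process noise variance (Hz^2), fixed rather than adaptive
    r  : assumed measurement noise variance (Hz^2)
    p0 : assumed diffuse prior variance for the first frame
    """
    z = np.asarray(z, dtype=float)
    n = len(z)
    x_pred = np.empty(n)
    p_pred = np.empty(n)
    x_filt = np.empty(n)
    p_filt = np.empty(n)

    # Forward pass: Kalman filter.
    for k in range(n):
        if k == 0:
            x_pred[k], p_pred[k] = z[0], p0
        else:
            # Random walk: the state carries over, the uncertainty grows by q.
            x_pred[k], p_pred[k] = x_filt[k - 1], p_filt[k - 1] + q
        gain = p_pred[k] / (p_pred[k] + r)
        x_filt[k] = x_pred[k] + gain * (z[k] - x_pred[k])
        p_filt[k] = (1.0 - gain) * p_pred[k]

    # Backward pass: Rauch-Tung-Striebel smoother.
    x_smooth = x_filt.copy()
    p_smooth = p_filt.copy()
    for k in range(n - 2, -1, -1):
        c = p_filt[k] / p_pred[k + 1]
        x_smooth[k] = x_filt[k] + c * (x_smooth[k + 1] - x_pred[k + 1])
        p_smooth[k] = p_filt[k] + c**2 * (p_smooth[k + 1] - p_pred[k + 1])
    return x_filt, x_smooth

# Hypothetical noisy first-formant candidates, in Hz.
f1_candidates = [510.0, 470.0, 530.0, 495.0, 560.0, 540.0, 600.0]
filtered, smoothed = kalman_rts_smooth(f1_candidates)
```

The smoother uses future frames as well as past ones, which is why it suits offline estimation over a whole voiced or unvoiced segment.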
The GMM regression based inversion method models the audio (and/or visual) and articulatory data as a joint Gaussian mixture model. The conditional expectation of this distribution gives the desired articulatory estimate. In this method, we examine the usefulness of combining various acoustic features and the effectiveness of various types of fusion techniques in combination with audiovisual features. We also propose dynamic smoothing methods to smooth the articulatory trajectories. The performance of the proposed algorithm is illustrated and compared with conventional algorithms. JMLS based inversion ties the acoustic (and/or visual) spaces and the articulatory space together via multiple state space representations. In this way, the articulatory inversion problem is converted into a state estimation problem in which the audiovisual data are the measurements and the articulatory positions are the state variables. The proposed inversion method first learns the parameter set of the state space model via an expectation maximization (EM) based algorithm and then performs state estimation via an interacting multiple model (IMM) filter/smoother.
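The conditional expectation used in GMM regression based inversion can be sketched as follows. Given a joint GMM over a stacked vector [x; y], where x holds the acoustic (and/or visual) features and y the articulatory positions, the estimate E[y | x] is a responsibility-weighted sum of per-component linear regressions. The function name `gmm_conditional_mean`, the argument layout, and the toy parameters are assumptions for illustration only. The mixture parameters are taken as already trained, and the thesis's feature fusion and dynamic smoothing steps are not shown.

```python
import numpy as np

def gmm_conditional_mean(x, weights, means, covs, dx):
    """Return E[y | x] under a joint GMM over the stacked vector [x; y].

    x       : observed feature vector of length dx
    weights : mixture weights, one per component
    means   : joint mean vectors, each of length dx + dy
    covs    : joint covariance matrices, each (dx + dy) x (dx + dy)
    dx      : number of leading dimensions that belong to x
    """
    x = np.asarray(x, dtype=float)
    log_resp = []
    cond_means = []
    for w, mu, cov in zip(weights, means, covs):
        mu = np.asarray(mu, dtype=float)
        cov = np.asarray(cov, dtype=float)
        mu_x, mu_y = mu[:dx], mu[dx:]
        s_xx, s_yx = cov[:dx, :dx], cov[dx:, :dx]
        diff = x - mu_x
        sol = np.linalg.solve(s_xx, diff)  # s_xx^{-1} (x - mu_x)
        _, logdet = np.linalg.slogdet(s_xx)
        # Log of w * N(x; mu_x, s_xx), kept in the log domain for stability.
        log_resp.append(
            np.log(w) - 0.5 * (diff @ sol + logdet + dx * np.log(2.0 * np.pi))
        )
        # Per-component regression: mu_y + s_yx s_xx^{-1} (x - mu_x).
        cond_means.append(mu_y + s_yx @ sol)
    log_resp = np.array(log_resp)
    resp = np.exp(log_resp - log_resp.max())
    resp /= resp.sum()
    return resp @ np.array(cond_means)

# Toy two-component model with one acoustic and one articulatory dimension.
weights = [0.5, 0.5]
means = [[-1.0, -1.0], [1.0, 1.0]]
covs = [np.eye(2), np.eye(2)]
estimate = gmm_conditional_mean([0.3], weights, means, covs, dx=1)
```

Because each frame is mapped independently, the resulting articulatory trajectory can be jagged, which is one motivation for applying a smoothing stage afterwards.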

Suggestions

Asynchronous design of systolic array architectures in cmos
İsmailoğlu, Ayşe Neslin; Aşkar, Murat; Department of Electrical and Electronics Engineering (2008)
In this study, delay-insensitive asynchronous circuit design style has been adopted to systolic array architectures to exploit the benefits of both techniques for improved throughput. A delay-insensitivity verification analysis method employing symbolic delays is proposed for bit-level pipelined asynchronous circuits. The proposed verification method allows data-dependent early output evaluation to co-exist with robust delay-insensitive circuit behavior in pipelined architectures such as systolic arrays. Reg...
Parameter extraction and image enhancement for catadioptric omnidirectional cameras
Baştanlar, Yalın; Çetin, Yasemin; Department of Information Systems (2005)
In this thesis, catadioptric omnidirectional imaging systems are analyzed in detail. Omnidirectional image (ODI) formation characteristics of different camera-mirror configurations are examined and geometrical relations for panoramic and perspective image generation with common mirror types are summarized. A method is developed to determine the unknown parameters of a hyperboloidal-mirrored system using the world coordinates of a set of points and their corresponding image points on the ODI. A linear relati...
A comparative evaluation of conventional and particle filter based radar target tracking
Yıldırım, Berkin; Demirekler, Mübeccel; Department of Electrical and Electronics Engineering (2007)
In this thesis the radar target tracking problem in Bayesian estimation framework is studied. Traditionally, linear or linearized models, where the uncertainty in the system and measurement models is typically represented by Gaussian densities, are used in this area. Therefore, classical sub-optimal Bayesian methods based on linearized Kalman filters can be used. The sequential Monte Carlo methods, i.e. particle filters, make it possible to utilize the inherent non-linear state relations and non-Gaussian no...
Computer simulation and implementation of a visual 3-d eye gaze tracker for autostreoscopic displays
İnce, Kutalmış Gökalp; Alatan, Abdullah Aydın; Department of Electrical and Electronics Engineering (2009)
In this thesis, a visual 3-D eye gaze tracker is designed, implemented, and tested via computer simulations and on an experimental setup. The proposed tracker is designed to examine human perception on autostereoscopic displays when the viewer is 3 m away from such displays. Two different methods are proposed for calibrating personal parameters and gaze estimation, namely line of gaze (LoG) and line of sight (LoS) solutions. 2-D and 3-D estimation performances of the proposed system are observed both using comp...
Neural network method for direction of arrival estimation with uniform cylindrical microstrip patch array
Caylar, S.; Dural, G.; Leblebicioğlu, Mehmet Kemal (Institution of Engineering and Technology (IET), 2010-02-01)
In this study, a new neural network algorithm is proposed for real-time multiple source tracking problem with cylindrical patch antenna array based on a previously reported Modified Neural Multiple Source Tracking (MN-MUST) algorithm. The proposed algorithm, namely cylindrical microstrip patch array modified neural multiple source tracking (CMN-MUST) algorithm implements MN-MUST algorithm on a cylindrical microstrip patch array structure. CMN-MUST algorithm uses the advantage of directive pattern of microst...
Citation Formats
İ. Y. Özbek, “Dynamic system modeling and state estimation for speech signal,” Ph.D. - Doctoral Program, Middle East Technical University, 2010.