Nonlinear interactive source-filter model for voiced speech

2012
Koç, Turgay
The linear source-filter model (LSFM) has been the primary model for speech processing since 1960, when G. Fant presented the acoustic theory of speech production. It assumes that the source of voiced speech sounds, the glottal flow, is independent of the filter, the vocal tract. However, acoustic simulations based on physical speech production models show that the filter has significant effects on the source because of the nonlinear coupling between them, especially when the fundamental frequency (F0) of the source harmonics approaches the first formant frequency (F1) of the vocal tract filter. In this thesis, nonlinear interactive source-filter models (ISFMs) are proposed for voiced speech as an alternative to the LSFM. The thesis has two parts. In the first part, a framework for the coupling of the source and the filter is presented. Then, two interactive system models are proposed, assuming that the glottal flow is a quasi-steady Bernoulli flow and that the acoustics in the vocal tract are linear. In these models, the glottal area, rather than the glottal flow, is used as the source for voiced speech, and the relation between the glottal flow, the glottal area and the vocal tract is determined by the quasi-steady Bernoulli flow equation. It is shown theoretically that the LSFM is an approximation of the nonlinear models. Estimating the ISFM parameters from the speech signal alone is a nonlinear blind deconvolution problem. The problem is solved by a robust method developed from the acoustical interpretation of the systems. Experimental results show that the ISFMs reproduce the source-filter coupling effects seen in the physical simulations, and that the parameter estimation method always produces stable models that perform better than the LSFM. In addition, a framework for incorporating the source-filter interaction into the classical source-filter model is presented.
The Rosenberg source model is extended to an interactive source for voiced speech, and its performance is evaluated on a large speech database. The results of the experiments conducted on vowels in the database show that the interactive Rosenberg model always performs better than its noninteractive version. In the second part of the thesis, the LSFM and the ISFMs are compared in a system identification approach that uses not only the speech signal but also high-speed endoscopic video (HSV) of the vocal folds. In this case, HSV and speech are used as reference input-output data for the analysis and comparison of the models. First, a new robust HSV processing algorithm is developed and applied to the HSV images to extract the glottal area. Then, the system parameters are estimated using a modified version of the method proposed in the first part. The experimental results show that the speech signal can contain harmonics of the fundamental frequency of the glottal area other than those contained in the glottal area signal. The proposed nonlinear interactive source-filter models can generate such harmonic components in speech and produce more realistic speech sounds than the LSFM.
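
As an illustration of the coupling described in the abstract, the following minimal sketch evaluates a commonly used textbook form of the quasi-steady Bernoulli relation, in which the glottal flow equals the glottal area times sqrt(2 (p_sub - p_vt) / (k rho)). The abstract does not state the equation used in the thesis, so this form, the function name `bernoulli_flow`, the loss coefficient `k`, the air density, and all numeric values are assumptions made here for illustration only. The sketch shows why the linear model appears as a limiting case: when the vocal tract pressure is negligible, the flow is simply proportional to the area.

```python
import math

RHO = 1.14  # air density in kg/m^3 (assumed value, not taken from the thesis)


def bernoulli_flow(area, p_sub, p_vt=0.0, k=1.0, rho=RHO):
    """Quasi-steady Bernoulli flow through the glottis, in m^3/s.

    area  : glottal area in m^2
    p_sub : subglottal pressure in Pa
    p_vt  : acoustic pressure at the vocal tract input in Pa
    k     : dimensionless pressure-loss coefficient (assumed)
    """
    dp = p_sub - p_vt  # transglottal pressure drop
    # Sign-preserving square root, so a negative pressure drop gives reverse flow.
    velocity = math.copysign(math.sqrt(2.0 * abs(dp) / (k * rho)), dp)
    return area * velocity


if __name__ == "__main__":
    area = 1.0e-5   # 0.1 cm^2, hypothetical glottal opening
    p_sub = 800.0   # hypothetical subglottal pressure in Pa

    # No vocal tract pressure: flow is proportional to area (linear limit).
    print(bernoulli_flow(area, p_sub), bernoulli_flow(2 * area, p_sub))

    # Vocal tract pressure present: the same area yields a different flow,
    # which is the source-filter interaction.
    print(bernoulli_flow(area, p_sub, p_vt=200.0))
```

In a full interactive model the vocal tract pressure would itself depend on the flow through the (linear) vocal tract acoustics, so the relation would have to be solved jointly at each time step; that step is omitted here.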

Suggestions

Wireless speech recognition using fixed point mixed excitation linear prediction (MELP) vocoder
Acar, D; Karci, MH; Ilk, HG; Demirekler, Mübeccel (2002-07-19)
A bit stream based front-end for wireless speech recognition system that operates on fixed point mixed excitation linear prediction (MELP) vocoder is presented in this paper. Speaker dependent, isolated word recognition accuracies obtained from conventional and bit stream based front-end systems are obtained and their statistical significance is discussed. Feature parameters are extracted from original (wireline) and decoded speech (conventional) and from the quantized spectral information (bit stream) of t...
Nonlinear interactive source-filter models for speech
Koç, Turgay; Çiloğlu, Tolga (2016-03-01)
The linear source-filter model of speech production assumes that the source of the speech sounds is independent of the filter. However, acoustic simulations based on the physical speech production models show that when the fundamental frequency of the source harmonics approaches the first formant of the vocal tract filter, the filter has significant effects on the source due to the nonlinear coupling between them. In this study, two interactive system models are proposed under the quasi steady Bernoulli flo...
Two channel adaptive speech enhancement
Zaim, Erman; Çiloğlu, Tolga; Department of Electrical and Electronics Engineering (2014)
In this thesis, speech enhancement problem is studied and a speech enhancement system is implemented on TMS320C5505 fixed point DSP. Speech degradation due to the signal leakage into the reference microphone and uncorrelated signals between microphones are studied. Limitations of fixed point implementations are examined. Theoretical complexities of weight adaptation algorithms are examined. Moreover, differences between theoretical and practical complexities of weight adaptation algorithms due to the select...
Comparison of Linear and Nonlinear Modal Reduction Approaches
Ferhatoğlu, Erhan; Dreher, Tobias; Ciğeroğlu, Ender; Krack, Malte; Özgüven, Hasan Nevzat (2019-01-31)
Periodic vibration response of nonlinear mechanical systems can be efficiently computed using Harmonic Balance Method. However, computational burden may still be considerable and impede extensive parametric studies needed for, e.g., design optimization and prediction of vibration response especially when the degree of freedom is very large. In this work, the methods which had been previously developed by the authors for further model order reduction to one or a few coordinates are compared. The focus is pla...
Adaptive output feedback control with reduced sensitivity to sensor noise
Kutay, Ali Türker; Hovakimyan, N (2003-01-01)
We address adaptive output feedback control of uncertain nonlinear systems with noisy output measurements, in which both the dynamics and the dimension of the regulated system may be unknown, and only the relative degree of the regulated output is assumed to be known. Given a smooth reference trajectory, the problem is to design a controller that forces the system measurement to track it with bounded errors. A recently developed method proposes the use of a linear error observer that estimates the tracking ...
Citation Formats
T. Koç, “Nonlinear interactive source-filter model for voiced speech,” Ph.D. - Doctoral Program, Middle East Technical University, 2012.