Dynamic Speech Spectrum Representation and Tracking Variable Number of Vocal Tract Resonance Frequencies With Time-Varying Dirichlet Process Mixture Models

Date

2009-11-01

Author

Özkan, Emre
Demirekler, Mübeccel


This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.


In this paper, we propose a new approach to dynamic speech spectrum representation and to tracking vocal tract resonance (VTR) frequencies. The method represents the spectral density of the speech signal as a mixture of Gaussians with an unknown number of components, modeled with a time-varying Dirichlet process mixture (DPM) model. In the resulting representation, the number of formants is allowed to vary in time. The paper first presents an analysis of the continuity of the formants in the spectrum during a speech utterance. The analysis is based on a new state-space representation of the concatenated tube model. We show that the number of formants that appear in the spectrum is directly related to the location of the constriction of the vocal tract (i.e., the location of the excitation). Moreover, the disappearance of formants from the spectrum is explained by "uncontrollable modes" of the state-space model. Under the assumption that the number of formants in the spectrum varies, we propose a DPM-based multi-target tracking algorithm for tracking an unknown number of formants. The tracking algorithm defines a hierarchical Bayesian model for the unknown formant states, and inference is performed with a Rao-Blackwellized particle filter.
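The representation described in the abstract can be illustrated with a minimal sketch. It only evaluates a spectral density written as a mixture of Gaussians whose number of components changes from frame to frame; it does not implement the paper's time-varying DPM model or the Rao-Blackwellized particle filter. The function names, the frame values (weights, center frequencies, bandwidths), and the frequency grid are all hypothetical and chosen for illustration only.

```python
import math


def gaussian_pdf(f, mean, std):
    """Gaussian density evaluated at frequency f (Hz)."""
    z = (f - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_spectrum(freqs, components):
    """Evaluate a normalized Gaussian-mixture spectral density.

    components: list of (weight, center_hz, std_hz) tuples, one per formant.
    Weights are renormalized here so that they sum to 1.
    """
    total = sum(w for w, _, _ in components)
    return [
        sum((w / total) * gaussian_pdf(f, m, s) for w, m, s in components)
        for f in freqs
    ]


# Hypothetical frames (assumed values, not taken from the paper):
# the second frame has one formant fewer than the first, mimicking a
# formant that disappears from the spectrum.
frames = [
    [(0.5, 500.0, 80.0), (0.3, 1500.0, 120.0), (0.2, 2500.0, 150.0)],
    [(0.6, 550.0, 80.0), (0.4, 2400.0, 150.0)],
]

step_hz = 10
freqs = [float(f) for f in range(0, 4001, step_hz)]
spectra = [mixture_spectrum(freqs, frame) for frame in frames]
counts = [len(frame) for frame in frames]

# Each density should integrate to roughly 1 over the frequency grid.
areas = [sum(s) * step_hz for s in spectra]
```

In this sketch the number of components per frame is given by hand; in the approach the abstract describes, that number is unknown and is inferred from the data.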

Subject Keywords

Acoustics and Ultrasonics, Electrical and Electronic Engineering

URI

https://hdl.handle.net/11511/42880

Journal

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING

DOI

https://doi.org/10.1109/tasl.2009.2022198

Collections

Department of Electrical and Electronics Engineering, Article

Suggestions


Estimation of Articulatory Trajectories Based on Gaussian Mixture Model (GMM) With Audio-Visual Information Fusion and Dynamic Kalman Smoothing Özbek, İbrahim Yücel; Hasegawa-Johnson, Mark; Demirekler, Mübeccel (Institute of Electrical and Electronics Engineers (IEEE), 2011-07-01) This paper presents a detailed framework for Gaussian mixture model (GMM)-based articulatory inversion equipped with special postprocessing smoothers, and with the capability to perform audio-visual information fusion. The effects of different acoustic features on the GMM inversion performance are investigated and it is shown that the integration of various types of acoustic (and visual) features improves the performance of the articulatory inversion process. Dynamic Kalman smoothers are proposed to adapt t...
Theoretical investigation of effects of flow oscillations on ultrasound Doppler velocity measurements Koseli, Volkan; Uludağ, Yusuf (Elsevier BV, 2012-02-01) Effects of flow oscillations on spectrum of Ultrasound Doppler Velocimetry (UDV) signals were investigated theoretically and numerically. A laminar pipe flow with a superimposed oscillating component was considered. Negative impact of oscillation on the ultrasound signal hence on the flow images was observed in the form of spreading of spectral ultrasound signal energy around mean component, leading to image artifacts. Both analytical and numerical results revealed the strong effect of a group of parameters...
Prediction of ducted diaphragm noise using a stochastic approach with adapted temporal filters Karban, Ugur; Schram, Christophe; Sovardi, Carlo; Polifke, Wolfgang (SAGE Publications, 2019-01-01) The noise production by ducted single- and double-diaphragm configurations is simulated using a stochastic noise generation and radiation numerical method. The importance of modeling correctly the anisotropy and temporal de-correlation is discussed, based on numerical results obtained by large eddy simulation. A new temporal filter is proposed, designed to provide the targeted spectral decay of energy in an Eulerian reference frame. An anisotropy correction is implemented using a non-linear model. The acous...
On Improving Dynamic State Space Approaches to Articulatory Inversion With MAP-Based Parameter Estimation Özbek Arslan, Işıl; Hasegawa-Johnson, Mark; Demirekler, Mübeccel (Institute of Electrical and Electronics Engineers (IEEE), 2012-01-01) This paper presents a complete framework for articulatory inversion based on jump Markov linear systems (JMLS). In the model, the acoustic measurements and the position of each articulator are considered as observable measurement and continuous-valued hidden state of the system, respectively, and discrete regimes of the system are represented by the use of a discrete-valued hidden modal state. Articulatory inversion based on JMLS involves learning the model parameter set of the system and making inference a...
Dynamic system modeling and state estimation for speech signal Özbek, İbrahim Yücel; Demirekler, Mübeccel; Department of Electrical and Electronics Engineering (2010) This thesis presents an all-inclusive framework on how the current formant tracking and audio (and/or visual)-to-articulatory inversion algorithms can be improved. The possible improvements are summarized as follows: The first part of the thesis investigates the problem of formant frequency estimation when the number of formants to be estimated is fixed or variable, respectively. The fixed-number formant tracking method is based on the assumption that the number of formant frequencies is fixed along the ...

Citation Formats

E. Özkan and M. Demirekler, “Dynamic Speech Spectrum Representation and Tracking Variable Number of Vocal Tract Resonance Frequencies With Time-Varying Dirichlet Process Mixture Models,” IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, pp. 1518–1532, 2009, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/42880.