On Improving Dynamic State Space Approaches to Articulatory Inversion With MAP-Based Parameter Estimation

2012-01-01
Özbek Arslan, Işıl
Hasegawa-Johnson, Mark
Demirekler, Mübeccel
This paper presents a complete framework for articulatory inversion based on jump Markov linear systems (JMLS). In the model, the acoustic measurements are treated as the observable measurement of the system and the position of each articulator as its continuous-valued hidden state, while the discrete regimes of the system are represented by a discrete-valued hidden modal state. Articulatory inversion based on JMLS involves learning the model parameter set of the system and inferring the state of the system (the position of each articulator) from acoustic measurements. Iterative learning algorithms based on maximum-likelihood (ML) and maximum a posteriori (MAP) criteria are proposed to learn the model parameter set of the JMLS. It is shown that the learning procedure of the JMLS is a generalized version of hidden Markov model (HMM) training when both acoustic and articulatory data are given. The MAP-based learning algorithm is shown to improve the modeling performance of the system and to give significantly better results than ML. The inference stage of the proposed algorithm is based on an interacting multiple models (IMM) approach and can be performed online (filtering) and/or offline (smoothing). Formulas are provided for IMM-based JMLS smoothing. It is shown that smoothing significantly improves the performance of articulatory inversion compared to filtering. Several experiments are conducted with the MOCHA database to show the performance of the proposed method. A comparison with methods from the literature shows that the proposed method improves the performance of state space approaches, making them comparable to the best published results.
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING
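To make the JMLS and IMM terminology in the abstract concrete, the following is a minimal sketch of the generic textbook model and of one IMM filtering step. It is not the paper's implementation: the function names (`simulate_jmls`, `imm_step`), the scalar toy model, and all numerical values are assumptions chosen for illustration. The model assumed here is x_t = A[s_t] x_{t-1} + w_t, y_t = C[s_t] x_t + v_t, with the mode s_t following a Markov chain. Parameter learning (ML/MAP) and smoothing are not shown.

```python
import numpy as np

def gaussian_pdf(resid, cov):
    """Density of a zero-mean Gaussian with covariance `cov`, evaluated at `resid`."""
    d = resid.shape[0]
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return float(np.exp(-0.5 * resid @ np.linalg.solve(cov, resid)) / norm)

def simulate_jmls(T, trans, A, Q, C, R, rng):
    """Draw states x_t and measurements y_t from a JMLS (hypothetical helper)."""
    n, p = A[0].shape[0], C[0].shape[0]
    s, x = 0, np.zeros(n)
    xs, ys = [], []
    for _ in range(T):
        s = rng.choice(len(trans), p=trans[s])          # discrete mode transition
        x = A[s] @ x + rng.multivariate_normal(np.zeros(n), Q[s])
        y = C[s] @ x + rng.multivariate_normal(np.zeros(p), R[s])
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)

def imm_step(y, means, covs, mode_probs, trans, A, Q, C, R):
    """One IMM filtering step.

    means[j], covs[j]: mode-conditioned posterior of the previous state
    mode_probs[j]:     P(s_{t-1} = j | y_{1:t-1})
    trans[i, j]:       P(s_t = j | s_{t-1} = i), assumed to make every mode reachable
    """
    M = len(mode_probs)
    # 1) Interaction: predicted mode probabilities and mixing weights.
    pred = trans.T @ mode_probs
    mix = trans * mode_probs[:, None] / pred[None, :]   # mix[i, j] = P(s_{t-1}=i | s_t=j)
    new_means, new_covs, lik = [], [], np.zeros(M)
    for j in range(M):
        m0 = sum(mix[i, j] * means[i] for i in range(M))
        P0 = sum(mix[i, j] * (covs[i] + np.outer(means[i] - m0, means[i] - m0))
                 for i in range(M))
        # 2) Mode-matched Kalman filter: predict, then update with y.
        m_pred = A[j] @ m0
        P_pred = A[j] @ P0 @ A[j].T + Q[j]
        S = C[j] @ P_pred @ C[j].T + R[j]               # innovation covariance
        K = np.linalg.solve(S, C[j] @ P_pred).T         # gain P_pred C^T S^{-1}
        resid = y - C[j] @ m_pred
        new_means.append(m_pred + K @ resid)
        new_covs.append(P_pred - K @ S @ K.T)
        lik[j] = gaussian_pdf(resid, S)
    # 3) Mode probability update.
    post = lik * pred
    post /= post.sum()
    # 4) Combination: overall state estimate as the probability-weighted mean.
    x_hat = sum(post[j] * new_means[j] for j in range(M))
    return new_means, new_covs, post, x_hat

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    trans = np.array([[0.95, 0.05], [0.05, 0.95]])      # illustrative values only
    A = [np.array([[0.99]]), np.array([[0.5]])]
    Q = [np.array([[0.01]]), np.array([[0.5]])]
    C = [np.array([[1.0]]), np.array([[1.0]])]
    R = [np.array([[0.1]]), np.array([[0.1]])]
    xs, ys = simulate_jmls(200, trans, A, Q, C, R, rng)
    means, covs = [np.zeros(1)] * 2, [np.eye(1)] * 2
    probs, est = np.array([0.5, 0.5]), []
    for y in ys:
        means, covs, probs, x_hat = imm_step(y, means, covs, probs, trans, A, Q, C, R)
        est.append(x_hat)
    print("filtering RMSE:", np.sqrt(np.mean((np.array(est) - xs) ** 2)))
```

In this sketch the hidden state plays the role of the articulator positions and the measurement plays the role of the acoustic features. An offline smoother would add a backward pass over the stored filtering results, which is the part for which the paper provides its own formulas.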

Suggestions

Estimation of Articulatory Trajectories Based on Gaussian Mixture Model (GMM) With Audio-Visual Information Fusion and Dynamic Kalman Smoothing
ÖZBEK, İbrahim Yücel; Hasegawa-Johnson, Mark; Demirekler, Mübeccel (Institute of Electrical and Electronics Engineers (IEEE), 2011-07-01)
This paper presents a detailed framework for Gaussian mixture model (GMM)-based articulatory inversion equipped with special postprocessing smoothers, and with the capability to perform audio-visual information fusion. The effects of different acoustic features on the GMM inversion performance are investigated and it is shown that the integration of various types of acoustic (and visual) features improves the performance of the articulatory inversion process. Dynamic Kalman smoothers are proposed to adapt t...
A NEW METHOD FOR HARMONIC RESPONSE OF NONPROPORTIONALLY DAMPED STRUCTURES USING UNDAMPED MODAL DATA
Özgüven, Hasan Nevzat (Elsevier BV, 1987-09-08)
A method of calculating the receptances of a non-proportionally damped structure from the undamped modal data and the damping matrix of the system is presented. The method developed is an exact method. It gives exact results when exact undamped receptances are employed in the computation. Inaccuracies are due to the truncations made in the calculation of undamped receptances. Numerical examples, demonstrating the accuracy and speed of the method when truncated receptance series are used are also presented. ...
Dynamic Speech Spectrum Representation and Tracking Variable Number of Vocal Tract Resonance Frequencies With Time-Varying Dirichlet Process Mixture Models
Özkan, Emre; Demirekler, Muebeccel (Institute of Electrical and Electronics Engineers (IEEE), 2009-11-01)
In this paper, we propose a new approach for dynamic speech spectrum representation and tracking vocal tract resonance (VTR) frequencies. The method involves representing the spectral density of the speech signals as a mixture of Gaussians with unknown number of components for which time-varying Dirichlet process mixture model (DPM) is utilized. In the resulting representation, the number of formants is allowed to vary in time. The paper first presents an analysis on the continuity of the formants in the sp...
Pre-processing inputs for optimally-configured time-delay neural networks
Taşkaya Temizel, Tuğba; Ahmad, K (Institution of Engineering and Technology (IET), 2005-02-01)
A procedure for pre-processing non-stationary time series is proposed for modelling with a time-delay neural network (TDNN). The procedure stabilises the mean of the series and uses a fast Fourier transform to determine the TDNN input size. Results of applying this procedure on five well-known data sets are compared with existing hybrid neural network techniques, demonstrating improved prediction performance.
Multipath Characteristics of Frequency Diverse Arrays Over a Ground Plane
Cetintepe, Cagri; Demir, Şimşek (Institute of Electrical and Electronics Engineers (IEEE), 2014-07-01)
This paper presents a theoretical framework for an analytical investigation of multipath characteristics of frequency diverse arrays (FDAs), a task which is attempted for the first time in the open literature. In particular, transmitted field expressions are formulated for an FDA over a perfectly conducting ground plane first in a general analytical form, and these expressions are later simplified under reasonable assumptions. Developed formulation is then applied to a uniform, linear, continuous-wave opera...
Citation Formats
I. Özbek Arslan, M. Hasegawa-Johnson, and M. Demirekler, “On Improving Dynamic State Space Approaches to Articulatory Inversion With MAP-Based Parameter Estimation,” IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, pp. 67–81, 2012, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/50754.