Estimation of Articulatory Trajectories Based on Gaussian Mixture Model (GMM) With Audio-Visual Information Fusion and Dynamic Kalman Smoothing
Date
2011-07-01
Author
ÖZBEK, İbrahim Yücel
Hasegawa-Johnson, Mark
Demirekler, Mübeccel
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
This paper presents a detailed framework for Gaussian mixture model (GMM)-based articulatory inversion equipped with special postprocessing smoothers, and with the capability to perform audio-visual information fusion. The effects of different acoustic features on the GMM inversion performance are investigated and it is shown that the integration of various types of acoustic (and visual) features improves the performance of the articulatory inversion process. Dynamic Kalman smoothers are proposed to adapt the cutoff frequency of the smoother to data and noise characteristics; Kalman smoothers also enable the incorporation of auxiliary information such as phonetic transcriptions to improve articulatory estimation. Two types of dynamic Kalman smoothers are introduced: global Kalman (GK) and phoneme-based Kalman (PBK). The same dynamic model is used for all phonemes in the GK smoother; it is shown that GK improves the performance of articulatory inversion more than the conventional low-pass (LP) smoother. However, the PBK smoother, which uses one dynamic model for each phoneme, gives significantly better results than the GK smoother. Different methodologies to fuse the audio and visual information are examined. A novel modified late fusion algorithm, designed to consider the observability degree of the articulators, is shown to give better results than either the early or the late fusion methods. Extensive experimental studies are conducted with the MOCHA database to illustrate the performance gains obtained by the proposed algorithms. The average RMS error and correlation coefficient between the true (measured) and the estimated articulatory trajectories are 1.227 mm and 0.868 using audio-visual information fusion and GK smoothing, and 1.199 mm and 0.876 using audio-visual information fusion together with PBK smoothing based on a phonetic transcription of the utterance.
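The abstract describes Kalman smoothing of estimated trajectories and reports RMS error and correlation against the measured trajectories. The following is a minimal sketch of that idea, not the authors' GK or PBK smoother: it assumes a scalar random-walk state model with hand-set noise variances `q` and `r`, and it runs on a synthetic sinusoid rather than MOCHA data. All function names (`kalman_rts_smooth`, `rmse`, `correlation`, `demo`) are hypothetical.

```python
import math
import random


def kalman_rts_smooth(z, q, r):
    """Smooth a 1-D sequence with a scalar Kalman filter plus an RTS backward pass.

    Assumed model (illustrative only, not the paper's learned dynamics):
        x[t] = x[t-1] + w,  w ~ N(0, q)   (random-walk state)
        z[t] = x[t]   + v,  v ~ N(0, r)   (noisy observation)
    """
    n = len(z)
    if n == 0:
        return []
    x_filt, p_filt = [0.0] * n, [0.0] * n  # filtered mean and variance
    x_pred, p_pred = [0.0] * n, [0.0] * n  # one-step predicted mean and variance
    # Initialise from the first observation.
    x_filt[0], p_filt[0] = z[0], r
    x_pred[0], p_pred[0] = z[0], r
    # Forward pass: standard predict / update recursion.
    for t in range(1, n):
        x_pred[t] = x_filt[t - 1]
        p_pred[t] = p_filt[t - 1] + q
        gain = p_pred[t] / (p_pred[t] + r)
        x_filt[t] = x_pred[t] + gain * (z[t] - x_pred[t])
        p_filt[t] = (1.0 - gain) * p_pred[t]
    # Backward pass: Rauch-Tung-Striebel correction using future samples.
    x_smooth = list(x_filt)
    for t in range(n - 2, -1, -1):
        c = p_filt[t] / p_pred[t + 1]
        x_smooth[t] = x_filt[t] + c * (x_smooth[t + 1] - x_pred[t + 1])
    return x_smooth


def rmse(a, b):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def correlation(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def demo(seed=0):
    """Compare a noisy synthetic trajectory with its smoothed version."""
    rng = random.Random(seed)
    truth = [10.0 + 5.0 * math.sin(2.0 * math.pi * t / 100.0) for t in range(200)]
    noisy = [x + rng.gauss(0.0, 1.0) for x in truth]
    smooth = kalman_rts_smooth(noisy, q=0.05, r=1.0)
    return {
        "noisy": (rmse(noisy, truth), correlation(noisy, truth)),
        "smoothed": (rmse(smooth, truth), correlation(smooth, truth)),
    }


if __name__ == "__main__":
    for name, (err, corr) in demo().items():
        print(f"{name}: RMSE={err:.3f}, correlation={corr:.3f}")
```

In this sketch the ratio `q / r` plays the role of the smoother's effective cutoff: a smaller ratio smooths more heavily. A phoneme-dependent variant in the spirit of PBK would presumably switch model parameters per phoneme segment, which this example does not attempt.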
Subject Keywords
Acoustics and Ultrasonics, Electrical and Electronic Engineering
URI
https://hdl.handle.net/11511/58050
Journal
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING
DOI
https://doi.org/10.1109/tasl.2010.2087751
Collections
Graduate School of Natural and Applied Sciences, Article
Suggestions
Dynamic Speech Spectrum Representation and Tracking Variable Number of Vocal Tract Resonance Frequencies With Time-Varying Dirichlet Process Mixture Models
Özkan, Emre; Demirekler, Muebeccel (Institute of Electrical and Electronics Engineers (IEEE), 2009-11-01)
In this paper, we propose a new approach for dynamic speech spectrum representation and tracking vocal tract resonance (VTR) frequencies. The method involves representing the spectral density of the speech signals as a mixture of Gaussians with unknown number of components for which time-varying Dirichlet process mixture model (DPM) is utilized. In the resulting representation, the number of formants is allowed to vary in time. The paper first presents an analysis on the continuity of the formants in the sp...
Prediction of ducted diaphragm noise using a stochastic approach with adapted temporal filters
Karban, Ugur; Schram, Christophe; Sovardi, Carlo; Polifke, Wolfgang (SAGE Publications, 2019-01-01)
The noise production by ducted single- and double-diaphragm configurations is simulated using a stochastic noise generation and radiation numerical method. The importance of modeling correctly the anisotropy and temporal de-correlation is discussed, based on numerical results obtained by large eddy simulation. A new temporal filter is proposed, designed to provide the targeted spectral decay of energy in an Eulerian reference frame. An anisotropy correction is implemented using a non-linear model. The acous...
On Improving Dynamic State Space Approaches to Articulatory Inversion With MAP-Based Parameter Estimation
Özbek Arslan, Işıl; Hasegawa-Johnson, Mark; Demirekler, Mübeccel (Institute of Electrical and Electronics Engineers (IEEE), 2012-01-01)
This paper presents a complete framework for articulatory inversion based on jump Markov linear systems (JMLS). In the model, the acoustic measurements and the position of each articulator are considered as observable measurement and continuous-valued hidden state of the system, respectively, and discrete regimes of the system are represented by the use of a discrete-valued hidden modal state. Articulatory inversion based on JMLS involves learning the model parameter set of the system and making inference a...
Ultrathick and high-aspect-ratio nickel microgyroscope using EFAB multilayer additive electroforming
Alper, Said Emre; Ocak, Ilker Ender; Akın, Tayfun (Institute of Electrical and Electronics Engineers (IEEE), 2007-10-01)
This paper presents a new approach for the development of a microgyroscope that has a 240-μm-thick multilayer electroformed-nickel structural mass and a lateral aspect ratio greater than 100. The gyroscope is fabricated using commercial multilayer additive electroforming process EFAB of Microfabrica, Inc., which allows defining the thickness of different structural regions, such as suspensions, proof mass, and capacitive electrodes, unlike many classical surface-micromachining technologies that require a...
Bilateral CMUT Cells and Arrays: Equivalent Circuits, Diffraction Constants, and Substrate Impedance
KÖYMEN, Hayrettin; ATALAR, ABDULLAH; Tasdelen, A. Sinan (Institute of Electrical and Electronics Engineers (IEEE), 2017-02-01)
We introduce the large-signal and small-signal equivalent circuit models for a capacitive micromachined ultrasonic transducer (CMUT) cell, which has radiating plates on both sides. We present the diffraction coefficient of baffled and unbaffled CMUT cells. We show that the substrate can be modeled as a very thick radiating plate on one side, which can be readily incorporated in the introduced model. In the limiting case, the reactance of this backing impedance is entirely compliant for substrate materials w...
Citation Formats
IEEE
İ. Y. ÖZBEK, M. Hasegawa-Johnson, and M. Demirekler, “Estimation of Articulatory Trajectories Based on Gaussian Mixture Model (GMM) With Audio-Visual Information Fusion and Dynamic Kalman Smoothing,” IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, pp. 1180–1195, 2011, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/58050.