Mitra, Vikramjit
Oezbek, I. Yuecel
Nam, Hosung
Zhou, Xinhui
Espy-Wilson, Carol Y.
In this paper we present a technique for obtaining Vocal Tract (VT) time functions from the acoustic speech signal. Knowledge-based Acoustic Parameters (APs) are extracted from the speech signal, and a pertinent subset is used to obtain the mapping between them and the VT time functions. Eight vocal tract constriction variables were considered in this study: five constriction degree variables, lip aperture (LA), tongue body (TBCD), tongue tip (TTCD), velum (VEL), and glottis (GLO); and three constriction location variables, lip protrusion (LP), tongue tip (TTCL), and tongue body (TBCL). The TAsk Dynamics Application model (TADA [1]) is used to create a synthetic speech dataset along with its corresponding VT time functions. We explore Support Vector Regression (SVR) followed by Kalman smoothing to achieve the mapping between the APs and the VT time functions.
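The smoothing stage mentioned in the abstract can be illustrated with a minimal sketch. It does not reproduce the paper's setup. It assumes a scalar random-walk state model, illustrative noise variances, and a synthetic noisy trajectory standing in for the frame-wise output of a regressor such as SVR. The names `kalman_smooth`, `mean_squared_error`, `process_var` and `obs_var` are hypothetical and chosen here for illustration only.

```python
import math
import random


def kalman_smooth(observations, process_var=1e-3, obs_var=0.04):
    """Smooth a 1-D trajectory with a random-walk Kalman filter and an RTS backward pass.

    process_var and obs_var are illustrative assumptions, not values from the paper.
    """
    n = len(observations)
    if n == 0:
        return []

    x_pred, p_pred = [0.0] * n, [0.0] * n   # one-step predictions
    x_filt, p_filt = [0.0] * n, [0.0] * n   # filtered estimates

    # Forward pass: predict with a random-walk model, then correct with the observation.
    for t, z in enumerate(observations):
        if t == 0:
            x_pred[t], p_pred[t] = z, 1.0   # vague prior centred on the first frame
        else:
            x_pred[t] = x_filt[t - 1]
            p_pred[t] = p_filt[t - 1] + process_var
        gain = p_pred[t] / (p_pred[t] + obs_var)
        x_filt[t] = x_pred[t] + gain * (z - x_pred[t])
        p_filt[t] = (1.0 - gain) * p_pred[t]

    # Backward pass (Rauch-Tung-Striebel): refine each estimate using later frames.
    x_smooth = x_filt[:]
    for t in range(n - 2, -1, -1):
        g = p_filt[t] / p_pred[t + 1]
        x_smooth[t] = x_filt[t] + g * (x_smooth[t + 1] - x_pred[t + 1])
    return x_smooth


def mean_squared_error(a, b):
    """Average squared difference between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    random.seed(0)
    # Synthetic stand-in for one VT time function and a noisy frame-wise estimate of it.
    truth = [math.sin(2 * math.pi * t / 200) for t in range(200)]
    noisy = [v + random.gauss(0.0, 0.2) for v in truth]
    smoothed = kalman_smooth(noisy)
    print("MSE before smoothing:", mean_squared_error(noisy, truth))
    print("MSE after smoothing: ", mean_squared_error(smoothed, truth))
```

In a pipeline of the kind the abstract describes, each of the eight constriction variables would presumably be smoothed separately after regression, with the noise variances tuned on held-out data. That arrangement is an assumption of this sketch, not a detail stated in the abstract.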


Dynamic Speech Spectrum Representation and Tracking Variable Number of Vocal Tract Resonance Frequencies With Time-Varying Dirichlet Process Mixture Models
Özkan, Emre; Demirekler, Muebeccel (Institute of Electrical and Electronics Engineers (IEEE), 2009-11-01)
In this paper, we propose a new approach for dynamic speech spectrum representation and tracking vocal tract resonance (VTR) frequencies. The method represents the spectral density of the speech signal as a mixture of Gaussians with an unknown number of components, for which a time-varying Dirichlet process mixture model (DPM) is utilized. In the resulting representation, the number of formants is allowed to vary in time. The paper first presents an analysis of the continuity of the formants in the sp...
Spectral modification for context-free voice conversion using MELP speech coding framework
Salor, O; Demirekler, Mübeccel (2004-10-22)
In this work, we have focused on spectral modification of speech for voice conversion from one speaker to another. Speech conversion aims to modify the speech of one speaker such that the modified speech sounds as if spoken by another speaker. The MELP (Mixed Excitation Linear Prediction) speech coding algorithm has been used as the speech analysis and synthesis framework. Using a 230-sentence triphone-balanced database of the two speakers, a mapping between the 4-stage vector quantization indexes for line spectra...
Tracking of Visible Vocal Tract Resonances (VVTR) Based on Kalman Filtering
Özbek Arslan, Işıl; Demirekler, Mübeccel (2006-01-01)
This paper analyzes vocal tract resonance (VTR) frequency trajectories and their relationship to formants from a new point of view. Owing to abrupt or continuous changes in the physical geometry of the vocal tract, VTRs may change in number, suddenly change position, or leak into regions where they usually do not exist. We define the visible VTR (VVTR) as the VTR that can be seen in the spectrogram, and we propose an algorithm, based on Kalman filtering, that can handle all these changes in VVTR. The s...
Modeling of plosive to vowel transitions
Beköz, Alican; Demirekler, Mübeccel; Department of Electrical and Electronics Engineering (2007)
This thesis presents a study of stop consonant to vowel transitions, which are modeled using the acoustic tube model. The characteristics of stop consonant to vowel transitions are investigated first. To this end, several transitions, including fricative to vowel transitions, are examined based on spectral and time-related properties. In addition to these studies, x-ray snapshots, lip videos and experiments with subjects are used to strengthen the characterization, from the production...
Combining Structural Analysis and Computer Vision Techniques for Automatic Speech Summarization
Sert, Mustafa; Baykal, Buyurman; Yazıcı, Adnan (2008-12-17)
Similar to verse and chorus sections that appear as repetitive structures in musical audio, the key concept (or topic) of some speech recordings (e.g., presentations, lectures, etc.) may also repeat itself over time. Hence, accurate detection of these repetitions may help automatic speech summarization succeed. Based on this motivation, we consider the applicability of music structural analysis methods to speech summary generation. Our method transforms a 1-D time-domain speech signal to a...
