Nonlinear interactive source-filter models for speech
Date
2016-03-01
Author
KOÇ, Turgay
Çiloğlu, Tolga
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
The linear source-filter model of speech production assumes that the source of speech sounds is independent of the filter. However, acoustic simulations based on physical speech production models show that when the fundamental frequency of the source harmonics approaches the first formant of the vocal tract filter, the filter significantly affects the source because of the nonlinear coupling between them. In this study, two interactive system models are proposed under the quasi-steady Bernoulli flow and linear vocal tract assumptions. An algorithm is developed to estimate the model parameters. The glottal flow and the linear vocal tract parameters are found by conventional methods. The Rosenberg model is used to synthesize the glottal waveform. A recursive optimization method is proposed to find the parameters of the interactive model. Finally, the glottal flow produced by the nonlinear interactive system is computed. The experimental results show that the interactive system model accurately reproduces fine details of the glottal flow source.
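As a rough illustration of the glottal waveform synthesis step mentioned in the abstract, the sketch below generates one period of a Rosenberg-style pulse using a commonly cited trigonometric form (a raised-cosine opening phase followed by a quarter-cosine closing phase). The function names `rosenberg_pulse` and `glottal_train`, and the default opening and closing fractions, are assumptions chosen for illustration; they are not taken from the article, and the article's interactive model and parameter estimation are not reproduced here.

```python
import math


def rosenberg_pulse(n_samples, open_frac=0.4, close_frac=0.16):
    """Return one period of a Rosenberg-style glottal flow pulse (peak = 1).

    open_frac and close_frac are the fractions of the period spent in the
    opening and closing phases. The defaults are illustrative assumptions,
    not values from the article.
    """
    n_open = int(round(open_frac * n_samples))
    n_close = int(round(close_frac * n_samples))
    pulse = []
    for n in range(n_samples):
        if n < n_open:
            # Opening phase: raised cosine rising from 0 towards 1.
            pulse.append(0.5 * (1.0 - math.cos(math.pi * n / n_open)))
        elif n < n_open + n_close:
            # Closing phase: quarter cosine falling from 1 towards 0.
            pulse.append(math.cos(math.pi * (n - n_open) / (2.0 * n_close)))
        else:
            # Closed phase: no glottal flow.
            pulse.append(0.0)
    return pulse


def glottal_train(f0_hz, fs_hz, n_periods):
    """Repeat the pulse to form a periodic glottal source at f0_hz."""
    period = int(round(fs_hz / f0_hz))
    return rosenberg_pulse(period) * n_periods


if __name__ == "__main__":
    source = glottal_train(f0_hz=100, fs_hz=16000, n_periods=3)
    print(len(source), max(source))
```

In a linear source-filter setting, a train like this would be passed through a vocal tract filter that does not feed back on the source; the interactive models described above differ precisely in allowing the filter to influence the glottal flow.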
Subject Keywords
Speech production, Source-filter theory, Source-filter interaction, Speech modeling
URI
https://hdl.handle.net/11511/34371
Journal
COMPUTER SPEECH AND LANGUAGE
DOI
https://doi.org/10.1016/j.csl.2014.12.002
Collections
Department of Electrical and Electronics Engineering, Article
Suggestions
Nonlinear interactive source-filter model for voiced speech
Koç, Turgay; Çiloğlu, Tolga; Department of Electrical and Electronics Engineering (2012)
The linear source-filter model (LSFM) has been used as a primary model for speech processing since 1960, when G. Fant presented the acoustic theory of speech production. It assumes that the source of voiced speech sounds, the glottal flow, is independent of the filter, the vocal tract. However, acoustic simulations based on physical speech production models show that, especially when the fundamental frequency (F0) of the source harmonics approaches the first formant frequency (F1) of the vocal tract filter, the filter has s...
Non-linear filtering based on observations from Gaussian processes
Gustafsson, Fredrik; Saha, Saikat; Orguner, Umut (2011-03-12)
We consider a class of non-linear filtering problems, where the observation model is given by a Gaussian process rather than the common non-linear function of the state and measurement noise. The new observation model can be considered as a generalization of the standard one with correlated measurement noise in both time and space. We propose a particle filter based approach with a measurement update step that requires a memory of past observations which can be truncated using a moving window to obtain a fi...
KNOWLEDGE-BASED SPEECH SYNTHESIS BY CONCATENATION OF PHONEME SAMPLES
OZUM, IY; Bulut, Mehmet Mete (1994-04-14)
In this work a speech synthesis system is implemented. The system uses concatenation of phoneme waveforms as the method of synthesis. These waveforms are generated by sampling the speech of a human speaker and then separating it into its phonemes. These phoneme samples are stored in the hard disk to be used in the synthesis. Then the text to be read is separated into its syllables and each syllable is synthesized by concatenating the phoneme samples. This method is facilitated by the structure of the Turkis...
Localization Uncertainty in Time-Amplitude Stereophonic Reproduction
De Sena, Enzo; Cvetkovic, Zoran; Hacıhabiboğlu, Hüseyin; Moonen, Marc; van Waterschoot, Toon (Institute of Electrical and Electronics Engineers (IEEE), 2020-01-01)
This article studies the effects of inter-channel time and level differences in stereophonic reproduction on perceived localization uncertainty, which is defined as how difficult it is for a listener to tell where a sound source is located. To this end, a computational model of localization uncertainty is first proposed. The model calculates inter-aural time and level difference cues and compares them to those associated with free-field point-like sources. The comparison is carried out using a particula...
Content-Based Classification and Segmentation of Mixed-Type Audio by Using MPEG-7 Features
Dogan, Ebru; SERT, MUSTAFA; Yazicit, Adnan (2009-07-25)
This paper describes the development of a generated solution for classification and segmentation of broadcast news audio. A sound stream is segmented by classifying each sub-segment into silence, pure speech, music, environmental sound, speech over music, and speech over environmental sound classes in multiple steps. Support Vector Machines and Hidden Markov Models are employed for classification, and these models are trained using different sets of MPEG-7 features. A series of tests was conducted on hand...
Citation Formats
IEEE
T. KOÇ and T. Çiloğlu, “Nonlinear interactive source-filter models for speech,” COMPUTER SPEECH AND LANGUAGE, pp. 365–394, 2016, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/34371.