Two channel adaptive speech enhancement

Zaim, Erman
In this thesis, the speech enhancement problem is studied and a speech enhancement system is implemented on the TMS320C5505 fixed-point DSP. Speech degradation due to signal leakage into the reference microphone and due to uncorrelated signals between the microphones is studied. Limitations of fixed-point implementations are examined. Theoretical complexities of the weight adaptation algorithms are examined, as are the differences between their theoretical and practical complexities that arise from the selected DSP hardware. Effects of the acoustic characteristics of the recording environment on the performance of the adaptive algorithms are examined. Computer simulations are performed on SAD source separation and Widrow's speech enhancement systems based on the LMS, sign LMS and NLMS adaptive weight algorithms, under both artificial and natural noise, in order to compare their performance and to select the filter length and step size. Speech enhancement systems based on the LMS, SE-LMS and NLMS algorithms are implemented in real time on the TMS320C5505 fixed-point DSP. The performance of these systems is evaluated with subjective listening tests. It is shown that the implemented speech enhancement system works consistently and increases the intelligibility of the speech transmitted to the other party under various types of real noise.
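As a rough illustration of the two-channel structure the abstract describes, the sketch below runs a Widrow-style adaptive noise canceller with LMS, sign-error LMS and NLMS weight updates. It is a floating-point toy, not the thesis's fixed-point DSP code: the function names, the synthetic "speech" (a sine wave), the two-tap noise path, the filter length and the step sizes are all assumptions chosen for the example, and it assumes that SE-LMS denotes the sign-error variant.

```python
import math
import random


def adaptive_noise_canceller(primary, reference, num_taps=8, step=0.05,
                             algorithm="nlms", eps=1e-8):
    """Filter `reference` to estimate the noise in `primary`.

    The estimation error is the enhanced output. Names and defaults are
    illustrative assumptions, not taken from the thesis.
    """
    weights = [0.0] * num_taps
    window = [0.0] * num_taps  # recent reference samples, newest first
    output = []
    for d, x in zip(primary, reference):
        window = [x] + window[:-1]
        y = sum(w * u for w, u in zip(weights, window))  # noise estimate
        e = d - y                                        # enhanced sample
        if algorithm == "lms":
            gain = step * e
        elif algorithm == "se-lms":      # sign-error LMS: only the sign of e
            gain = step * ((e > 0) - (e < 0))
        elif algorithm == "nlms":        # step normalised by input energy
            gain = step * e / (eps + sum(u * u for u in window))
        else:
            raise ValueError(f"unknown algorithm: {algorithm}")
        weights = [w + gain * u for w, u in zip(weights, window)]
        output.append(e)
    return output, weights


def simulate(algorithm="nlms", step=0.02, n_samples=5000, seed=0):
    """Return (input noise power, residual noise power) over the last 1000 samples."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    # Hypothetical stand-in for speech: a slow sine wave.
    speech = [0.5 * math.sin(2 * math.pi * 0.01 * k) for k in range(n_samples)]
    # Hypothetical acoustic path from the noise source to the primary microphone.
    primary = [s + 0.6 * noise[k] - 0.3 * (noise[k - 1] if k else 0.0)
               for k, s in enumerate(speech)]
    enhanced, _ = adaptive_noise_canceller(
        primary, noise, num_taps=4, step=step, algorithm=algorithm)
    tail = range(n_samples - 1000, n_samples)
    noise_in = sum((primary[k] - speech[k]) ** 2 for k in tail) / 1000
    noise_out = sum((enhanced[k] - speech[k]) ** 2 for k in tail) / 1000
    return noise_in, noise_out


if __name__ == "__main__":
    for name, mu in (("lms", 0.01), ("se-lms", 0.001), ("nlms", 0.02)):
        before, after = simulate(name, mu)
        print(f"{name}: noise power {before:.3f} -> {after:.4f}")
```

In this idealised setup the reference channel contains only noise. The leakage of speech into the reference microphone that the thesis studies is deliberately left out, and it would be expected to degrade the result.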


Spectral modification for context-free voice conversion using MELP speech coding framework
Salor, O; Demirekler, Mübeccel (2004-10-22)
In this work, we have focused on spectral modification of speech for voice conversion from one speaker to another. Speech conversion aims to modify the speech of one speaker such that the modified speech sounds as if spoken by another speaker. The MELP (Mixed Excitation Linear Prediction) speech coding algorithm has been used as the speech analysis and synthesis framework. Using a 230-sentence triphone balanced database of the two speakers, a mapping between the 4-stage vector quantization indexes for line spectra...
OZUM, IY; Bulut, Mehmet Mete (1994-04-14)
In this work a speech synthesis system is implemented. The system uses concatenation of phoneme waveforms as the method of synthesis. These waveforms are generated by sampling the speech of a human speaker and then separating it into its phonemes. These phoneme samples are stored in the hard disk to be used in the synthesis. Then the text to be read is separated into its syllables and each syllable is synthesized by concatenating the phoneme samples. This method is facilitated by the structure of the Turkis...
Nonlinear interactive source-filter model for voiced speech
Koç, Turgay; Çiloğlu, Tolga; Department of Electrical and Electronics Engineering (2012)
The linear source-filter model (LSFM) has been used as a primary model for speech processing since 1960, when G. Fant presented the acoustic theory of speech production. It assumes that the source of voiced speech sounds, the glottal flow, is independent of the filter, the vocal tract. However, acoustic simulations based on physical speech production models show that, especially when the fundamental frequency (F0) of the source harmonics approaches the first formant frequency (F1) of the vocal tract filter, the filter has s...
The IRIS Project A liaison between industry and academia towards natural multimodal communication
Freitas, Joao; Sara, Candeias; Miguel, Sales Dias; Eduardo, Lleida; Alfonso, Ortega; Antonio, Teixeira; Samuel, Silva; Acartürk, Cengiz; Veronica, Orvalho (2014-11-30)
This paper describes a project with the overall goal of providing a natural interaction communication platform accessible and adapted for all users, especially for people with speech impairments and the elderly, by sharing knowledge between industry and academia. The platform will adopt the principles of natural user interfaces such as speech, silent speech, gestures, pictograms, among others, and will provide a set of services that allow easy access to social networks, friends and remote family members, thus ...
Bimodal automatic speech segmentation based on audio and visual information fusion
Akdemir, Eren; Çiloğlu, Tolga (2011-07-01)
Bimodal automatic speech segmentation using visual information together with audio data is introduced. The accuracy of automatic segmentation directly affects the quality of speech processing systems that use the segmented database. Combining audio and visual data results in a lower average absolute boundary error between the manual and automatic segmentation results. The information from the two modalities is fused at the feature level and used in an HMM-based speech segmentation system. A T...
Citation Formats
E. Zaim, “Two channel adaptive speech enhancement,” M.S. - Master of Science, Middle East Technical University, 2014.