Spectral modification for context-free voice conversion using MELP speech coding framework

Date

2004-10-22

Author

Salor, O
Demirekler, Mübeccel


This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

In this work, we have focused on spectral modification of speech for voice conversion from one speaker to another. Speech conversion aims to modify the speech of one speaker such that the modified speech sounds as if spoken by another speaker. The MELP (Mixed Excitation Linear Prediction) speech coding algorithm has been used as the speech analysis and synthesis framework. Using a 230-sentence, triphone-balanced database of the two speakers, a mapping between the 4-stage vector quantization indexes for the line spectral frequencies (LSFs) of the two speakers has been obtained. This mapping provides context-free conversion of the speakers' spectral properties. Two different methods have been proposed to obtain the LSF mapping: the first determines corresponding source and target LSF codeword indexes, while the second finds a new LSF codebook for the target speaker. After the spectral modification, pitch modification is applied to the source speaker's residual to approximate the target speaker's pitch range, and the modified filter is then driven by the modified residual signal. Subjective ABX listening tests have been carried out, and the correct speaker perception rate has been obtained as 80% and 77% for the first and second spectral conversion methods, respectively. As future work, we plan to integrate our previous work on LPC filter and residual relationship analysis to increase the correct speaker perception rate.
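The two LSF mapping ideas and the pitch-range step can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on stated assumptions: source and target frames from the parallel recordings are taken to be already time-aligned (the abstract does not say how; dynamic time warping would be one option), each frame's 4-stage VQ result is represented by a single hashable key such as a tuple of the four stage indexes, the "new target codebook" is approximated by averaging aligned target LSF vectors per source codeword, and pitch-range modification is approximated by a linear map between F0 ranges. The function names `index_mapping`, `target_codebook` and `scale_pitch` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def index_mapping(pairs: Sequence[Tuple[Hashable, Hashable]]) -> Dict[Hashable, Hashable]:
    """First-method-style sketch (assumption): map each source LSF codeword key
    to the target key it co-occurs with most often in aligned frame pairs."""
    counts: Dict[Hashable, Counter] = defaultdict(Counter)
    for src_key, tgt_key in pairs:
        counts[src_key][tgt_key] += 1
    # On ties, Counter.most_common keeps first-encountered order.
    return {src: c.most_common(1)[0][0] for src, c in counts.items()}


def target_codebook(
    src_keys: Sequence[Hashable], tgt_lsfs: Sequence[Sequence[float]]
) -> Dict[Hashable, List[float]]:
    """Second-method-style sketch (assumption): build a new target codeword for
    each source codeword by averaging the aligned target LSF vectors."""
    sums: Dict[Hashable, List[float]] = {}
    counts: Dict[Hashable, int] = defaultdict(int)
    for key, lsf in zip(src_keys, tgt_lsfs):
        acc = sums.setdefault(key, [0.0] * len(lsf))
        for i, value in enumerate(lsf):
            acc[i] += value
        counts[key] += 1
    return {key: [v / counts[key] for v in acc] for key, acc in sums.items()}


def scale_pitch(
    f0: Sequence[float], src_range: Tuple[float, float], tgt_range: Tuple[float, float]
) -> List[float]:
    """Linearly map voiced F0 values (Hz) from the source speaker's range to
    the target speaker's range; unvoiced frames (F0 == 0) stay at 0."""
    lo_s, hi_s = src_range
    lo_t, hi_t = tgt_range
    if hi_s <= lo_s:
        raise ValueError("source pitch range must satisfy low < high")
    scale = (hi_t - lo_t) / (hi_s - lo_s)
    return [lo_t + (f - lo_s) * scale if f > 0 else 0.0 for f in f0]


if __name__ == "__main__":
    print(index_mapping([(3, 7), (3, 7), (3, 2), (5, 1)]))  # {3: 7, 5: 1}
    print(scale_pitch([100.0, 0.0, 150.0], (100.0, 200.0), (180.0, 280.0)))  # [180.0, 0.0, 230.0]
```

In this sketch the first function only relabels indexes, so converted frames are restricted to codewords that already exist in the target's index space, whereas the second stores averaged target LSF vectors and can therefore produce spectra that no original codeword represents exactly.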

Subject Keywords

Speech coding, Filters, Prediction algorithms, Speech analysis, Speech synthesis, Databases, Vector quantization, Indexes, Frequency, Testing

URI

https://hdl.handle.net/11511/57562

DOI

https://doi.org/10.1109/isimp.2004.1434063

Conference Name

International Symposium on Intelligent Multimedia, Video and Speech Processing

Collections

Graduate School of Natural and Applied Sciences, Conference / Seminar

Suggestions


Two channel adaptive speech enhancement Zaim, Erman; Çiloğlu, Tolga; Department of Electrical and Electronics Engineering (2014) In this thesis, speech enhancement problem is studied and a speech enhancement system is implemented on TMS320C5505 fixed point DSP. Speech degradation due to the signal leakage into the reference microphone and uncorrelated signals between microphones are studied. Limitations of fixed point implementations are examined. Theoretical complexities of weight adaptation algorithms are examined. Moreover, differences between theoretical and practical complexities of weight adaptation algorithms due to the select...
KNOWLEDGE-BASED SPEECH SYNTHESIS BY CONCATENATION OF PHONEME SAMPLES OZUM, IY; Bulut, Mehmet Mete (1994-04-14) In this work a speech synthesis system is implemented. The system uses concatenation of phoneme waveforms as the method of synthesis. These waveforms are generated by sampling the speech of a human speaker and then separating it into its phonemes. These phoneme samples are stored in the hard disk to be used in the synthesis. Then the text to be read is separated into its syllables and each syllable is synthesized by concatenating the phoneme samples. This method is facilitated by the structure of the Turkis...
FROM ACOUSTICS TO VOCAL TRACT TIME FUNCTIONS Mitra, Vikramjit; Oezbek, I. Yuecel; Nam, Hosung; Zhou, Xinhui; Espy-Wilson, Carol Y. (2009-04-24) In this paper we present a technique for obtaining Vocal Tract (VT) time functions from the acoustic speech signal. Knowledge-based Acoustic Parameters (APs) are extracted from the speech signal and a pertinent subset is used to obtain the mapping between them and the VT time functions. Eight different vocal tract constriction variables consisting of five constriction degree variables: lip aperture (LA), tongue body (TBCD), tongue tip (TTCD), velum (VEL), and glottis (GLO); and three constriction location ...
Speech conversion using MELP speech coding algorithm Salor, O; Demirekler, Mübeccel (2004-04-30) In this work, MELP (Mixed Excitation Linear Prediction) speech coding algorithm has been used for speech conversion. Speech conversion aims to modify the speech of one speaker such that the modified speech sounds as if spoken by another speaker. Speech modeling of MELP has been used to derive a mapping between the speech models of the two speakers. We have obtained a mapping which provides a context-free speech conversion. We have mainly considered the spectral properties of the speakers. Using the 230 ...
Nonlinear interactive source-filter models for speech KOÇ, Turgay; Çiloğlu, Tolga (2016-03-01) The linear source-filter model of speech production assumes that the source of the speech sounds is independent of the filter. However, acoustic simulations based on the physical speech production models show that when the fundamental frequency of the source harmonics approaches the first formant of the vocal tract filter, the filter has significant effects on the source due to the nonlinear coupling between them. In this study, two interactive system models are proposed under the quasi steady Bernoulli flo...

Citation Formats

O. Salor and M. Demirekler, “Spectral modification for context-free voice conversion using MELP speech coding framework,” presented at the International Symposium on Intelligent Multimedia, Video and Speech Processing, Hong Kong Polytech Univ, Hong Kong, PEOPLES R CHINA, 2004, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/57562.