Dynamic programming approach to voice transformation

Date

2006-10-01

Author

Salor, Ozgul
Demirekler, Mübeccel


This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.


This paper presents a voice transformation algorithm that modifies the speech of a source speaker so that it is perceived as if spoken by a target speaker. A novel method based on dynamic programming is proposed. The system obtains speaker-specific codebooks of line spectral frequencies (LSFs) for both the source and the target speaker. These codebooks are used to train a mapping histogram matrix, which is then used to transform LSFs from one speaker to the other. The baseline system uses the maxima of the histogram matrix for LSF transformation. This baseline has two shortcomings: the target LSF space is limited, and mapping subsequent frames independently causes spectral discontinuities. Both are overcome by applying dynamic programming. During transformation, the dynamic programming approach aims to model the long-term behaviour of the target speaker's LSFs while preserving the relationship between subsequent frames of the source LSFs. Objective and subjective evaluations show that the dynamic programming approach improves the performance of the system in terms of both speech quality and speaker similarity.
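The abstract outlines the pipeline (LSF codebooks, a mapping histogram, a per-frame maximum baseline, and a dynamic programming refinement) but not its exact formulation. The following is a hypothetical sketch of that general idea, not the authors' method. It assumes the codebooks are already trained (for example by k-means), the source and target training frames are already time-aligned, the histogram counts co-occurrences of source and target codeword indices, and the dynamic programming step is cast as a standard Viterbi search whose transition scores come from bigram counts of consecutive target codewords. All function names are illustrative. This first-order transition model is only a stand-in for the "long-term behaviour" mentioned above, and the sketch does not separately model the preservation of source frame relationships.

```python
import numpy as np

# Hypothetical sketch: histogram-based LSF codeword mapping with a Viterbi-style
# DP refinement. The names and the cost function are illustrative assumptions.

def nearest_codeword(frames, codebook):
    """Return the index of the closest codeword (Euclidean) for each LSF frame."""
    d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def train_histogram(src_frames, tgt_frames, src_cb, tgt_cb):
    """Count (source codeword, target codeword) co-occurrences over
    time-aligned training frames (alignment, e.g. by DTW, is assumed done)."""
    si = nearest_codeword(src_frames, src_cb)
    ti = nearest_codeword(tgt_frames, tgt_cb)
    H = np.zeros((len(src_cb), len(tgt_cb)))
    np.add.at(H, (si, ti), 1)
    return H

def train_transitions(tgt_idx, n_tgt):
    """Count transitions between consecutive target codewords
    (assumed first-order proxy for the target's long-term behaviour)."""
    T = np.zeros((n_tgt, n_tgt))
    np.add.at(T, (tgt_idx[:-1], tgt_idx[1:]), 1)
    return T

def baseline_map(src_idx, H):
    """Baseline: map each frame independently to the maximum of its histogram row."""
    return H.argmax(axis=1)[src_idx]

def dp_map(src_idx, H, T, eps=1e-6):
    """Viterbi search over target codewords.
    Frame score: log P(target | source) from H.
    Transition score: log P(next target | current target) from T."""
    emit = np.log((H + eps) / (H + eps).sum(axis=1, keepdims=True))
    trans = np.log((T + eps) / (T + eps).sum(axis=1, keepdims=True))
    n, k = len(src_idx), H.shape[1]
    score = emit[src_idx[0]].copy()
    back = np.zeros((n, k), dtype=int)
    for t in range(1, n):
        cand = score[:, None] + trans        # cand[prev, cur]
        back[t] = cand.argmax(axis=0)        # best predecessor for each current codeword
        score = cand.max(axis=0) + emit[src_idx[t]]
    path = np.empty(n, dtype=int)
    path[-1] = score.argmax()
    for t in range(n - 1, 0, -1):            # backtrack from the best final codeword
        path[t - 1] = back[t, path[t]]
    return path

# Toy usage: the source alternates between codewords whose histogram rows only
# weakly prefer different targets, while the target strongly tends to stay put.
H = np.array([[10.0, 9.0], [9.0, 10.0]])
T = np.array([[100.0, 0.0], [0.0, 100.0]])
src_idx = np.array([0, 1, 0])
print(baseline_map(src_idx, H))  # [0 1 0]
print(dp_map(src_idx, H, T))     # [0 0 0]
```

In this toy case the baseline switches target codewords on every frame because each frame is mapped independently, while the transition term keeps the DP path on a single codeword. The mapped LSF frames would then be read back as `tgt_cb[path]`. A real system would also need a cost term for the source frame relationships and an interpolation step to escape the limited codebook space, neither of which is specified here.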

Subject Keywords

Linguistics and Language, Modelling and Simulation, Software, Communication, Computer Vision and Pattern Recognition, Language and Linguistics, Computer Science Applications

URI

https://hdl.handle.net/11511/52297

Journal

SPEECH COMMUNICATION

DOI

https://doi.org/10.1016/j.specom.2006.06.003

Collections

Graduate School of Natural and Applied Sciences, Article

Suggestions


The use of articulator motion information in automatic speech segmentation Akdemir, Eren; Çiloğlu, Tolga (Elsevier BV, 2008-07-01) The use of articulator motion information in automatic speech segmentation is investigated. Automatic speech segmentation is an essential task in speech processing applications like speech synthesis where accuracy and consistency of segmentation are firmly connected to the quality of synthetic speech. The motions of upper and lower lips are incorporated into a hidden Markov model based segmentation process. The MOCHA-TIMIT database, which involves simultaneous articulatograph and microphone recordings, was ...
The discourse connector list: a multi-genre cross-cultural corpus analysis Kalajahi, Seyed Ali Rezvani; Abdullah, Ain Nadzimah; Neufeld, Steve (Walter de Gruyter GmbH, 2017-05-01) This study examines the linguistic feature known as discourse connector using a corpus-informed approach. The study applies a taxonomy which classifies and describes 632 discourse connectors in eight broad classes with 17 categories. The frequency of use of each discourse connector listed was analyzed in the three different registers of spoken, non-academic and academic English in the two different cultural contexts of British and American English. The resulting data on discourse connector frequency were co...
Verb concepts from affordances Kalkan, Sinan; Yuerueten, Onur; Borghi, Anna M.; Şahin, Erol (John Benjamins Publishing Company, 2014-01-01) In this paper, we investigate how the interactions of a robot with its environment can be used to create concepts that are typically represented by verbs in language. Towards this end, we utilize the notion of affordances to argue that verbs typically refer to the generation of a specific type of effect rather than a specific type of action. Then, we show how a robot can form these concepts through interactions with the environment and how humans can use these concepts to ease their communication with the r...
The combinatory morphemic lexicon Bozsahin, C (MIT Press - Journals, 2002-06-01) Grammars that expect words from the lexicon may be at odds with the transparent projection of syntactic and semantic scope relations of smaller units. We propose a morphosyntactic framework based on Combinatory Categorial Grammar that provides flexible constituency, flexible category consistency, and lexical projection of morphosyntactic properties and attachment to grammar in order to establish a morphemic grammar-lexicon. These mechanisms provide enough expressive power in the lexicon to formulate semanti...
Language learning from the perspective of nonlinear dynamic systems Hohenberger, Annette Edeltraud; Peltzer-Karpf, Annemarie (Walter de Gruyter GmbH, 2009-01-01) This article outlines a nonlinear dynamic systems approach to language learning on the basis of developmental cognitive neuroscience. Language learning, on this view, is a process of experience-dependent shaping and selection of broadly defined domain-general and domain-specific genetic predispositions. The central concept of development is (neuro)cognitive growth in terms of self-organization. Linguistic structure-building is synergetic and emergent insofar as the acquisition of a critical mass of eleme...

Citation Formats

O. Salor and M. Demirekler, “Dynamic programming approach to voice transformation,” SPEECH COMMUNICATION, pp. 1262–1272, 2006, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/52297.