The use of articulator motion information in automatic speech segmentation

Date

2008-07-01

Author

Akdemir, Eren
Çiloğlu, Tolga

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

This paper investigates the use of articulator motion information in automatic speech segmentation. Automatic speech segmentation is an essential task in speech processing applications such as speech synthesis, where the accuracy and consistency of segmentation are closely tied to the quality of the synthetic speech. The motions of the upper and lower lips are incorporated into a hidden Markov model (HMM) based segmentation process. The MOCHA-TIMIT database, which contains simultaneous articulatograph and microphone recordings, was used to develop and test the models. Several feature vector compositions are proposed for incorporating articulator motion parameters into the automatic segmentation system. The average absolute boundary error of the system with respect to manual segmentation decreases by 10.1%. The results are examined per boundary class, using both acoustic and visual phone classes, and the performance of the system on different boundary types is discussed. Based on this boundary-class-dependent analysis, the error reduction increases to 18.0% when the appropriate feature vectors are used at selected boundaries.
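The abstract reports its results as a relative reduction in average absolute boundary error against manual segmentation. The sketch below shows one plausible way to compute such a figure. It is not the authors' evaluation code. The function names, the millisecond unit, and all boundary times are hypothetical and invented for illustration.

```python
from typing import Sequence


def mean_absolute_boundary_error(auto_ms: Sequence[float],
                                 manual_ms: Sequence[float]) -> float:
    """Average |automatic - manual| over paired boundary times (assumed ms)."""
    if len(auto_ms) != len(manual_ms) or not auto_ms:
        raise ValueError("need two non-empty, equal-length boundary lists")
    return sum(abs(a - m) for a, m in zip(auto_ms, manual_ms)) / len(auto_ms)


def relative_reduction_percent(baseline_error: float, new_error: float) -> float:
    """Percentage by which new_error is lower than baseline_error."""
    if baseline_error <= 0:
        raise ValueError("baseline error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical boundary times; these are NOT values from the paper.
manual = [100.0, 250.0, 400.0]             # manual reference boundaries
acoustic_only = [110.0, 240.0, 420.0]      # assumed acoustic-only system output
with_lip_features = [108.0, 242.0, 410.0]  # assumed output with lip features added

baseline = mean_absolute_boundary_error(acoustic_only, manual)
improved = mean_absolute_boundary_error(with_lip_features, manual)
reduction = relative_reduction_percent(baseline, improved)
print(f"baseline={baseline:.2f} ms, improved={improved:.2f} ms, "
      f"reduction={reduction:.1f}%")
```

Under this reading, the reported 10.1% and 18.0% would each correspond to `relative_reduction_percent` applied to the errors of two system configurations. The abstract does not name the baseline configuration, so comparing against an acoustic-only system is an assumption.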

Subject Keywords

Linguistics and Language, Modelling and Simulation, Software, Communication, Computer Vision and Pattern Recognition, Language and Linguistics, Computer Science Applications

URI

https://hdl.handle.net/11511/48170

Journal

SPEECH COMMUNICATION

DOI

https://doi.org/10.1016/j.specom.2008.04.005

Collections

Department of Electrical and Electronics Engineering, Article

Citation Formats

E. Akdemir and T. Çiloğlu, “The use of articulator motion information in automatic speech segmentation,” Speech Communication, pp. 594–604, 2008. [Online]. Available: https://hdl.handle.net/11511/48170.