Content-Based Classification and Segmentation of Mixed-Type Audio by Using MPEG-7 Features

2009-07-25
Doğan, Ebru
Sert, Mustafa
Yazıcı, Adnan
This paper describes the development of a generated solution for classification and segmentation of broadcast news audio. A sound stream is segmented by classifying each sub-segment, in multiple steps, into one of six classes: silence, pure speech, music, environmental sound, speech over music, and speech over environmental sound. Support Vector Machines and Hidden Markov Models are employed for classification, and these models are trained on different sets of MPEG-7 features. A series of tests was conducted on hand-labeled audio tracks of TRECVID broadcast news to evaluate the performance of the MPEG-7 features and the selected classification methods in the proposed solution. The results of our experiments demonstrate that classification of mixed-type audio data using the Audio Spectrum Centroid, Audio Spectrum Spread, and Audio Spectrum Flatness features achieves high accuracy rates in the news domain.
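As a rough illustration of the three features named in the abstract, the following Python sketch computes simplified centroid, spread, and flatness values for a single audio frame. It is not the authors' implementation and does not follow the MPEG-7 standard in full: it assumes a naive DFT, skips the DC bin, measures centroid and spread on a log-frequency scale (octaves relative to 1 kHz), and computes flatness over the whole spectrum rather than per frequency band. All function names are hypothetical.

```python
import cmath
import math


def power_spectrum(frame, sample_rate):
    """Naive DFT of one frame; returns (frequencies, powers) for bins 1..N/2.

    The DC bin is skipped so that log-frequency values are always defined
    (a simplification, not the MPEG-7 treatment of low frequencies).
    """
    n = len(frame)
    freqs, powers = [], []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)
        )
        freqs.append(k * sample_rate / n)
        powers.append(abs(coeff) ** 2)
    return freqs, powers


def centroid_and_spread(freqs, powers):
    """Power-weighted mean and RMS deviation of log2(f / 1 kHz)."""
    total = sum(powers)
    if total == 0:
        return 0.0, 0.0  # silent frame: no meaningful centroid
    log_f = [math.log2(f / 1000.0) for f in freqs]
    centroid = sum(l * p for l, p in zip(log_f, powers)) / total
    variance = sum((l - centroid) ** 2 * p for l, p in zip(log_f, powers)) / total
    return centroid, math.sqrt(variance)


def flatness(powers, eps=1e-12):
    """Ratio of geometric to arithmetic mean of the power spectrum.

    Values near 1 suggest a noise-like frame, values near 0 a tonal one.
    eps is an assumed small constant that avoids log(0).
    """
    geometric = math.exp(sum(math.log(p + eps) for p in powers) / len(powers))
    arithmetic = sum(powers) / len(powers) + eps
    return geometric / arithmetic
```

In a pipeline like the one described, per-frame values of this kind would form the feature vectors passed to a classifier such as an SVM or HMM. The frame length, overlap, and band layout are left open here because the abstract does not state them.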

Suggestions

SPEECH DETECTION ON BROADCAST AUDIO
Zubari, Unal; Ozan, Ezgi Can; Acar, Banu Oskay; Çiloğlu, Tolga; Esen, Ersin; Ates, Tugrul K.; Onur, Duygu Oskay (2010-08-27)
Speech boundary detection contributes to the performance of speech-based applications such as speech recognition and speaker recognition. The speech boundary detector implemented in this study works on broadcast audio as a pre-processor module of a keyword spotter. Speech boundary detection is handled in three steps. In the first step, audio data is segmented into homogeneous regions in an unsupervised manner. After an ACTIVITY/NON-ACTIVITY decision is made for each region, ACTIVITY regions are classified as Speech/Non-spe...
Content-based audio management and retrieval system for news broadcasts
Doğan, Ebru; Yazıcı, Adnan; Department of Computer Engineering (2009)
Audio signals can provide rich semantic cues for analyzing multimedia content, so audio information has recently been used for content-based multimedia indexing and retrieval. Due to the growing amount of audio data, demand for efficient retrieval techniques is increasing. In this thesis, we propose a complete, scalable, and extensible audio-based content management and retrieval system for news broadcasts. The proposed system considers classification, segmentation, analysis and retrieval of an audio st...
Wireless speech recognition using fixed point mixed excitation linear prediction (MELP) vocoder
Acar, D; Karci, MH; Ilk, HG; Demirekler, Mübeccel (2002-07-19)
A bit-stream-based front end for a wireless speech recognition system that operates on a fixed-point mixed excitation linear prediction (MELP) vocoder is presented in this paper. Speaker-dependent, isolated-word recognition accuracies are obtained from conventional and bit-stream-based front-end systems, and their statistical significance is discussed. Feature parameters are extracted from original (wireline) and decoded speech (conventional) and from the quantized spectral information (bit stream) of t...
Nonlinear interactive source-filter models for speech
Koç, Turgay; Çiloğlu, Tolga (2016-03-01)
The linear source-filter model of speech production assumes that the source of the speech sounds is independent of the filter. However, acoustic simulations based on the physical speech production models show that when the fundamental frequency of the source harmonics approaches the first formant of the vocal tract filter, the filter has significant effects on the source due to the nonlinear coupling between them. In this study, two interactive system models are proposed under the quasi steady Bernoulli flo...
Nonlinear interactive source-filter model for voiced speech
Koç, Turgay; Çiloğlu, Tolga; Department of Electrical and Electronics Engineering (2012)
The linear source-filter model (LSFM) has been used as a primary model for speech processing since 1960, when G. Fant presented the acoustic theory of speech production. It assumes that the source of voiced speech sounds, the glottal flow, is independent of the filter, the vocal tract. However, acoustic simulations based on physical speech production models show that, especially when the fundamental frequency (F0) of the source harmonics approaches the first formant frequency (F1) of the vocal tract filter, the filter has s...
Citation Formats
E. Doğan, M. Sert, and A. Yazıcı, “Content-Based Classification and Segmentation of Mixed-Type Audio by Using MPEG-7 Features,” 2009, p. 152, Accessed: 2020. [Online]. Available: https://hdl.handle.net/11511/67315.