Show/Hide Menu
Hide/Show Apps
Logout
Türkçe
Türkçe
Search
Search
Login
Login
OpenMETU
OpenMETU
About
About
Open Science Policy
Open Science Policy
Open Access Guideline
Open Access Guideline
Postgraduate Thesis Guideline
Postgraduate Thesis Guideline
Communities & Collections
Communities & Collections
Help
Help
Frequently Asked Questions
Frequently Asked Questions
Guides
Guides
Thesis submission
Thesis submission
MS without thesis term project submission
MS without thesis term project submission
Publication submission with DOI
Publication submission with DOI
Publication submission
Publication submission
Supporting Information
Supporting Information
General Information
General Information
Copyright, Embargo and License
Copyright, Embargo and License
Contact us
Contact us
Content-Based Classification and Segmentation of Mixed-Type Audio by Using MPEG-7 Features
Date
2009-07-25
Author
Dogan, Ebru
SERT, MUSTAFA
Yazicit, Adnan
Metadata
Show full item record
This work is licensed under a
Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License
.
Item Usage Stats
134
views
0
downloads
Cite This
This paper describes the development of a generated solution for classification and segmentation of broadcast news audio, A sound stream is segmented by classifying each sub-segment into silence, pure speech, music, environmental sound, speech over music, and speech over environmental sound classes in multiple steps. Support Vector Machines and Hidden Markov Models are employed for classification and these models are trained by using different sets of MPEG-7 features. A series of tests was conducted on hand-labeled audio tracks of TRECVID broadcast news to evaluate the performance of MPEG-7 features and the selected classification methods in the proposed solution. The results obtained,from our experiments clearly demonstrate that classification of mixed type audio data using Audio Spectrum Centroid, Audio Spectrum Spread, and Audio Spectrum Flatness features has considerably high accuracy rates in news domain.
Subject Keywords
MPEG 7 standard
,
Hidden Markov models
,
Broadcasting
,
Speech
,
Loudspeakers
,
Support vector machines
,
Support vector machine classification
,
Music
,
Streaming media
,
Testing
URI
https://hdl.handle.net/11511/67315
DOI
https://doi.org/10.1109/mmedia.2009.35
Collections
Department of Computer Engineering, Conference / Seminar
Suggestions
OpenMETU
Core
SPEECH DETECTION ON BROADCAST AUDIO
Zubari, Unal; Ozan, Ezgi Can; Acar, Banu Oskay; Çiloğlu, Tolga; Esen, Ersin; Ates, Tugrul K.; Onur, Duygu Oskay (2010-08-27)
Speech boundary detection contributes to performance of speech based applications such as speech recognition and speaker recognition. Speech boundary detector implemented in this study works on broadcast audio as a pre-processor module of a keyword spotter. Speech boundary detection is handled in 3 steps. At first step, audio data is segmented into homogeneous regions in an unsupervised manner. After an ACTIVITY/NON-ACTIVITY decision is made for each region, ACTIVITY regions are classified as Speech/Non-spe...
Content-based audio management and retrieval system for news broadcasts
Doğan, Ebru; Yazıcı, Adnan; Department of Computer Engineering (2009)
The audio signals can provide rich semantic cues for analyzing multimedia content, so audio information has been recently used for content-based multimedia indexing and retrieval. Due to growing amount of audio data, demand for efficient retrieval techniques is increasing. In this thesis work, we propose a complete, scalable and extensible audio based content management and retrieval system for news broadcasts. The proposed system considers classification, segmentation, analysis and retrieval of an audio st...
Wireless speech recognition using fixed point mixed excitation linear prediction (MELP) vocoder
Acar, D; Karci, MH; Ilk, HG; Demirekler, Mübeccel (2002-07-19)
A bit stream based front-end for wireless speech recognition system that operates on fixed point mixed excitation linear prediction (MELP) vocoder is presented in this paper. Speaker dependent, isolated word recognition accuracies obtained from conventional and bit stream based front-end systems are obtained and their statistical significance is discussed. Feature parameters are extracted from original (wireline) and decoded speech (conventional) and from the quantized spectral information (bit stream) of t...
Nonlinear interactive source-filter models for speech
KOÇ, Turgay; Çiloğlu, Tolga (2016-03-01)
The linear source-filter model of speech production assumes that the source of the speech sounds is independent of the filter. However, acoustic simulations based on the physical speech production models show that when the fundamental frequency of the source harmonics approaches the first formant of the vocal tract filter, the filter has significant effects on the source due to the nonlinear coupling between them. In this study, two interactive system models are proposed under the quasi steady Bernoulli flo...
Nonlinear interactive source-filter model for voiced speech
Koç, Turgay; Çiloğlu, Tolga; Department of Electrical and Electronics Engineering (2012)
The linear source-filter model (LSFM) has been used as a primary model for speech processing since 1960 when G. Fant presented acoustic speech production theory. It assumes that the source of voiced speech sounds, glottal flow, is independent of the filter, vocal tract. However, acoustic simulations based on the physical speech production models show that, especially when the fundamental frequency (F0) of source harmonics approaches to the first formant frequency (F1) of vocal tract filter, the filter has s...
Citation Formats
IEEE
ACM
APA
CHICAGO
MLA
BibTeX
E. Dogan, M. SERT, and A. Yazicit, “Content-Based Classification and Segmentation of Mixed-Type Audio by Using MPEG-7 Features,” 2009, p. 152, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/67315.