Content-based audio management and retrieval system for news broadcasts

2009
Doğan, Ebru
Audio signals can provide rich semantic cues for analyzing multimedia content, so audio information has recently been used for content-based multimedia indexing and retrieval. Due to the growing amount of audio data, demand for efficient retrieval techniques is increasing. In this thesis, we propose a complete, scalable and extensible audio-based content management and retrieval system for news broadcasts. The proposed system covers classification, segmentation, analysis and retrieval of an audio stream. In the sound classification and segmentation stage, a sound stream is segmented by classifying each sub-segment, in multiple steps, as silence, pure speech, music, environmental sound, speech over music, or speech over environmental sound. Support Vector Machines and Hidden Markov Models are employed for classification, and these models are trained using different sets of MPEG-7 features. In the analysis and retrieval stage, users have two alternatives for querying audio data. The first isolates the user from the main acoustic classes by providing semantic-domain-based fuzzy classes. The second lets users query audio either by supplying an audio sample in order to find similar segments or by directly requesting an expressive summary of the content. Additionally, a series of tests was conducted on audio tracks of TRECVID news broadcasts to evaluate the performance of the proposed solution.
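The segmentation-by-classification idea in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the trained SVM/HMM models and MPEG-7 features are replaced by hypothetical precomputed fields (`energy`, `speech`, `background`), and the order of the decision steps, the silence threshold and the one-second sub-segment length are assumptions made only for illustration. The sketch labels each sub-segment and then merges adjacent sub-segments that share a label into segments.

```python
from itertools import groupby

SILENCE_THRESHOLD = 0.01  # assumed value, not taken from the thesis


def classify_sub_segment(seg):
    """Assign one of the six classes in steps (the step order is an assumption).

    `seg` is a hypothetical dict standing in for classifier outputs:
      energy     -- float, used only for the silence check
      speech     -- bool, True if speech is present
      background -- None, "music" or "environmental sound"
    """
    if seg["energy"] < SILENCE_THRESHOLD:
        return "silence"
    if seg["speech"]:
        if seg["background"] is None:
            return "pure speech"
        return "speech over " + seg["background"]
    # Non-speech: fall back to environmental sound when no background is given.
    return seg["background"] or "environmental sound"


def segment_stream(sub_segments, hop_s=1.0):
    """Merge runs of identically labelled sub-segments into (start, end, label)."""
    labels = [classify_sub_segment(s) for s in sub_segments]
    segments, start = [], 0
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((start * hop_s, (start + length) * hop_s, label))
        start += length
    return segments


stream = [
    {"energy": 0.0, "speech": False, "background": None},
    {"energy": 0.5, "speech": True, "background": None},
    {"energy": 0.5, "speech": True, "background": None},
    {"energy": 0.6, "speech": True, "background": "music"},
    {"energy": 0.4, "speech": False, "background": "music"},
]
result = segment_stream(stream)
```

Here `result` holds four segments: silence, a two-second pure speech segment, speech over music, and music. In a real system each field of `seg` would come from a trained model rather than being supplied by hand.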

Suggestions

Structural and semantic modeling of audio for content-based querying and browsing
Sert, Mustafa; Baykal, Buyurman; Yazıcı, Adnan (2006-01-01)
A typical content-based audio management system deals with three aspects: audio segmentation and classification, audio analysis, and content-based retrieval of audio. In this paper, we integrate the three aspects of content-based audio management into a single framework and propose an efficient method for flexible querying and browsing of auditory data. More specifically, we utilize two robust feature sets, namely MPEG-7 Audio Spectrum Flatness (ASF) and Mel Frequency Cepstral Coefficients (MFCC), as th...
Multimedia Information Retrieval Using Fuzzy Cluster-Based Model Learning
Sattari, Saeid; Yazıcı, Adnan (2017-07-12)
Multimedia data, particularly digital videos, which contain various modalities (visual, audio, and text) are complex and time consuming to model, process, and retrieve. Therefore, efficient methods are required for retrieval of such complex data. In this paper, we propose a multimodal query level fusion approach using a fuzzy cluster-based learning method to improve the retrieval performance of multimedia data. Experimental results on a real dataset demonstrate that employing fuzzy clustering achieves notab...
Spherical harmonics based acoustic scene analysis for object-based audio
Çöteli, Mert Burkay; Hacıhabiboğlu, Hüseyin; Department of Information Systems (2021-02-19)
Object-based audio relies on elemental audio signals from individual sound sources and their associated metadata to be reconstructed at the listener side. While defining audio objects in a production setting is straightforward, it is not trivial to extract audio objects from more realistic recording scenarios such as concerts. Thus, existing object-based audio standards also define scene-based formats alongside object-based representations that provide immersive audio, but without the flexibility provided by...
Matrix quantization and mixed excitation based linear predictive speech coding at very low bit rates
Özaydın, Selma; Baykal, Buyurman (Elsevier BV, 2003-10)
A matrix quantization scheme and a very low bit rate vocoder are developed to obtain good quality speech for low capacity communication links. The new matrix quantization method operates at bit rates between 400 and 800 bps; using a 25 ms linear predictive coding (LPC) analysis frame, a spectral distortion of about 1 dB is achieved at 800 bps. Techniques for improving the performance at very low bit rate vocoding include quantization of residual line spectral frequency (LSF) vectors, multistage matrix quantiza...
Wireless speech recognition using fixed point mixed excitation linear prediction (MELP) vocoder
Acar, D; Karci, MH; Ilk, HG; Demirekler, Mübeccel (2002-07-19)
A bit stream based front-end for a wireless speech recognition system that operates on a fixed point mixed excitation linear prediction (MELP) vocoder is presented in this paper. Speaker dependent, isolated word recognition accuracies are obtained from conventional and bit stream based front-end systems, and their statistical significance is discussed. Feature parameters are extracted from original (wireline) and decoded speech (conventional) and from the quantized spectral information (bit stream) of t...
Citation Formats
E. Doğan, “Content-based audio management and retrieval system for news broadcasts,” M.S. - Master of Science, Middle East Technical University, 2009.