Generating expressive summaries for speech and musical audio using self-similarity clues

2006-07-12
We present a novel algorithm for the structural analysis of audio that detects repetitive patterns suitable for content-based audio information retrieval systems, since such patterns can provide valuable information about audio content, such as a chorus or a concept. The Audio Spectrum Flatness (ASF) feature of the MPEG-7 standard, although it has received less attention than other feature types, is used and evaluated as the underlying feature set. Expressive summaries are chosen as the longest patterns by the k-means clustering algorithm. The proposed approach is evaluated with the ASF feature on a test bed consisting of popular song and speech clips. The well-known Mel Frequency Cepstral Coefficients (MFCCs) are also considered in the experiments for feature evaluation. Experiments show that all the repetitive patterns and their locations are obtained with accuracies of 93% and 78% for music and speech, respectively.
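To make the idea of self-similarity over flatness features concrete, the following is a minimal sketch and not the authors' implementation. It assumes equal-width frequency bands (the MPEG-7 ASF descriptor uses logarithmically spaced bands), a Hann window, and cosine similarity between mean-centered frame features; the function names `frame_signal`, `spectral_flatness`, and `self_similarity` are hypothetical. The pattern-selection step with k-means described in the abstract is not shown.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Slice a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def spectral_flatness(frames, n_bands=8, eps=1e-12):
    """Per-band flatness: geometric mean / arithmetic mean of the power spectrum.

    Assumption: equal-width bands, a simplification of the MPEG-7 ASF band layout.
    """
    window = np.hanning(frames.shape[1])
    power = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2 + eps
    bands = np.array_split(power[:, 1:], n_bands, axis=1)  # drop the DC bin
    feats = [np.exp(np.mean(np.log(b), axis=1)) / np.mean(b, axis=1) for b in bands]
    return np.stack(feats, axis=1)

def self_similarity(features):
    """Cosine similarity between every pair of mean-centered feature vectors."""
    centered = features - features.mean(axis=0)
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    unit = centered / np.maximum(norms, 1e-12)
    return unit @ unit.T

if __name__ == "__main__":
    # Toy signal with an A-B-A layout: tone, noise, tone (1 s each at 8 kHz).
    sr = 8000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 440 * t)
    noise = np.random.default_rng(0).standard_normal(sr)
    x = np.concatenate([tone, noise, tone])

    flat = spectral_flatness(frame_signal(x, frame_len=1024, hop=512))
    sim = self_similarity(flat)
    print(flat.shape, sim.shape)
```

In a matrix like `sim`, a repeated section tends to appear as a stripe parallel to the main diagonal, which is the kind of clue a repetition detector can search for.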

Suggestions

Content-based audio management and retrieval system for news broadcasts
Doğan, Ebru; Yazıcı, Adnan; Department of Computer Engineering (2009)
Audio signals can provide rich semantic cues for analyzing multimedia content, so audio information has recently been used for content-based multimedia indexing and retrieval. Due to the growing amount of audio data, demand for efficient retrieval techniques is increasing. In this thesis work, we propose a complete, scalable and extensible audio-based content management and retrieval system for news broadcasts. The proposed system considers classification, segmentation, analysis and retrieval of an audio st...
Structural and semantic modeling of audio for content-based querying and browsing
Sert, Mustafa; Baykal, Buyurman; Yazıcı, Adnan (2006-01-01)
A typical content-based audio management system deals with three aspects, namely audio segmentation and classification, audio analysis, and content-based retrieval of audio. In this paper, we integrate the three aspects of content-based audio management into a single framework and propose an efficient method for flexible querying and browsing of auditory data. More specifically, we utilize two robust feature sets, namely MPEG-7 Audio Spectrum Flatness (ASF) and Mel Frequency Cepstral Coefficients (MFCC) as th...
Automatic multi-modal dialogue scene indexing
Alatan, Abdullah Aydın (2001-10-10)
An automatic algorithm for indexing dialogue scenes in multimedia content is proposed. The content is segmented into dialogue scenes using the state transitions of a hidden Markov model (HMM). Each shot is classified using both audio and visual information to determine the state/scene transitions for this model. Face detection and silence/speech/music classification are the basic tools which are utilized to index the scenes. While face information is extracted after applying some heuristics to skin-colored regi...
Combining Structural Analysis and Computer Vision Techniques for Automatic Speech Summarization
Sert, Mustafa; Baykal, Buyurman; Yazıcı, Adnan (2008-12-17)
Similar to verse and chorus sections that appear as repetitive structures in musical audio, the key concept (or topic) of some speech recordings (e.g., presentations, lectures, etc.) may also repeat itself over time. Hence, accurate detection of these repetitions may contribute to the success of automatic speech summarization. Based on this motivation, we consider the applicability of music structural analysis methods to speech summary generation. Our method transforms a 1-D time-domain speech signal to a...
Comparison of Subjective and Objective Evaluation Methods for Audio Source Separation
Josef, Kornycky; Günel Kılıç, Banu; Ahmet, Kondoz (2008-01-01)
The evaluation of audio separation algorithms can either be performed objectively by calculation of numerical measures, or subjectively through listening tests. Although objective evaluation is inherently more straightforward, subjective listening tests are still essential in determining the perceived quality of separation. This paper aims to find relationships between objective and subjective results so that numerical values can be translated into perceptual criteria. A generic audio source separatio...
Citation Formats
M. Sert, B. Baykal, and A. Yazıcı, “Generating expressive summaries for speech and musical audio using self-similarity clues,” 2006, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/35213.