Generating expressive summaries for speech and musical audio using self-similarity clues
Date
2006-07-12
Author
Sert, Mustafa
Baykal, Buyurman
Yazıcı, Adnan
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
We present a novel algorithm for the structural analysis of audio that detects repetitive patterns suitable for content-based audio information retrieval systems. Repetitive patterns can provide valuable information about the content of audio, such as a chorus or a concept. The Audio Spectrum Flatness (ASF) feature of the MPEG-7 standard, although it has received less attention than other feature types, is used and evaluated as the underlying feature set. Expressive summaries are chosen as the longest patterns by the k-means clustering algorithm. The proposed approach is evaluated on a test bed of popular song and speech clips using the ASF feature. The well-known Mel Frequency Cepstral Coefficients (MFCCs) are also considered in the experiments for feature evaluation. Experiments show that all the repetitive patterns and their locations are obtained with accuracies of 93% and 78% for music and speech, respectively.
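To make the idea of self-similarity analysis concrete, here is a minimal Python sketch, not the authors' implementation. It assumes a flatness measure computed as the ratio of the geometric mean to the arithmetic mean of the power values in a band (the usual definition of spectral flatness), cosine similarity between frame feature vectors, and a fixed similarity threshold. The threshold search stands in for the k-means selection step described in the abstract, which is not reproduced here. All function names, the `eps` guard, and the `threshold` value are hypothetical choices made for illustration.

```python
import math


def spectral_flatness(power_band, eps=1e-12):
    """Geometric mean / arithmetic mean of the power values in one band.

    Values near 1 indicate a flat (noise-like) band; values near 0 indicate
    a peaky (tonal) band. `eps` guards against log(0) and is an assumption
    of this sketch.
    """
    n = len(power_band)
    geometric = math.exp(sum(math.log(p + eps) for p in power_band) / n)
    arithmetic = sum(power_band) / n + eps
    return geometric / arithmetic


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def self_similarity(frames):
    """Square matrix whose entry (i, j) compares frame i with frame j."""
    return [[cosine(u, v) for v in frames] for u in frames]


def longest_repetition(sim, threshold=0.95):
    """Find the longest off-diagonal run of high similarity.

    A run along the diagonal at offset `lag` means frames i..i+length-1
    repeat at i+lag..i+lag+length-1. Returns (start_a, start_b, length).
    This simple threshold search replaces the k-means step of the paper.
    """
    n = len(sim)
    best = (0, 0, 0)
    for lag in range(1, n):
        run = 0
        for i in range(n - lag):
            if sim[i][i + lag] >= threshold:
                run += 1
                if run > best[2]:
                    start = i - run + 1
                    best = (start, start + lag, run)
            else:
                run = 0
    return best


# Toy sequence: the three-frame pattern A, B, C occurs twice.
A, B, C = [1, 0, 0], [0, 1, 0], [0, 0, 1]
X, Y = [1, 1, 0], [0, 1, 1]
frames = [A, B, C, X, A, B, C, Y]
repetition = longest_repetition(self_similarity(frames))
```

On the toy sequence, `repetition` identifies frames 0 to 2 as recurring at frames 4 to 6. In a real system each frame vector would hold per-band ASF (or MFCC) values computed from the audio rather than hand-written numbers.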
Subject Keywords
Information analysis, Pattern analysis, Algorithm design and analysis, Music information retrieval, Content based retrieval, MPEG 7 Standard, Clustering algorithms, Testing, Speech analysis, Mel frequency cepstral coefficient
URI
https://hdl.handle.net/11511/35213
DOI
https://doi.org/10.1109/icme.2006.262675
Collections
Department of Electrical and Electronics Engineering, Conference / Seminar
Suggestions
Content-based audio management and retrieval system for news broadcasts
Doğan, Ebru; Yazıcı, Adnan; Department of Computer Engineering (2009)
Audio signals can provide rich semantic cues for analyzing multimedia content, so audio information has recently been used for content-based multimedia indexing and retrieval. Due to the growing amount of audio data, demand for efficient retrieval techniques is increasing. In this thesis work, we propose a complete, scalable and extensible audio-based content management and retrieval system for news broadcasts. The proposed system considers classification, segmentation, analysis and retrieval of an audio st...
Structural and semantic modeling of audio for content-based querying and browsing
Sert, Mustafa; Baykal, Buyurman; Yazıcı, Adnan (2006-01-01)
A typical content-based audio management system deals with three aspects, namely audio segmentation and classification, audio analysis, and content-based retrieval of audio. In this paper, we integrate the three aspects of content-based audio management into a single framework and propose an efficient method for flexible querying and browsing of auditory data. More specifically, we utilize two robust feature sets, namely MPEG-7 Audio Spectrum Flatness (ASF) and Mel Frequency Cepstral Coefficients (MFCC), as th...
Automatic multi-modal dialogue scene indexing
Alatan, Abdullah Aydın (2001-10-10)
An automatic algorithm for indexing dialogue scenes in multimedia content is proposed. The content is segmented into dialogue scenes using the state transitions of a hidden Markov model (HMM). Each shot is classified using both audio and visual information to determine the state/scene transitions for this model. Face detection and silence/speech/music classification are the basic tools which are utilized to index the scenes. While face information is extracted after applying some heuristics to skin-colored regi...
Combining Structural Analysis and Computer Vision Techniques for Automatic Speech Summarization
Sert, Mustafa; Baykal, Buyurman; Yazıcı, Adnan (2008-12-17)
Similar to verse and chorus sections that appear as repetitive structures in musical audio, the key concept (or topic) of some speech recordings (e.g., presentations, lectures, etc.) may also repeat itself over time. Hence, accurate detection of these repetitions may be helpful to the success of automatic speech summarization. Based on this motivation, we consider the applicability of music structural analysis methods to speech summary generation. Our method transforms a 1-D time-domain speech signal to a...
Comparison of Subjective and Objective Evaluation Methods for Audio Source Separation
Josef, Kornycky; Günel Kılıç, Banu; Ahmet, Kondoz (2008-01-01)
The evaluation of audio separation algorithms can either be performed objectively by calculation of numerical measures, or subjectively through listening tests. Although objective evaluation is inherently more straightforward, subjective listening tests are still essential in determining the perceived quality of separation. This paper aims to find relationships between objective and subjective results so that numerical values can be translated into perceptual criteria. A generic audio source separatio...
Citation Formats
M. Sert, B. Baykal, and A. Yazıcı, "Generating expressive summaries for speech and musical audio using self-similarity clues," 2006. [Online]. Available: https://hdl.handle.net/11511/35213.