Video segmentation based on audio feature extraction

Date

2009

Author

Atar, Neriman

This study presents an automatic video segmentation and classification system based on audio features. Video segments are classified as “speech”, “music”, “crowd” or “silence”; segments that do not belong to any of these classes are left as “unclassified”. Silence segments are detected by comparing the short-time energy of the embedded audio sequence against a simple threshold. For the detection of “speech”, “music” and “crowd” segments, a multiclass classification scheme is applied. For this purpose, three audio feature sets are formed: the first consists purely of MPEG-7 audio features, the second consists of the audio features used in [31], and the third is the combination of these two feature sets. A histogram comparison method is used to choose the best features. The audio segmentation system was trained and tested with each of these feature sets. The evaluation results show that Feature Set 3, the combination of the other two feature sets, gives better performance for the audio classification system. The output of the classification system is an XML file that contains MPEG-7 audio segment descriptors for the video sequence. An application scenario is also given, in which the audio segmentation results are combined with visual analysis results to obtain audio-visual video segments.
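
The silence detection step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the frame length, hop size, threshold value, and the function names `short_time_energy` and `silence_segments` are all assumptions chosen for illustration, since the abstract states only that short-time energy is compared against a threshold.

```python
import math


def short_time_energy(samples, frame_len, hop):
    """Return the mean squared amplitude of each (possibly overlapping) frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def silence_segments(samples, frame_len=400, hop=200, threshold=1e-4):
    """Return (start_sample, end_sample) spans of consecutive low-energy frames.

    frame_len, hop and threshold are illustrative values, not taken from the thesis.
    """
    energies = short_time_energy(samples, frame_len, hop)
    segments = []
    seg_start = None  # index of the first frame of the current silent run
    for i, energy in enumerate(energies):
        if energy < threshold:
            if seg_start is None:
                seg_start = i
        elif seg_start is not None:
            # The run ended at frame i - 1; convert frame indices to sample offsets.
            segments.append((seg_start * hop, (i - 1) * hop + frame_len))
            seg_start = None
    if seg_start is not None:
        # The signal ended while still inside a silent run.
        segments.append((seg_start * hop, (len(energies) - 1) * hop + frame_len))
    return segments


# Synthetic test signal: 1 s of a 440 Hz tone, 1 s of silence, 1 s of tone (8 kHz).
sample_rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
signal = tone + [0.0] * sample_rate + tone

print(silence_segments(signal))
```

On real recordings a fixed threshold is fragile because background noise levels vary, so in practice the threshold would likely be tuned per recording or derived from the energy distribution; the abstract does not say which approach was taken.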

Subject Keywords

Electrical engineering, Video segmentation

URI

http://etd.lib.metu.edu.tr/upload/12610397/index.pdf
https://hdl.handle.net/11511/18438

Collections

Graduate School of Natural and Applied Sciences, Thesis

Citation Formats

N. Atar, “Video segmentation based on audio feature extraction,” M.S. - Master of Science, Middle East Technical University, 2009.