Automatic multi-modal dialogue scene indexing

2001-10-10
An automatic algorithm for indexing dialogue scenes in multimedia content is proposed. The content is segmented into dialogue scenes using the state transitions of a hidden Markov model (HMM). Each shot is classified using both audio and visual information to determine the state/scene transitions for this model. Face detection and silence/speech/music classification are the basic tools used to index the scenes. Face information is extracted by applying heuristics to skin-colored regions, while audio analysis examines the signal energy, periodicity and zero crossing rate (ZCR) of the audio waveform. The simulation results show that dialogues can be indexed automatically using the proposed algorithm.
International Conference on Image Processing (ICIP 2001)
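The audio features named in the abstract can be illustrated with a minimal sketch. It computes short-time energy and zero crossing rate over non-overlapping frames and flags low-energy frames as silence. The function names, the frame length, and the `energy_floor` threshold are assumptions made for illustration and are not taken from the paper; the periodicity measure and the speech/music decision are not reproduced here.

```python
def frame_features(samples, frame_len=256):
    """Return a list of (energy, zcr) pairs, one per non-overlapping frame.

    energy: mean squared amplitude of the frame.
    zcr:    fraction of adjacent sample pairs whose signs differ.
    The frame length of 256 samples is an illustrative assumption.
    """
    features = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        zcr = crossings / (frame_len - 1)
        features.append((energy, zcr))
    return features


def is_silent(energy, energy_floor=1e-4):
    """Flag a frame as silence when its energy is below a threshold.

    energy_floor is a hypothetical value; a real system would tune it
    to the recording level of the material.
    """
    return energy < energy_floor


if __name__ == "__main__":
    # One all-zero frame followed by one frame that alternates in sign.
    signal = [0.0] * 8 + [1.0, -1.0] * 4
    for energy, zcr in frame_features(signal, frame_len=8):
        print(energy, zcr, is_silent(energy))
```

In a pipeline like the one the abstract describes, per-frame values of this kind would be aggregated over each shot before being passed to the shot classifier; how the paper performs that aggregation is not stated in this record.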

Suggestions

Prioritized sequential 3D reconstruction in video sequences with multiple motions
Imre, Evren; Knorr, Sebastian; Alatan, Abdullah Aydın; Sikora, Thomas (2006-10-11)
In this study, an algorithm is proposed to solve the multi-frame structure from motion (MFSfM) problem for monocular video sequences in dynamic scenes. The algorithm uses the epipolar criterion to segment the features belonging to independently moving objects. Once the features are segmented, corresponding objects are reconstructed individually by using a sequential algorithm, which is also capable of prioritizing the frame pairs with respect to their reliability and information content, thus achieving a fa...
Linear Separability Analysis for Stacked Generalization Architecture
Ozay, Mete; Vural, Fatos T. Yarman (2009-04-11)
The Stacked Generalization algorithm aims to increase the individual classification performances of the classifiers by combining the information obtained from various classifiers in a multilayer architecture by either linear or nonlinear techniques. Performance of the algorithm varies depending on the application domains and the space analyses that affect the classification performances could not be applied successfully.
Summarizing video: Content, features, and HMM topologies
Yasaroglu, Y; Alatan, Abdullah Aydın (2003-01-01)
An algorithm is proposed for automatic summarization of multimedia content by segmenting digital video into semantic scenes using HMMs. Various multi-modal low-level features are extracted to determine state transitions in HMMs for summarization. The advantage of using different model topologies and observation sets in order to segment different content types is emphasized and verified by simulations. Performance of the proposed algorithm is also compared with a deterministic scene segmentation method. A better...
Human action recognition with line and flow histograms
İKİZLER CİNBİŞ, NAZLI; Cinbiş, Ramazan Gökberk; DUYGULU ŞAHİN, PINAR (2008-12-11)
We present a compact representation for human action recognition in videos using line and optical flow histograms. We introduce a new shape descriptor based on the distribution of lines which are fitted to boundaries of human figures. By using an entropy-based approach, we apply feature selection to densify our feature representation, thus, minimizing classification time without degrading accuracy. We also use a compact representation of optical flow for motion information. Using line and flow histograms to...
Joint source-channel coding for error resilient transmission of static 3D models
Bici, Mehmet Oguz; Norkin, Andrey; Akar, Gözde (2012-01-01)
In this paper, a performance analysis of joint source-channel coding techniques for error-resilient transmission of three-dimensional (3D) models is presented. In particular, packet-based transmission scenarios are analyzed. The packet loss resilient methods are classified into two groups according to the progressive compression schemes employed: Compressed Progressive Meshes (CPM) based methods and wavelet-based methods. In the first group, layers of CPM algorithm are protected unequally by Forward Error Correc...
Citation Formats
A. A. Alatan, “Automatic multi-modal dialogue scene indexing,” presented at the International Conference on Image Processing (ICIP 2001), Thessaloniki, Greece, 2001, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/55886.