Comparative analysis of hidden Markov models for multi-modal dialogue scene indexing

A class of audio-visual content is segmented into dialogue scenes using the state transitions of a novel hidden Markov model (HMM). Each shot is classified using both the audio track and the visual content to determine the state/scene transitions of the model. Simulations with circular and left-to-right HMM topologies show that both perform very well with multi-modal inputs. Moreover, for the circular topology, comparisons between different training and observation sets show that audio and face information together give the most consistent results across observation sets.
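
The idea of reading scene boundaries off HMM state transitions can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the three state names, the (audio class, face present) token layout, every probability value, and the helper names `viterbi` and `scene_boundaries`. The sketch uses a circular topology, in which each state either repeats or advances to the next and the last state wraps around to the first, and it decodes the most likely state sequence with the standard Viterbi algorithm.

```python
import math

# Hypothetical states; the paper's actual state set is not given in the abstract.
STATES = ["establishing", "dialogue", "transitional"]

# Circular topology (assumed values): stay in place or advance, last wraps to first.
TRANS = {
    "establishing": {"establishing": 0.5, "dialogue": 0.5},
    "dialogue": {"dialogue": 0.8, "transitional": 0.2},
    "transitional": {"transitional": 0.5, "establishing": 0.5},
}
START = {"establishing": 0.6, "dialogue": 0.3, "transitional": 0.1}

# One token per shot: (audio_class, face_present). Emission values are made up.
EMIT = {
    "establishing": {("music", False): 0.5, ("silence", False): 0.3,
                     ("speech", True): 0.1, ("speech", False): 0.1},
    "dialogue": {("speech", True): 0.7, ("speech", False): 0.15,
                 ("silence", True): 0.1, ("music", False): 0.05},
    "transitional": {("music", False): 0.4, ("silence", False): 0.4,
                     ("speech", False): 0.2},
}


def _log(p):
    """Log-probability, mapping zero probability to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(tokens):
    """Return the most likely state sequence for a list of shot tokens."""
    scores = [{s: _log(START.get(s, 0)) + _log(EMIT[s].get(tokens[0], 0))
               for s in STATES}]
    back = []  # back[t][s] is the best predecessor of state s at step t + 1
    for tok in tokens[1:]:
        prev, cur, ptr = scores[-1], {}, {}
        for s in STATES:
            best = max(STATES, key=lambda r: prev[r] + _log(TRANS[r].get(s, 0)))
            cur[s] = (prev[best] + _log(TRANS[best].get(s, 0))
                      + _log(EMIT[s].get(tok, 0)))
            ptr[s] = best
        scores.append(cur)
        back.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def scene_boundaries(path):
    """Indices of shots at which the decoded state changes."""
    return [i for i in range(1, len(path)) if path[i] != path[i - 1]]


shots = [("music", False), ("speech", True), ("speech", True), ("silence", False)]
decoded = viterbi(shots)
print(decoded)
print(scene_boundaries(decoded))
```

In a left-to-right variant of the same sketch, the wrap-around entry from the last state back to the first would be removed, so the model could not revisit earlier states.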


Multi-modal dialog scene detection using hidden Markov models for content-based multimedia indexing
Alatan, Abdullah Aydın; WOLF, WAYNE (2001-06-01)
A class of audio-visual data (fiction entertainment: movies, TV series) is segmented into scenes, which contain dialogs, using a novel hidden Markov model-based (HMM) method. Each shot is classified using both audio track (via classification of speech, silence and music) and visual content (face and location information). The result of this shot-based classification is an audio-visual token to be used by the HMM state diagram to achieve scene analysis. After simulations with circular and left-to-right HMM t...
AKTIHANOGLU, M; OZGUC, B; AYKANAT, C (Springer Science and Business Media LLC, 1994-01-01)
This paper describes a system for modeling, animating, previewing and rendering articulated objects. The system has a modeler of objects that consists of joints and segments. The animator interactively positions the articulated object in its stick, control vertex, or rectangular prism representation and previews the motion in real time. Then the data representing the motion and the models is sent to a multicomputer [iPSC/2 Hypercube (Intel)]. The frames are rendered in parallel, exploiting the coherence bet...
Automatic multi-modal dialogue scene indexing
Alatan, Abdullah Aydın (2001-10-10)
An automatic algorithm for indexing dialogue scenes in multimedia content is proposed. The content is segmented into dialogue scenes using the state transitions of a hidden Markov model (HMM). Each shot is classified using both audio and visual information to determine the state/scene transitions for this model. Face detection and silence/speech/music classification are the basic tools which are utilized to index the scenes. While face information is extracted after applying some heuristics to skin-colored regi...
Depth assisted object segmentation in multi-view video
Cigla, Cevahir; Alatan, Abdullah Aydın (2008-01-01)
In this work, a novel and unified approach for multi-view video (MVV) object segmentation is presented. In the first stage, a region-based graph-theoretic color segmentation algorithm is proposed, in which the popular Normalized Cuts segmentation method is improved with some modifications on its graph structure. Segmentation is obtained by recursive bi-partitioning of a weighted graph of an initial over-segmentation mask. The available segmentation mask is also utilized during dense depth map estimation ste...
Improvisation based on imitating human players by a robotic acoustic musical device ROMI built as a self playing compound acoustic musical robot
Aydın, Kubilay Kaan; Erkmen, Aydan Müşerref; Department of Electrical and Electronics Engineering (2013)
In this thesis we introduce the robotic device ROMI together with its control architecture, the musical state representation and focus on parameter estimation for imitation of duo players by ROMI. ROMI is aimed at jointly playing two instruments that belong to two different classes and improvises while assisting others in an orchestral performance. A new improvisation algorithm based on parameter estimation process for the imitation control is introduced. This new improvisation algorithm adds the capability...
Citation Formats
A. A. Alatan and W. Wolf, “Comparative analysis of hidden Markov models for multi-modal dialogue scene indexing,” 2000, Accessed: 00, 2020. [Online]. Available: