Multi-modal dialog scene detection using hidden Markov models for content-based multimedia indexing

2001-06-01
A class of audio-visual data (fictional entertainment such as movies and TV series) is segmented into scenes that contain dialogs, using a novel method based on hidden Markov models (HMMs). Each shot is classified using both the audio track (via classification of speech, silence and music) and the visual content (face and location information). The result of this shot-based classification is an audio-visual token that the HMM state diagram uses to perform scene analysis. Simulations with circular and left-to-right HMM topologies show that both perform very well with multi-modal inputs. Moreover, for the circular topology, comparisons between different training and observation sets show that audio and face information together give the most consistent results across observation sets.
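The token-to-state decoding described above can be sketched as follows. This is only an illustrative sketch, not the authors' model: the two states (`dialog`, `non_dialog`), the reduction of each token to an (audio class, face present) pair, and every probability value are assumptions chosen for the example. The paper's actual topologies, state sets and trained parameters are not given in this record. Viterbi decoding is used here as a standard way to recover the most likely state sequence from an HMM.

```python
import math

# Hypothetical two-state model; the paper's own topologies are not reproduced here.
STATES = ["dialog", "non_dialog"]

# All probabilities below are made-up illustrative values, not trained parameters.
START = {"dialog": 0.5, "non_dialog": 0.5}
TRANS = {
    "dialog": {"dialog": 0.8, "non_dialog": 0.2},
    "non_dialog": {"dialog": 0.3, "non_dialog": 0.7},
}
# Emission P(token | state); a token is (audio_class, face_present) for one shot.
EMIT = {
    "dialog": {
        ("speech", True): 0.60, ("speech", False): 0.20,
        ("silence", True): 0.08, ("silence", False): 0.04,
        ("music", True): 0.05, ("music", False): 0.03,
    },
    "non_dialog": {
        ("speech", True): 0.05, ("speech", False): 0.10,
        ("silence", True): 0.10, ("silence", False): 0.25,
        ("music", True): 0.10, ("music", False): 0.40,
    },
}


def viterbi(tokens):
    """Return the most likely state sequence for a list of shot tokens."""
    if not tokens:
        return []
    # Log-probabilities avoid numerical underflow on long shot sequences.
    score = {s: math.log(START[s]) + math.log(EMIT[s][tokens[0]]) for s in STATES}
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for tok in tokens[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (score[prev] + math.log(TRANS[prev][s])
                            + math.log(EMIT[s][tok]))
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def dialog_segments(path):
    """Group consecutive 'dialog' shots into (first_shot, last_shot) index pairs."""
    segments, start = [], None
    for i, state in enumerate(path):
        if state == "dialog" and start is None:
            start = i
        elif state != "dialog" and start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:
        segments.append((start, len(path) - 1))
    return segments


if __name__ == "__main__":
    shots = [("music", False), ("music", False), ("speech", True),
             ("speech", True), ("speech", True), ("silence", False)]
    decoded = viterbi(shots)
    print(decoded)
    print(dialog_segments(decoded))
```

With these assumed parameters, the three speech-with-face shots in the example are decoded as a single dialog segment covering shots 2 to 4. Adding location information or further states would mean enlarging the token tuple and the `EMIT` and `TRANS` tables accordingly.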
MULTIMEDIA TOOLS AND APPLICATIONS

Suggestions

Automatic multi-modal dialogue scene indexing
Alatan, Abdullah Aydın (2001-10-10)
An automatic algorithm for indexing dialogue scenes in multimedia content is proposed. The content is segmented into dialogue scenes using the state transitions of a hidden Markov model (HMM). Each shot is classified using both audio and visual information to determine the state/scene transitions for this model. Face detection and silence/speech/music classification are the basic tools which are utilized to index the scenes. While face information is extracted after applying some heuristics to skin-colored regi...
Summarizing video: Content, features, and HMM topologies
Yasaroglu, Y; Alatan, Abdullah Aydın (2003-01-01)
An algorithm is proposed for automatic summarization of multimedia content by segmenting digital video into semantic scenes using HMMs. Various multi-modal low-level features are extracted to determine state transitions in HMMs for summarization. The advantage of using different model topologies and observation sets to segment different content types is emphasized and verified by simulations. The performance of the proposed algorithm is also compared with a deterministic scene segmentation method. A better...
Simultaneous segmentation of images and shapes
Tarı, Zehra Sibel (1997-01-01)
A novel method for simultaneous image segmentation and shape decomposition is presented. The method may be applied directly to grayscale images. The method is based on the analysis of the level curves of an ''edge-strength'' function which is a measure of boundaryness of the image at each point.
Multimodal query-level fusion for efficient multimedia information retrieval
Sattari, Saeid; Yazıcı, Adnan (2018-10-01)
Managing a large volume of multimedia data containing various modalities such as visual, audio, and text reveals the necessity for efficient methods for modeling, processing, storing, and retrieving complex data. In this paper, we propose a fusion-based approach at the query level to improve query retrieval performance of multimedia data. We discuss various flexible query types including the combination of content as well as concept-based queries that provide users with the ability to efficiently perform mu...
Div-BLAST: Diversification of Sequence Search Results
Eser, Elif; Can, Tolga; Ferhatosmanoglu, Hakan (PUBLIC LIBRARY SCIENCE, 1160 BATTERY STREET, STE 100, SAN FRANCISCO, CA 94111 USA, 2014-12-22)
Sequence similarity tools, such as BLAST, seek sequences most similar to a query from a database of sequences. They return results significantly similar to the query sequence and that are typically highly similar to each other. Most sequence analysis tasks in bioinformatics require an exploratory approach, where the initial results guide the user to new searches. However, diversity has not yet been considered an integral component of sequence search tools for this discipline. Some redundancy can be avoided ...
Citation Formats
A. A. Alatan and W. Wolf, “Multi-modal dialog scene detection using hidden Markov models for content-based multimedia indexing,” MULTIMEDIA TOOLS AND APPLICATIONS, pp. 137–151, 2001, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/36487.