Fusing semantic information extracted from visual, auditory and textual data of videos

Gönül, Elvan (2012)
In recent years, due to the increasing use of videos, manual information extraction has become insufficient for users. Therefore, extracting semantic information automatically has turned into a serious requirement. Today, there exist some systems that extract semantic information automatically by using visual, auditory and textual data separately, but the number of studies that use more than one data source is very limited. As some studies on this topic have already shown, using multimodal video data for automatic information extraction yields better results by increasing the accuracy of the semantic information retrieved from visual, auditory and textual sources. In this thesis, a complete system that fuses the semantic information obtained from visual, auditory and textual video data is introduced. The fusion system carries out the following procedures: analyzing and uniting the semantic information extracted from multimodal data by utilizing concept interactions, and consequently generating a semantic dataset that is ready to be stored in a database. In addition, experiments are conducted to compare the results of the proposed multimodal fusion operation with the results obtained from semantic information extraction using a single modality and from other fusion methods. The results indicate that, overall, fusing all available information together with concept relations yields better results than any unimodal approach and other traditional fusion methods.
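The sketch below illustrates the general idea of score-level fusion combined with concept interactions, as described in the abstract. It is a minimal, hypothetical example: the concept names, modality weights, relation strengths and the boosting rule are assumptions for illustration, not the exact algorithm used in the thesis.

```python
# Minimal sketch: fuse per-modality concept scores, then reinforce related
# concepts using pairwise concept-interaction strengths (all values illustrative).

# Per-modality confidence scores for concepts detected in one video segment.
modality_scores = {
    "visual":   {"crowd": 0.8, "stadium": 0.7},
    "auditory": {"cheering": 0.9},
    "textual":  {"goal": 0.6, "crowd": 0.5},
}

# Assumed modality reliabilities (e.g. estimated on validation data).
modality_weights = {"visual": 0.4, "auditory": 0.3, "textual": 0.3}

# Assumed pairwise concept-interaction strengths (e.g. co-occurrence statistics).
concept_relations = {
    ("crowd", "cheering"): 0.7,
    ("goal", "stadium"): 0.5,
}

def fuse(scores, weights, relations, boost=0.1):
    """Weighted score-level fusion followed by a relation-based adjustment."""
    fused = {}
    # Step 1: weighted sum of the scores each modality assigns to a concept.
    for modality, detections in scores.items():
        for concept, score in detections.items():
            fused[concept] = fused.get(concept, 0.0) + weights[modality] * score

    # Step 2: boost concepts whose related concepts were also detected.
    for (a, b), strength in relations.items():
        if a in fused and b in fused:
            fused[a] = min(1.0, fused[a] + boost * strength * fused[b])
            fused[b] = min(1.0, fused[b] + boost * strength * fused[a])
    return fused

if __name__ == "__main__":
    for concept, score in sorted(fuse(modality_scores, modality_weights,
                                      concept_relations).items()):
        print(f"{concept}: {score:.2f}")
```

In this kind of scheme, the relation-based boost is what lets agreement across modalities (for example, visual "crowd" and auditory "cheering") raise the confidence of semantically related concepts beyond what any single modality provides.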

Suggestions

Automatic semantic content extraction in videos using a spatio-temporal ontology model
Yıldırım, Yakup; Yazıcı, Adnan; Department of Computer Engineering (2009)
The recent increase in the use of video in many applications has revealed the need for extracting the content of videos. Raw data and low-level features alone are not sufficient to fulfill users' needs; that is, a deeper understanding of the content at the semantic level is required. Currently, manual techniques are being used to bridge the gap between low-level representative features and high-level semantic content, which are inefficient, subjective and costly in time and have limitations on querying capab...
Automatic Semantic Content Extraction in Videos Using a Fuzzy Ontology and Rule-Based Model
Yildirim, Yakup; Yazıcı, Adnan; Yilmaz, Turgay (2013-01-01)
The recent increase in the use of video-based applications has revealed the need for extracting the content of videos. Raw data and low-level features alone are not sufficient to fulfill users' needs; that is, a deeper understanding of the content at the semantic level is required. Currently, manual techniques, which are inefficient, subjective and costly in time and limit the querying capabilities, are being used to bridge the gap between low-level representative features and high-level semantic content. H...
Multilingual dynamic linking of web resources
Dönmez, Uğur; Coşar, Ahmet; Yeşilada, Yeliz; Department of Computer Engineering (2014)
Thanks to its scalable architecture, the World Wide Web is successful at locating, browsing and publishing information. However, the Web suffers from some limitations. For example, links on the Web are embedded in documents. Links are only unidirectional, ownership is required to place an anchor in documents, and authoring links is an expensive process. The embedded link structure of the Web can be improved by the Semantic Web. By using Semantic Web components, existing Web resources can be enriched with additional ex...
Flexible querying using structural and event based multimodal video data model
Oztarak, Hakan; Yazıcı, Adnan (2006-01-01)
Investments in multimedia technology enable us to store many more reflections of the real world in the digital world as videos, so that a great deal of information is carried into the digital world directly. In order to store and efficiently query this information, a video database system (VDBS) is necessary. We propose a structural, event-based and multimodal (SEBM) video data model for VDBSs which supports three different modalities, namely visual, auditory and textual, and we can dissolve these three modaliti...
Exploiting information extraction techniques for automatic semantic video indexing with an application to Turkish news videos
Kucuk, Dilek; Yazıcı, Adnan (Elsevier BV, 2011-08-01)
This paper addresses the problem of automatic semantic indexing of news videos by presenting a video annotation and retrieval system which is able to perform automatic semantic annotation of news video archives and provide access to the archives via these annotations. The presented system relies on the video texts as the information source and exploits several information extraction techniques on these texts to arrive at representative semantic information regarding the underlying videos. These techniques ...
Citation Formats
E. Gönül, “Fusing semantic information extracted from visual, auditory and textual data of videos,” M.S. - Master of Science, Middle East Technical University, 2012.