Fusion of multimodal information for multimedia information retrieval

Yılmaz, Turgay
Effective retrieval of multimedia data depends on its semantic content. In order to extract the semantic content, the nature of multimedia data should be analyzed carefully and the information contained should be used completely. Multimedia data usually has a complex structure containing multimodal information. Noise in the data, the non-universality of any single modality, and the performance upper bound of each modality make it hard to rely on a single modality. Thus, multimodal fusion is a practical approach for improving retrieval performance. However, two major challenges exist: 'what-to-fuse' and 'how-to-fuse'. In the scope of these challenges, the contribution of this thesis is four-fold. First, a general fusion framework is constructed by analyzing the studies in the literature and identifying the design aspects of general information fusion systems. Second, a class-specific feature selection (CSF) approach and a RELIEF-based modality weighting algorithm (RELIEF-MM) are proposed to handle the 'what-to-fuse' problem. Third, the 'how-to-fuse' problem is studied, and a novel mining- and graph-based combination approach is proposed. The approach enables an effective combination of modalities represented with bag-of-words models. Lastly, a non-linear extension of the linear weighted fusion approach is proposed, which handles the 'what-to-fuse' and 'how-to-fuse' problems together. We conducted comprehensive experiments on the CalTech101, TRECVID 2007, 2008 and 2011, and CCV datasets with various multi-feature and multimodal settings, and the results validate that our proposed algorithms are efficient, accurate and robust ways of dealing with the given challenges of multimodal information fusion.
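
For readers unfamiliar with the linear weighted fusion baseline that the abstract mentions, the following is a minimal sketch of generic score-level (late) fusion. It is not the thesis's algorithm: the function name, the modality names, the scores and the weights are all hypothetical, and the weights are simply given by hand rather than learned by a method such as RELIEF-MM.

```python
# Minimal sketch of generic linear weighted (late) fusion.
# All names, scores and weights are hypothetical illustrations,
# not values or code from the thesis.

def weighted_linear_fusion(scores_by_modality, weights):
    """Combine per-modality confidence scores into one score per item.

    scores_by_modality: dict mapping modality name -> list of scores,
                        one score per candidate item (same length for all).
    weights:            dict mapping modality name -> non-negative weight.
    """
    total = sum(weights[m] for m in scores_by_modality)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    n_items = len(next(iter(scores_by_modality.values())))
    fused = [0.0] * n_items
    for modality, scores in scores_by_modality.items():
        w = weights[modality] / total  # normalize so weights sum to 1
        for i, s in enumerate(scores):
            fused[i] += w * s
    return fused


# Hypothetical classifier confidences for three candidate items.
scores = {
    "visual": [0.9, 0.2, 0.4],
    "audio": [0.5, 0.1, 0.8],
    "text": [0.7, 0.3, 0.6],
}
# Hypothetical modality weights (the 'what-to-fuse' side of the problem).
weights = {"visual": 0.5, "audio": 0.25, "text": 0.25}

fused = weighted_linear_fusion(scores, weights)
# Rank items by fused score, highest first.
ranking = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
```

In this sketch the weights encode how much each modality is trusted and the weighted sum is the combination rule; a non-linear extension would replace the weighted sum with a non-linear combination of the same inputs.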


Flexible Content Extraction and Querying for Videos
Demir, Utku; KOYUNCU, Murat; Yazıcı, Adnan; Yilmaz, Turgay; SERT, MUSTAFA (2011-10-28)
In this study, a multimedia database system which includes a semantic content extractor, a high-dimensional index structure and an intelligent fuzzy object-oriented database component is proposed. The proposed system is realized by following a component-oriented approach. It supports different flexible query capabilities for the requirements of video users, which is the main focus of this paper. The query performance of the system (including automatic semantic content extraction) is tested and analyzed in t...
A New service architecture for IPTV over internet
Özkardeş, Merve; Schmidt, Şenan Ece; Department of Electrical and Electronics Engineering (2013)
Multimedia applications over the Internet and Internet Protocol Television (IPTV) have gained a lot of attention. IPTV has a number of service requirements, such as high bandwidth, scalability, and minimum delay, jitter and channel switch time. IP multicast, IMS (IP Multimedia System) Protocol and peer-to-peer approaches are proposed for implementing IPTV. However, IP multicast requires all the routers in the core network to possess multicast capability, IMS does not easily scale and P2P cannot efficiently utilize the n...
An intelligent multimedia information system for multimodal content extraction and querying
Yazıcı, Adnan; Yilmaz, Turgay; Sattari, Saeid; SERT, MUSTAFA; Gulen, Elvan (2018-01-01)
This paper introduces an intelligent multimedia information system that exploits machine learning and database technologies. The system extracts the semantic contents of videos automatically by using the visual, auditory and textual modalities, then stores the extracted contents in an appropriate format so that they can be retrieved efficiently in subsequent requests for information. The semantic contents are extracted from these three modalities of data separately. Afterwards, the outputs from these modalities are fused...
Investigation of haptic line graph comprehension through co-production of gesture and language
Deniz, Ozan; Mehmetcan, Fal; Acartürk, Cengiz (2013-06-30)
In communication settings, statistical graphs accompany language by providing visual access to various aspects of domain entities, such as conveying information about trends. A comparable means of providing perceptual access is to provide haptic graphs for blind people. In this study, we present the results of an experimental study that aimed to investigate visual line graphs and haptic line graphs in the time domain by means of gesture production as an indicator of event conceptualization. The par...
Indexing both content and concept for high-dimensional multimedia data
Arslan, Serdar; Yazıcı, Adnan; Department of Computer Engineering (2018)
While understanding the semantic meaning of multimedia content is immediate for humans, it is far from immediate for a computer. This problem, commonly known as the semantic gap, is the difference between the human perception of a multimedia object and its extracted low-level features, and it is one of the main problems in multimedia retrieval. Thus, in order to achieve better retrieval performance, low-level content features should be combined with semantic features in an efficient way. Another critical task in thi...
Citation Formats
T. Yılmaz, “Fusion of multimodal information for multimedia information retrieval,” Ph.D. - Doctoral Program, Middle East Technical University, 2014.