Empowering Multimodal Multimedia Information Retrieval Through Semantic Deep Learning
Date: 2024-3-18
Author: SATTARI, SAEID
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Item Usage Stats: 114 views, 0 downloads
Abstract
Multimedia data encompasses various modalities, including audio, visual, and text, necessitating the development of robust retrieval methods capable of harnessing these modalities to extract and retrieve semantic information from multimedia sources. This study presents a highly scalable and versatile end-to-end multimodal multimedia information retrieval framework. The core strength of this system lies in its capacity to learn semantic contexts within individual modalities and across different modalities, achieved through the utilization of deep neural models. These models are trained using combinations of queries and relevant shots obtained from query logs. One of the distinguishing features of this framework is its ability to create shot templates representing videos that have not been encountered previously. To enhance retrieval performance, the system employs clustering techniques to retrieve shots similar to these templates. An improved variant of fuzzy clustering with a modified loss function is applied to address the inherent uncertainty in multimodal concepts. Our approach goes beyond simple cluster-based ranking by incorporating Siamese networks for improved re-ranking, thereby enhancing retrieval precision. Additionally, a fusion method incorporating an OWA operator is introduced. This method employs various measures to aggregate ranked lists produced by multiple retrieval systems. The proposed approach leverages parallel processing and transfer learning to extract features from three distinct modalities, ensuring the adaptability and scalability of the framework. To assess its effectiveness, the system is rigorously evaluated through experiments conducted on six widely recognized multimodal datasets. Remarkably, our approach outperforms previous studies in the literature on five of these datasets. The experimental findings, substantiated by statistical tests, conclusively establish the effectiveness of the proposed approach in the field of multimodal multimedia information retrieval.
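The abstract mentions a fusion method that uses an OWA (Ordered Weighted Averaging) operator to aggregate the ranked lists produced by multiple retrieval systems. The thesis's specific aggregation measures, weight choices, and score normalization are not given here, so the sketch below shows only the generic OWA step under those assumptions; the function name owa_fuse, the example shot identifiers, and the weight vector are all illustrative, not taken from the thesis.

```python
import numpy as np

def owa_fuse(score_lists, weights):
    """Rank items by an OWA aggregation of per-system retrieval scores.

    score_lists: dict mapping item id -> list of scores, one per retrieval system.
    weights: OWA weight vector (non-negative, summing to 1).
    """
    weights = np.asarray(weights, dtype=float)
    fused = {}
    for item, scores in score_lists.items():
        # OWA weights are applied positionally: each item's scores are first
        # sorted in descending order, so the weight attaches to a rank
        # position rather than to a particular retrieval system.
        s = np.sort(np.asarray(scores, dtype=float))[::-1]
        fused[item] = float(np.dot(weights, s))
    # Return item ids ordered by fused score, best first.
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical scores for three shots from three retrieval systems.
scores = {
    "shot_a": [0.90, 0.40, 0.70],
    "shot_b": [0.50, 0.60, 0.55],
    "shot_c": [0.20, 0.90, 0.10],
}
print(owa_fuse(scores, weights=[0.5, 0.3, 0.2]))
```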
Subject Keywords: Multimodal multimedia retrieval, Deep semantic learning, Adaptive fuzzy clustering, Information fusion, Siamese ranking, Ranked lists aggregation
URI: https://hdl.handle.net/11511/115079
Collections: Graduate School of Natural and Applied Sciences, Thesis
Citation Formats
IEEE
S. SATTARI, “Empowering Multimodal Multimedia Information Retrieval Through Semantic Deep Learning,” Ph.D. - Doctoral Program, Middle East Technical University, 2024.