OpenMETU
Analysis of dataset, object tag, and object attribute components in novel object captioning
Download
enes_muvahhid_sahin_masters_thesis.pdf
Date
2022-7
Author
Şahin, Enes Muvahhid
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Item Usage Stats
304 views, 225 downloads
Image captioning is a popular yet challenging task that lies at the intersection of Computer Vision and Natural Language Processing. A specific branch of image captioning called Novel Object Captioning has drawn attention in recent years. Unlike general image captioning, Novel Object Captioning focuses on describing images that contain novel objects not seen during training. Recently, numerous image captioning approaches have been proposed to increase the quality of the generated captions for both general image captioning and Novel Object Captioning. These methods benefit from large object detection datasets for Novel Object Captioning. They also utilize a specific set of object tags (class names) in the image. Even though these approaches are very successful in many respects, they require GPU-weeks of training on several large datasets. Furthermore, captions generated by these methods may lack visual grounding and overlook details in the image. Thus, in this thesis, we analyze the dataset, object tag, and object attribute components for Novel Object Captioning. We perform Visual Vocabulary Pretraining (VIVO) [1] on small-scale [2] and large-scale [3] datasets and compare the captioning performance of a state-of-the-art method [4] in order to analyze the effect of dataset size. To analyze the effect of tag quality on Novel Object Captioning performance, we compare the performance of captioning methods [4] trained with two different sets of object tags: a large set of tags that lacks novel objects, and a small set of tags that includes novel objects. Finally, to obtain richer captions and recover details that would otherwise be overlooked, we propose a novel approach that exploits object attributes in the image. Experimental results are reported on both Novel Object Captioning and general image captioning tasks.
The results show that novel object tags play a vital role in Novel Object Captioning and that the proposed method generates richer and more detailed captions than the baseline.
Subject Keywords
Image captioning, Novel object captioning, Vision and language pretraining, Object tags, Object attributes
URI
https://hdl.handle.net/11511/98157
Collections
Graduate School of Natural and Applied Sciences, Thesis
Suggestions
DATA-DRIVEN IMAGE CAPTIONING WITH META-CLASS BASED RETRIEVAL
Kilickaya, Mert; Erdem, Erkut; Erdem, Aykut; İkizler Cinbiş, Nazlı; Çakıcı, Ruket (2014-04-25)
Automatic image captioning, the process of producing a description for an image, is a very challenging problem which has only recently received interest from the computer vision and natural language processing communities. In this study, we present a novel data-driven image captioning strategy, which, for a given image, finds the most visually similar image in a large dataset of image-caption pairs and transfers its caption as the description of the input image. Our novelty lies in employing a recently pr...
Image Captioning with Unseen Objects
Demirel, Berkan; Cinbiş, Ramazan Gökberk; İkizler Cinbiş, Nazlı (2019-09-12)
Image caption generation is a long standing and challenging problem at the intersection of computer vision and natural language processing. A number of recently proposed approaches utilize a fully supervised object recognition model within the captioning approach. Such models, however, tend to generate sentences which only consist of objects predicted by the recognition models, excluding instances of the classes without labelled training examples. In this paper, we propose a new challenging scenario that ta...
Comparison of whole scene image caption models
Görgülü, Tuğrul; Ulusoy, İlkay; Department of Electrical and Electronics Engineering (2021-2-10)
Image captioning, which automatically describes the content of an image using words and grammar, is one of the most challenging tasks in deep learning. In recent years, studies have been published constantly to improve the quality of this task. However, a detailed comparison of all possible approaches has not yet been done, so the comparative performance of the solutions proposed in the literature is unknown. Thus, this thesis aims to redress this problem by making a comparative analysis among six diff...
Data-driven image captioning via salient region discovery
Kilickaya, Mert; Akkuş, Burak Kerim; Çakıcı, Ruket; Erdem, Aykut; Erdem, Erkut; İkizler Cinbiş, Nazlı (Institution of Engineering and Technology (IET), 2017-09-01)
In the past few years, automatically generating descriptions for images has attracted a lot of attention in computer vision and natural language processing research. Among the existing approaches, data-driven methods have been proven to be highly effective. These methods compare the given image against a large set of training images to determine a set of relevant images, then generate a description using the associated captions. In this study, the authors propose to integrate an object-based semantic image r...
A Confidence Ranked Co-Occurrence Approach for Accurate Object Recognition in Highly Complex Scenes
Angın, Pelin (2013-01-01)
Real-time and accurate classification of objects in highly complex scenes is an important problem for the Computer Vision community due to its many application areas. While boosting methods with the sliding window approach provide fast processing and accurate results for particular object categories, they cannot achieve the desired performance for more involved categories of objects. Recent research in Computer Vision has shown that exploiting object context through relational dependencies between object ca...
Citation Formats
IEEE
E. M. Şahin, “Analysis of dataset, object tag, and object attribute components in novel object captioning,” M.S. - Master of Science, Middle East Technical University, 2022.