Caption generation on scenes with seen and unseen object categories
Date: 2022-08-01
Authors: Demirel, Berkan; Cinbiş, Ramazan Gökberk
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Image caption generation is one of the most challenging problems at the intersection of vision and language domains. In this work, we propose a realistic captioning task where the input scenes may incorporate visual objects with no corresponding visual or textual training examples. For this problem, we propose a detection-driven approach that consists of a single-stage generalized zero-shot detection model to recognize and localize instances of both seen and unseen classes, and a template-based captioning model that transforms detections into sentences. To improve the generalized zero-shot detection model, which provides essential information for captioning, we define effective class representations in terms of class-to-class semantic similarities, and leverage their special structure to construct an effective unseen/seen class confidence score calibration mechanism. We also propose a novel evaluation metric that provides additional insights for the captioning outputs by separately measuring the visual and non-visual contents of generated sentences. Our experiments highlight the importance of studying captioning in the proposed zero-shot setting, and verify the effectiveness of the proposed detection-driven zero-shot captioning approach.
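The abstract outlines a detection-driven pipeline: a generalized zero-shot detector proposes both seen and unseen objects, confidence scores are calibrated so the two class groups can compete fairly, and a template turns the surviving detections into a sentence. The following is a minimal, hypothetical Python sketch of that flow; the function names, the fixed sentence template, and the simple score-damping calibration rule are illustrative assumptions, not the paper's implementation (the paper derives its calibration from class-to-class semantic similarities).

```python
# Hypothetical sketch of a detection-driven zero-shot captioning pipeline.
# All names and the calibration rule below are illustrative assumptions;
# the paper uses a single-stage generalized zero-shot detector followed by
# template-based sentence generation, with a similarity-based calibration.

from dataclasses import dataclass


@dataclass
class Detection:
    label: str     # predicted class name (seen or unseen)
    score: float   # class confidence
    box: tuple     # (x1, y1, x2, y2)
    is_seen: bool  # whether the class had visual training examples


def calibrate(dets, gamma=0.3):
    """Toy seen/unseen calibration: damp seen-class scores, since unseen
    classes tend to receive systematically lower confidences in
    generalized zero-shot settings."""
    return [
        Detection(d.label,
                  d.score * (1.0 - gamma) if d.is_seen else d.score,
                  d.box, d.is_seen)
        for d in dets
    ]


def fill_template(dets, top_k=2):
    """Template-based captioning: keep the highest-scoring detections and
    slot their class names into a fixed sentence pattern."""
    kept = sorted(dets, key=lambda d: d.score, reverse=True)[:top_k]
    if not kept:
        return "A scene with no confidently detected objects."
    names = " and ".join(d.label for d in kept)
    return f"A photo of {names}."


# Usage with mock detections (an unseen 'zebra' next to a seen 'person'):
dets = [Detection("person", 0.92, (10, 20, 80, 200), True),
        Detection("zebra", 0.55, (90, 40, 300, 210), False)]
print(fill_template(calibrate(dets)))  # -> "A photo of person and zebra."
```

Under this toy rule the seen "person" score drops from 0.92 to about 0.64, which illustrates why calibration matters: without it, confidently detected unseen objects would rarely survive the top-k cut and would never appear in the caption.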
Subject Keywords: Zero-shot learning, Zero-shot image captioning
URI: https://www.sciencedirect.com/science/article/pii/S0262885622001445
https://hdl.handle.net/11511/102374
Journal: IMAGE AND VISION COMPUTING
DOI: https://doi.org/10.1016/j.imavis.2022.104515
Collections: Department of Computer Engineering, Article
Suggestions
Image Captioning with Unseen Objects
Demirel, Berkan; Cinbiş, Ramazan Gökberk; İkizler Cinbiş, Nazlı (2019-09-12)
Image caption generation is a long-standing and challenging problem at the intersection of computer vision and natural language processing. A number of recently proposed approaches utilize a fully supervised object recognition model within the captioning approach. Such models, however, tend to generate sentences which only consist of objects predicted by the recognition models, excluding instances of the classes without labelled training examples. In this paper, we propose a new challenging scenario that ta...
Comparison of whole scene image caption models
Görgülü, Tuğrul; Ulusoy, İlkay; Department of Electrical and Electronics Engineering (2021-2-10)
Image captioning is one of the most challenging tasks in the deep learning area: it automatically describes the content of an image using words and grammar. In recent years, studies have been published constantly to improve the quality of this task. However, a detailed comparison of all possible approaches has not yet been done, so the comparative performance of the proposed solutions in the literature remains unknown. Thus, this thesis aims to redress this problem by making a comparative analysis among six diff...
DATA-DRIVEN IMAGE CAPTIONING WITH META-CLASS BASED RETRIEVAL
Kilickaya, Mert; Erdem, Erkut; Erdem, Aykut; İkizler Cinbiş, Nazlı; Çakıcı, Ruket (2014-04-25)
Automatic image captioning, the process of producing a description for an image, is a very challenging problem which has only recently received interest from the computer vision and natural language processing communities. In this study, we present a novel data-driven image captioning strategy, which, for a given image, finds the most visually similar image in a large dataset of image-caption pairs and transfers its caption as the description of the input image. Our novelty lies in employing a recently pr...
Image resolution enhancement using wavelet domain Hidden Markov Tree and coefficient sign estimation
Temizel, Alptekin (2007-01-01)
Image resolution enhancement using wavelets is a relatively new subject, and many new algorithms have been proposed recently. These algorithms assume that the low-resolution image is the approximation subband of a higher-resolution image and attempt to estimate the unknown detail coefficients to reconstruct a high-resolution image. A subset of these recent approaches utilized probabilistic models to estimate these unknown coefficients. Particularly, hidden Markov tree (HMT) based methods using Gaussian mixt...
Analysis of dataset, object tag, and object attribute components in novel object captioning
Şahin, Enes Muvahhid; Akar, Gözde; Department of Electrical and Electronics Engineering (2022-7)
Image captioning is a popular yet challenging task which lies at the intersection of Computer Vision and Natural Language Processing. A specific branch of image captioning called Novel Object Captioning has drawn attention in recent years. Different from general image captioning, Novel Object Captioning focuses on describing images with novel objects which are not seen during training. Recently, numerous image captioning approaches have been proposed in order to increase the quality of the generated captions for both gene...
Citation Formats
IEEE
B. Demirel and R. G. Cinbiş, “Caption generation on scenes with seen and unseen object categories,” IMAGE AND VISION COMPUTING, vol. 124, Art. no. 104515, 2022, Accessed: 00, 2023. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0262885622001445.