Data-driven image captioning via salient region discovery
Date
2017-09-01
Author
Kilickaya, Mert
Akkuş, Burak Kerim
Çakıcı, Ruket
Erdem, Aykut
Erdem, Erkut
İkizler Cinbiş, Nazlı
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
In the past few years, automatically generating descriptions for images has attracted a lot of attention in computer vision and natural language processing research. Among the existing approaches, data-driven methods have been proven to be highly effective. These methods compare the given image against a large set of training images to determine a set of relevant images, then generate a description using the associated captions. In this study, the authors propose to integrate an object-based semantic image representation into a deep features-based retrieval framework to select the relevant images. Moreover, they present a novel phrase selection paradigm and a sentence generation model which depends on a joint analysis of salient regions in the input and retrieved images within a clustering framework. The authors demonstrate the effectiveness of their proposed approach on Flickr8K and Flickr30K benchmark datasets and show that their model gives highly competitive results compared with the state-of-the-art models.
Subject Keywords
Software, Computer Vision and Pattern Recognition
URI
https://hdl.handle.net/11511/52195
Journal
IET Computer Vision
DOI
https://doi.org/10.1049/iet-cvi.2016.0286
Collections
Department of Computer Engineering, Article
Suggestions
Continuous dimensionality characterization of image structures
Felsberg, Michael; Kalkan, Sinan; Kruger, Norbert (Elsevier BV, 2009-05-04)
Intrinsic dimensionality is a concept introduced by statistics and later used in image processing to measure the dimensionality of a data set. In this paper, we introduce a continuous representation of the intrinsic dimension of an image patch in terms of its local spectrum or, equivalently, its gradient field. By making use of a cone structure and barycentric co-ordinates, we can associate three confidences to the three different ideal cases of intrinsic dimensions corresponding to homogeneous image patche...
Deep Hierarchies in the Primate Visual Cortex: What Can We Learn for Computer Vision?
KRÜGER, Norbert; JANSSEN, Peter; Kalkan, Sinan; LAPPE, Markus; LEONARDİS, Ales; PİATER, Justus; Rodriguez-Sanchez, Antonio J.; WİSKOTT, Laurenz (Institute of Electrical and Electronics Engineers (IEEE), 2013-08-01)
Computational modeling of the primate visual system yields insights of potential relevance to some of the challenges that computer vision is facing, such as object recognition and categorization, motion detection and activity recognition, or vision-based navigation and manipulation. This paper reviews some functional principles and structures that are generally thought to underlie the primate visual cortex, and attempts to extract biological principles that could further advance computer vision research. Or...
Key protected classification for collaborative learning
Sariyildiz, Mert Bulent; Cinbiş, Ramazan Gökberk; Ayday, Erman (Elsevier BV, 2020-08-01)
Large-scale datasets play a fundamental role in training deep learning models. However, dataset collection is difficult in domains that involve sensitive information. Collaborative learning techniques provide a privacy-preserving solution, by enabling training over a number of private datasets that are not shared by their owners. However, recently, it has been shown that the existing collaborative learning frameworks are vulnerable to an active adversary that runs a generative adversarial network (GAN...
Motion estimation using complex discrete wavelet transform
Sarı, Hüseyin; Severcan, Mete; Department of Electrical and Electronics Engineering (2003)
The estimation of optical flow has become a vital research field in image sequence analysis especially in past two decades, which found applications in many fields such as stereo optics, video compression, robotics and computer vision. In this thesis, the complex wavelet based algorithm for the estimation of optical flow developed by Magarey and Kingsbury is implemented and investigated. The algorithm is based on a complex version of the discrete wavelet transform (CDWT), which analyzes an image through blo...
Catadioptric hyperspectral imaging, an unmixing approach
Baskurt, Didem Ozisik; BAŞTANLAR, YALIN; Çetin, Yasemin (Institution of Engineering and Technology (IET), 2020-10-01)
Hyperspectral imaging systems provide dense spectral information on the scene under investigation by collecting data from a high number of contiguous bands of the electromagnetic spectrum. The low spatial resolutions of these sensors frequently give rise to the mixing problem in remote sensing applications. Several unmixing approaches are developed in order to handle the challenging mixing problem on perspective images. On the other hand, omnidirectional imaging systems provide a 360-degree field of view in...
Citation Formats
M. Kilickaya, B. K. Akkuş, R. Çakıcı, A. Erdem, E. Erdem, and N. İkizler Cinbiş, “Data-driven image captioning via salient region discovery,” IET Computer Vision, pp. 398–406, 2017, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/52195.