Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures

Date

2016-02-23

Author

Bernardi, Raffaella
Cakici, Ruket
Elliott, Desmond
Erdem, Aykut
Erdem, Erkut
Ikizler-Cinbis, Nazli
Keller, Frank
Muscat, Adrian
Plank, Barbara

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Automatic description generation from natural images is a challenging problem that has recently received a large amount of interest from the computer vision and natural language processing communities. In this survey, we classify the existing approaches based on how they conceptualize this problem, viz., models that cast description as either a generation problem or a retrieval problem over a visual or multimodal representational space. We provide a detailed review of existing models, highlighting their advantages and disadvantages. Moreover, we give an overview of the benchmark image datasets and the evaluation measures that have been developed to assess the quality of machine-generated image descriptions. Finally, we extrapolate future directions in the area of automatic image description generation.

Subject Keywords

Artificial Intelligence

URI

https://hdl.handle.net/11511/51427

Journal

Journal of Artificial Intelligence Research

DOI

https://doi.org/10.1613/jair.4900

Collections

Department of Computer Engineering, Article

Citation Formats

R. Bernardi et al., “Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures,” Journal of Artificial Intelligence Research, pp. 409–442, 2016, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/51427.