Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures

Bernardi, Raffaella
Cakici, Ruket
Elliott, Desmond
Erdem, Aykut
Erdem, Erkut
Ikizler-Cinbis, Nazli
Keller, Frank
Muscat, Adrian
Plank, Barbara
Automatic description generation from natural images is a challenging problem that has recently received a large amount of interest from the computer vision and natural language processing communities. In this survey, we classify the existing approaches based on how they conceptualize this problem, viz., models that cast description as either a generation problem or a retrieval problem over a visual or multimodal representational space. We provide a detailed review of existing models, highlighting their advantages and disadvantages. Moreover, we give an overview of the benchmark image datasets and the evaluation measures that have been developed to assess the quality of machine-generated image descriptions. Finally, we extrapolate future directions in the area of automatic image description generation.
Journal of Artificial Intelligence Research


Adaptive mean-shift for automated multi object tracking
Beyan, C.; Temizel, Alptekin (2012-01-01)
Mean-shift tracking plays an important role in computer vision applications because of its robustness, ease of implementation and computational efficiency. In this study, a fully automatic multiple-object tracker based on the mean-shift algorithm is presented. The foreground is extracted using a mixture of Gaussians, followed by shadow and noise removal, to initialise the object trackers; it is also used as a kernel mask to make the system more efficient by decreasing the search area and the number of iterations to conve...
RTTES: Real-time search in dynamic environments
Undeger, Cagatay; Polat, Faruk (Springer Science and Business Media LLC, 2007-10-01)
In this paper we propose a real-time search algorithm called Real-Time Target Evaluation Search (RTTES) for the problem of searching a route in grid worlds from a starting point to a static or dynamic target point in real-time. The algorithm makes use of a new effective heuristic method which utilizes environmental information to successfully find solution paths to the target in dynamic and partially observable environments. The method requires analysis of nearby obstacles to determine closed directions and...
Deep Hierarchies in the Primate Visual Cortex: What Can We Learn for Computer Vision?
Krüger, Norbert; Janssen, Peter; Kalkan, Sinan; Lappe, Markus; Leonardis, Ales; Piater, Justus; Rodriguez-Sanchez, Antonio J.; Wiskott, Laurenz (Institute of Electrical and Electronics Engineers (IEEE), 2013-08-01)
Computational modeling of the primate visual system yields insights of potential relevance to some of the challenges that computer vision is facing, such as object recognition and categorization, motion detection and activity recognition, or vision-based navigation and manipulation. This paper reviews some functional principles and structures that are generally thought to underlie the primate visual cortex, and attempts to extract biological principles that could further advance computer vision research. Or...
Effective gene expression data generation framework based on multi-model approach
Sirin, Utku; Erdogdu, Utku; Polat, Faruk; Tan, Mehmet; Alhajj, Reda (Elsevier BV, 2016-06-01)
Objective: Overcome the shortage of samples in gene expression data sets, which contain thousands of genes but only a small number of samples, a limitation that challenges the computational methods applied to them.
Toroslu, İsmail Hakkı; Henschen, L (Springer Science and Business Media LLC, 1994-05-01)
The integration of logic rules and relational databases has recently emerged as an important technique for developing knowledge management systems. An important class of logic rules utilized by these systems is the so-called transitive closure rules, the processing of which requires the computation of the transitive closure of database relations referenced by these rules. This article presents a new algorithm suitable for computing the transitive closure of very large database relations. This algorithm proc...
Citation Formats
R. Bernardi et al., “Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures,” Journal of Artificial Intelligence Research, pp. 409–442, 2016.