Novel refinement method for automatic image annotation systems

Demircioğlu, Erşan
Image annotation can be defined as the process of assigning a set of content-related words to an image. An automatic image annotation system learns the relationship between words and the low-level visual descriptors extracted from images, and uses this relationship to annotate a newly seen image. The high demand for image annotation increases the need for automatic image annotation systems. However, the performance of current annotation methods is far from sufficient for practical use. The most common problem of current methods is the gap between semantic words and low-level visual descriptors. Because of this semantic gap, the annotation results of these methods contain irrelevant, noisy words. To give more relevant results, refinement methods should be applied to the outputs of classical image annotation systems. In this work, we present a novel refinement approach for the image annotation problem. The proposed system attacks the semantic gap problem by using the relationships among words, which are obtained from the dataset. Establishing these relationships is the most crucial problem of the refinement process. In this study, we suggest a probabilistic and fuzzy approach for modelling the relationships among the words in the vocabulary, which is then employed to generate candidate annotations based on the output of the image annotator. Candidate annotations are represented by a set of relational graphs. Finally, one of the generated candidate annotations is selected as the refined annotation result by using a clique optimization technique applied to the candidate annotation graph.
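The general idea of refining annotator output with word relationships can be illustrated with a small sketch. This is not the method of the thesis: the abstract does not specify the probabilistic and fuzzy model or the clique optimization, so the sketch substitutes a plain co-occurrence score and a brute-force search over fixed-size word subsets. The function names `cooccurrence_relatedness` and `refine`, the scoring formula, and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_relatedness(training_annotations):
    """Build a symmetric word-relatedness function from training annotations.

    Assumption (not from the thesis): relatedness(a, b) is the number of
    images containing both words divided by the count of the rarer word.
    """
    word_counts = Counter()
    pair_counts = Counter()
    for words in training_annotations:
        unique = sorted(set(words))
        word_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    def relatedness(a, b):
        if a == b:
            return 1.0
        pair = (a, b) if a < b else (b, a)
        denom = min(word_counts[a], word_counts[b])
        return pair_counts[pair] / denom if denom else 0.0

    return relatedness


def refine(annotator_scores, relatedness, k):
    """Pick the k candidate words that score best as a group.

    Every k-subset is treated as a clique in a complete graph whose nodes
    carry annotator confidences and whose edges carry relatedness scores.
    Brute force is used here, so this only suits small candidate sets.
    """
    def total(subset):
        node = sum(annotator_scores[w] for w in subset)
        edge = sum(relatedness(a, b) for a, b in combinations(subset, 2))
        return node + edge

    return set(max(combinations(sorted(annotator_scores), k), key=total))


# Toy data (hypothetical): annotation word sets of five training images.
training = [
    ["sky", "sea", "beach"],
    ["sky", "sea", "boat"],
    ["tiger", "grass"],
    ["tiger", "grass", "tree"],
    ["sky", "tree"],
]
# Hypothetical annotator output; "tiger" is a confident but noisy word.
scores = {"sky": 0.9, "sea": 0.6, "tiger": 0.65, "beach": 0.5}

relatedness = cooccurrence_relatedness(training)
refined = refine(scores, relatedness, k=3)
print(sorted(refined))  # ['beach', 'sea', 'sky']
```

In this toy case the three highest annotator scores alone would keep "tiger", while the group score drops it because it never co-occurs with the other candidates in the training annotations.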


Image annotation with semi-supervised clustering
Sayar, Ahmet; Yarman Vural, Fatoş Tunay; Department of Computer Engineering (2009)
Image annotation is defined as generating a set of textual words for a given image, learning from the available training data consisting of visual image content and annotation words. Methods developed for image annotation usually make use of region clustering algorithms to quantize the visual information. Visual codebooks are generated from the region clusters of low-level visual features. These codebooks are then matched, in various ways, with the words of the text document related to the image. In this th...
HANOLISTIC: A Hierarchical Automatic Image Annotation System Using Holistic Approach
Karadag, Ozge Oztimur; Yarman Vural, Fatoş Tunay (2009-06-25)
Automatic image annotation is the process of assigning keywords to digital images depending on the content information. In one sense, it is a mapping from the visual content information to the semantic context information. In this study, we propose a novel approach for the automatic image annotation problem, where the annotation is formulated as a multivariate mapping from a set of independent descriptor spaces, representing a whole image, to a set of words, representing class labels. For this purpose, a hierar...
Comparison of whole scene image caption models
Görgülü, Tuğrul; Ulusoy, İlkay; Department of Electrical and Electronics Engineering (2021-2-10)
Image captioning is one of the most challenging tasks in the deep learning area; it automatically describes the content of an image by using words and grammar. In recent years, studies have been published constantly to improve the quality of this task. However, a detailed comparison of all possible approaches has not been done yet, so the comparative performance of the solutions proposed in the literature is not known. Thus, this thesis aims to redress this problem by making a comparative analysis among six diff...
Optical flow based video frame segmentation and segment classification
Akpınar, Samet; Alpaslan, Ferda Nur; Department of Computer Engineering (2018)
Video information retrieval is a field of multimedia research enabling us to extract desired semantic information from video data. In content-based video information retrieval, visual content obtained from video scenes is utilized. For developing methods to cope with content-based video information retrieval in terms of temporal concepts such as action, event, etc., representation of temporal information becomes critical. In this thesis, action detection is tackled based on a temporal video representation m...
Intra prediction with 3-tap filters for lossless and lossy video coding
Ranjbar Alvar, Saeed; Kamışlı, Fatih; Department of Electrical and Electronics Engineering (2016)
Video coders are primarily designed for lossy compression. The basic steps in modern lossy video compression are block-based spatial or temporal prediction, transformation of the prediction error block, quantization of the transform coefficients and entropy coding of the quantized coefficients together with other side information. In some cases, this lossy coding architecture may not be efficient for compression. For example, when lossless video compression is desirable, the transform and quantization steps...
Citation Formats
E. Demircioğlu, “Novel refinement method for automatic image annotation systems,” M.S. - Master of Science, Middle East Technical University, 2011.