Approximate Fisher Kernels of Non-iid Image Models for Image Categorization

2016-06-01
Cinbiş, Ramazan Gökberk
Schmid, Cordelia
The bag-of-words (BoW) model treats images as sets of local descriptors and represents them by visual word histograms. The Fisher vector (FV) representation extends BoW by considering the first- and second-order statistics of local descriptors. In both representations, local descriptors are assumed to be identically and independently distributed (iid), which is a poor assumption from a modeling perspective. It has been experimentally observed that the performance of BoW and FV representations can be improved by employing discounting transformations such as power normalization. In this paper, we introduce non-iid models by treating the model parameters as latent variables which are integrated out, rendering all local regions dependent. Using the Fisher kernel principle, we encode an image by the gradient of the data log-likelihood w.r.t. the model hyper-parameters. Our models naturally generate discounting effects in the representations, suggesting that such transformations have proven successful because they closely correspond to the representations obtained for non-iid models. To enable tractable computation, we rely on variational free-energy bounds to learn the hyper-parameters and to compute approximate Fisher kernels. Our experimental results show that our models lead to performance improvements comparable to those obtained with power normalization, as employed in state-of-the-art feature aggregation methods.
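To illustrate the discounting effect described in the abstract, the following Python sketch (our own illustration, not code from the paper) compares two transformations of a BoW histogram. The first is signed power normalization, sign(x)|x|^α followed by L2 normalization, with α = 0.5 assumed as the commonly used value. The second assumes the simplest non-iid BoW model: a multinomial over visual words whose parameters have a Dirichlet prior with hyper-parameters α_k that is integrated out (a Dirichlet-multinomial, or Pólya, model). Because this model's log-likelihood has a closed form, its Fisher score needs no variational approximation: ∂ log p / ∂α_k = ψ(α_0) − ψ(α_0 + N) + ψ(α_k + n_k) − ψ(α_k), where n_k is the count of word k, N = Σ_k n_k, α_0 = Σ_k α_k, and ψ is the digamma function. The helper names and the integer-count identity ψ(x + n) − ψ(x) = Σ_{i<n} 1/(x + i) are illustrative choices; the paper's richer models and exact normalizations may differ.

```python
import math

# Illustrative sketch (hypothetical helper names), assuming a Dirichlet-multinomial
# (Polya) model as the simplest non-iid BoW model; not the paper's implementation.


def bow_histogram(word_ids, vocab_size):
    """BoW histogram: number of local descriptors assigned to each visual word."""
    counts = [0] * vocab_size
    for w in word_ids:
        counts[w] += 1
    return counts


def power_normalize(vec, alpha=0.5):
    """Signed power normalization sign(x)|x|^alpha, followed by L2 normalization."""
    z = [math.copysign(abs(x) ** alpha, x) for x in vec]
    norm = math.sqrt(sum(v * v for v in z))
    return [v / norm for v in z] if norm > 0 else z


def digamma_diff(x, n):
    """psi(x + n) - psi(x) for integer n >= 0, using psi(y + 1) = psi(y) + 1/y."""
    return sum(1.0 / (x + i) for i in range(n))


def polya_fisher_score(counts, alphas):
    """Gradient of the log-likelihood of a word sequence w.r.t. each Dirichlet
    hyper-parameter alpha_k, with the multinomial parameters integrated out."""
    alpha0 = sum(alphas)
    total = sum(counts)
    shared = -digamma_diff(alpha0, total)  # psi(alpha0) - psi(alpha0 + N), same for all k
    return [shared + digamma_diff(a, n) for n, a in zip(counts, alphas)]


if __name__ == "__main__":
    counts = bow_histogram([0, 0, 2, 0, 1, 0], vocab_size=3)
    print(counts)
    print(power_normalize(counts))
    print(polya_fisher_score(counts, [1.0, 1.0, 1.0]))
```

The count-dependent term ψ(α_k + n_k) − ψ(α_k) grows by only 1/(α_k + n_k − 1) per additional occurrence of word k. With α_k = 1, a word seen 1, 10 or 100 times contributes 1, about 2.93 or about 5.19 (the harmonic numbers), roughly logarithmic in n_k, whereas the square root gives 1, about 3.16 or 10 and raw counts grow linearly.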
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE

Suggestions

Image categorization using Fisher kernels of non-iid image models
Cinbiş, Ramazan Gökberk; Schmid, Cordelia (2012-01-01)
The bag-of-words (BoW) model treats images as an unordered set of local regions and represents them by visual word histograms. Implicitly, regions are assumed to be identically and independently distributed (iid), which is a poor assumption from a modeling perspective. We introduce non-iid models by treating the parameters of BoW models as latent variables which are integrated out, rendering all local regions dependent. Using the Fisher kernel, we encode an image by the gradient of the data log-likelihood w....
Image annotation with semi-supervised clustering
Sayar, Ahmet; Yarman Vural, Fatoş Tunay; Department of Computer Engineering (2009)
Image annotation is defined as generating a set of textual words for a given image, learning from the available training data consisting of visual image content and annotation words. Methods developed for image annotation usually make use of region clustering algorithms to quantize the visual information. Visual codebooks are generated from the region clusters of low level visual features. These codebooks are then matched with the words of the text document related to the image, in various ways. In this th...
TRANSFORMATION-INVARIANT DICTIONARY LEARNING FOR CLASSIFICATION WITH 1-SPARSE REPRESENTATIONS
Yuzuguler, Ahmet Caner; Vural, Elif; Frossard, Pascal (2014-05-09)
Sparse representations of images in well-designed dictionaries can be used for effective classification. Meanwhile, training data available in most realistic settings are likely to be exposed to geometric transformations, which poses a challenge for the design of good dictionaries. In this work, we study the problem of learning class-representative dictionaries from geometrically transformed image sets. In order to efficiently take account of arbitrary geometric transformations in the learning, we adopt a r...
A performance study of the tangent distance method in transformation-invariant image classification
Vural, Elif (2015-08-06)
A common problem in image analysis is the transformation-invariant estimation of the similarity between a query image and a set of reference images representing different classes. This typically requires the comparison of the distance between the query image and the transformation manifolds of the reference images. The tangent distance algorithm is a popular method that estimates the manifold distance by employing a linear approximation of the transformation manifolds. In this paper, we present a performanc...
SCALE-SPACE APPROACH FOR THE COMPARISON OF HK AND SC CURVATURE DESCRIPTIONS AS APPLIED TO OBJECT RECOGNITION
Akagunduz, Erdem; Eskizara, Oemer; Ulusoy, İlkay (2009-11-10)
Using mean curvature (H) and Gaussian curvature (K) values or shape index (S) and curvedness (C) values, HK and SC curvature spaces are constructed in order to classify surface patches into types such as pits, peaks, saddles etc. Since both HK and SC curvature spaces classify surface patches into similar types, their classification capabilities are comparable. Previously, HK and SC curvature spaces were compared in terms of their classification ability only at the given data resolution [2]. When calculatin...
Citation Formats
R. G. Cinbiş and C. Schmid, “Approximate Fisher Kernels of Non-iid Image Models for Image Categorization,” IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, pp. 1084–1098, 2016, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/57819.