Image categorization using Fisher kernels of non-iid image models
Date
2012-01-01
Author
Cinbiş, Ramazan Gökberk
Schmid, Cordelia
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
The bag-of-words (BoW) model treats images as unordered sets of local regions and represents them by visual word histograms. Implicitly, regions are assumed to be independent and identically distributed (iid), which is a poor assumption from a modeling perspective. We introduce non-iid models by treating the parameters of BoW models as latent variables which are integrated out, rendering all local regions dependent. Using the Fisher kernel, we encode an image by the gradient of the data log-likelihood w.r.t. the hyper-parameters that control priors on the model parameters. Our representation naturally involves discounting transformations similar to taking square-roots, providing an explanation of why such transformations have proven successful. Using variational inference, we extend the basic model to include Gaussian mixtures over local descriptors, and latent topic models to capture the co-occurrence structure of visual words, both improving performance. Our models yield state-of-the-art categorization performance using linear classifiers, without using non-linear transformations such as taking square-roots of features or using (approximate) explicit embeddings of non-linear kernels.
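To make the discounting effect concrete, here is a minimal sketch under one stated assumption: the prior on the BoW multinomial is a Dirichlet with hyper-parameters alpha, so integrating out the multinomial gives a Dirichlet compound multinomial (Pólya) likelihood over visual word counts. The abstract does not name the prior, and the function names below are hypothetical. The sketch computes only the gradient of the log-likelihood w.r.t. alpha; a full Fisher kernel would also normalize by the Fisher information, which is omitted here.

```python
import math


def digamma_diff(alpha: float, n: int) -> float:
    """Return psi(alpha + n) - psi(alpha) for an integer n >= 0.

    Uses the recurrence psi(x + 1) = psi(x) + 1/x, so no special-function
    library is needed.
    """
    return sum(1.0 / (alpha + i) for i in range(n))


def polya_log_likelihood(counts, alpha):
    """Log-likelihood of a visual word sequence with the given per-word counts.

    Assumption: multinomial parameters drawn from Dirichlet(alpha) and
    integrated out. The multinomial coefficient is dropped because it does
    not depend on alpha.
    """
    total_alpha = sum(alpha)
    total_n = sum(counts)
    ll = math.lgamma(total_alpha) - math.lgamma(total_alpha + total_n)
    for n, a in zip(counts, alpha):
        ll += math.lgamma(a + n) - math.lgamma(a)
    return ll


def polya_fisher_score(counts, alpha):
    """Gradient of polya_log_likelihood w.r.t. each hyper-parameter alpha_k.

    d/d alpha_k = [psi(alpha_k + n_k) - psi(alpha_k)]
                  - [psi(sum(alpha) + N) - psi(sum(alpha))]
    The first bracket grows roughly logarithmically in the count n_k, which
    is the discounting behaviour the abstract refers to.
    """
    norm = digamma_diff(sum(alpha), sum(counts))
    return [digamma_diff(a, n) - norm for n, a in zip(counts, alpha)]


if __name__ == "__main__":
    counts = [0, 1, 4, 16]   # toy visual word histogram
    alpha = [0.5] * 4        # illustrative hyper-parameter values
    print(polya_fisher_score(counts, alpha))
```

In this toy setting, each gradient component increases with its word count but far more slowly than linearly, so frequent visual words are damped in much the same way as by a square-root transformation of the histogram.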
Subject Keywords
Visualization, Computational Modeling, Kernel, Image Representation, Histograms, Vectors, Mathematical Model
URI
https://hdl.handle.net/11511/56688
DOI
https://doi.org/10.1109/cvpr.2012.6247926
Conference Name
2012 IEEE Conference on Computer Vision and Pattern Recognition
Collections
Department of Computer Engineering, Conference / Seminar
Suggestions
Approximate Fisher Kernels of Non-iid Image Models for Image Categorization
Cinbiş, Ramazan Gökberk; Schmid, Cordelia (2016-06-01)
The bag-of-words (BoW) model treats images as sets of local descriptors and represents them by visual word histograms. The Fisher vector (FV) representation extends BoW, by considering the first and second order statistics of local descriptors. In both representations local descriptors are assumed to be identically and independently distributed (iid), which is a poor assumption from a modeling perspective. It has been experimentally observed that the performance of BoW and FV representations can be improved...
Image annotation with semi-supervised clustering
Sayar, Ahmet; Yarman Vural, Fatoş Tunay; Department of Computer Engineering (2009)
Image annotation is defined as generating a set of textual words for a given image, learning from the available training data consisting of visual image content and annotation words. Methods developed for image annotation usually make use of region clustering algorithms to quantize the visual information. Visual codebooks are generated from the region clusters of low-level visual features. These codebooks are then matched, in various ways, with the words of the text document related to the image. In this th...
Texture segmentation using the mixtures of principal component analyzers
Musa, MEM; Duin, RPW; de Ridder, D; Atalay, Mehmet Volkan (2003-01-01)
The problem of segmenting an image into several modalities representing different textures can be modelled using Gaussian mixtures. Moreover, texture image patches when translated, rotated or scaled lie in low dimensional subspaces of the high-dimensional space spanned by the grey values. These two aspects make the mixture of local subspace models worth consideration for segmenting this type of images. In recent years a number of mixtures of local PCA models have been proposed. Most of these models require ...
FEATURE ENCODING MODELS FOR GEOGRAPHIC IMAGE RETRIEVAL AND CATEGORIZATION
Ozkan, Savas; Ates, Tayfun; Tola, Engin; Soysal, Medeni; Esen, Ersin (2014-04-25)
In this work, we survey the performance of various feature encoding models for the geographic image retrieval task. The recently introduced Vector-of-Locally-Aggregated Descriptors (VLAD) and its Product Quantization encoded binary version VLAD-PQ are compared with the widely used Bag-of-Words (BoW) model. Evaluation results are shown on a publicly available 21-class LULC dataset. Experiments show that VLAD outperforms the classical BoW representation, albeit with some increase in computation time. Additi...
Topological Navigation Algorithm Design and Analysis Using Spherical Images
Şahin, Yasin; Koku, Ahmet Buğra; Department of Mechanical Engineering (2022-08-23)
A topological navigation algorithm that has the capability of mapping and localization based on visual contents is proposed. Keypoint detection and feature matching are conducted on spherical images to extract significant features among sequential frames. Robot movement direction is estimated based on historical angle differences of significant features to reach the final destination. The navigation process is supported with visual egocentric localization to gain simultaneous localization and mapping compet...
Citation Formats
IEEE
R. G. Cinbiş and C. Schmid, “Image categorization using Fisher kernels of non-iid image models,” presented at the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 2012, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/56688.