TRANSFORMATION-INVARIANT DICTIONARY LEARNING FOR CLASSIFICATION WITH 1-SPARSE REPRESENTATIONS

2014-05-09
Yuzuguler, Ahmet Caner
Vural, Elif
Frossard, Pascal
Sparse representations of images in well-designed dictionaries can be used for effective classification. However, training data in most realistic settings are likely to have undergone geometric transformations, which poses a challenge for the design of good dictionaries. In this work, we study the problem of learning class-representative dictionaries from geometrically transformed image sets. To account efficiently for arbitrary geometric transformations during learning, we represent the dictionaries in an analytic basis. The proposed algorithm then learns atoms that are attracted to the samples of their own class and repelled from the samples of other classes, which promotes discrimination between classes. The dictionary learning objective is formulated to enhance the class-discrimination capabilities of individual atoms rather than those of the subspaces they generate, which makes the learned dictionaries especially suitable for fast classification of query images with very sparse approximations. Experimental results demonstrate the performance of the proposed method in handwritten digit recognition applications.
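
As a rough illustration of classification with a 1-sparse representation, the sketch below assigns a query the class label of the single atom that correlates with it most strongly. It is a simplified reading of the abstract, not the authors' algorithm: the function name `classify_1sparse`, the array layout, and the use of absolute correlation are assumptions, and the search over geometric transformations of the atoms is omitted entirely.

```python
import numpy as np

def classify_1sparse(query, atoms, atom_labels):
    """Label a query by its best single-atom (1-sparse) approximation.

    Hypothetical helper, not from the paper. `atoms` is an array of shape
    (n_atoms, dim) with one atom per row; `atom_labels[i]` is the class
    of atom i.
    """
    query = np.asarray(query, dtype=float)
    atoms = np.asarray(atoms, dtype=float)
    # Normalize atoms so correlations are comparable across atoms.
    unit_atoms = atoms / np.linalg.norm(atoms, axis=1, keepdims=True)
    # For unit-norm atoms, the best 1-sparse approximation uses the atom
    # with the largest absolute inner product with the query.
    scores = np.abs(unit_atoms @ query)
    return atom_labels[int(np.argmax(scores))]

# Toy usage with two 2-D atoms, one per class.
atoms = [[1.0, 0.0], [0.0, 2.0]]
labels = [0, 1]
print(classify_1sparse([0.9, 0.1], atoms, labels))
```

Because only one inner product per atom is needed, the cost of classifying a query grows linearly with the number of atoms, which is consistent with the abstract's emphasis on fast classification with very sparse approximations.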

Citation Formats
A. C. Yuzuguler, E. Vural, and P. Frossard, “TRANSFORMATION-INVARIANT DICTIONARY LEARNING FOR CLASSIFICATION WITH 1-SPARSE REPRESENTATIONS,” 2014, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/53914.