A NOVEL BOVW MIMICKING END-TO-END TRAINABLE CNN CLASSIFICATION FRAMEWORK USING OPTIMAL TRANSPORT THEORY

Date

2019-01-01

Author

Gürbüz, Yeti Ziya

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

An end-to-end trainable convolutional neural network (CNN) framework that mimics bag of visual words (BoVW) is proposed for image classification. To this end, a new paradigm for histogram-like image representation is introduced, and the optimal transport (OT) distance is used to assess similarity. Each patch of an image is treated as a unique visual word, and the image is represented as a uniform histogram over these visual words, with each histogram bin associated with an embedding vector that reflects the semantic meaning of the corresponding visual word. Thus, in the CNN framework, the output of the last convolutional block is taken as the global representation of the image, and the embeddings are learned within the classification framework itself. With the proposed formulation, the quantization step of the BoVW representation is no longer required; moreover, the learned CNN features are naturally interpretable. Experiments on the CIFAR-10, CIFAR-100 and SVHN datasets show that replacing the global pooling and fully connected layers with the proposed representation, together with the OT distance, improves on the baseline CNN framework.
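To make the representation described above concrete, the following is a minimal sketch, not the paper's implementation. It treats an image as a set of patch embeddings with uniform histogram weights and compares two such sets with an OT cost. The abstract does not say how the OT distance is computed, so the entropic (Sinkhorn) approximation, the squared Euclidean ground cost, the function name `sinkhorn_ot`, and the parameters `reg` and `n_iters` are all assumptions made for illustration.

```python
import numpy as np

def sinkhorn_ot(X, Y, reg=0.1, n_iters=200):
    """Approximate OT cost between two sets of embeddings (assumed sketch).

    X: (n, d) array, one embedding per patch of the first image.
    Y: (m, d) array, one embedding per patch of the second image
       (or of a reference set such as a class prototype).
    Each patch is its own histogram bin with uniform mass 1/n or 1/m.
    """
    n, m = X.shape[0], Y.shape[0]
    a = np.full(n, 1.0 / n)  # uniform histogram over the patches of X
    b = np.full(m, 1.0 / m)  # uniform histogram over the patches of Y

    # Ground cost: squared Euclidean distance between embeddings (assumption).
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)

    # Rescale the cost before exponentiating to avoid numerical underflow.
    scale = C.max() if C.max() > 0 else 1.0
    K = np.exp(-C / (reg * scale))

    # Sinkhorn iterations: alternately match column and row marginals.
    u = np.ones(n)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)

    P = u[:, None] * K * v[None, :]  # approximate transport plan
    return float((P * C).sum()), P
```

In a CNN setting, `X` could be the output of the last convolutional block reshaped from `(H, W, C)` to `(H * W, C)`, so that every spatial position contributes one embedding. How the resulting distance feeds into the classifier and the training loss is not specified in the abstract and is left out of this sketch.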

Subject Keywords

Optimal transport, Classification, Image representation, Convolutional network, Bag of visual words

URI

https://hdl.handle.net/11511/43450

DOI

https://doi.org/10.1109/icip.2019.8803276

Collections

Department of Electrical and Electronics Engineering, Conference / Seminar

Citation Formats

Y. Z. Gürbüz, “A NOVEL BOVW MIMICKING END-TO-END TRAINABLE CNN CLASSIFICATION FRAMEWORK USING OPTIMAL TRANSPORT THEORY,” 2019, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/43450.