Novel Optimization Models to Generalize Deep Metric Learning

2022-08-24
Gürbüz, Yeti Ziya
Deep metric learning (DML) aims to fit a parametric embedding function to data carrying semantic information (e.g., images) so that the l2 distance between embedded samples is small whenever they share similar semantic entities. Such an embedding function is obtained by minimizing the empirical expected pairwise loss, which penalizes inter-/intra-class proximity violations in the embedding space. Proxy-based methods, which use a learnable embedding vector per class in their loss formulation, are state-of-the-art. We first address characterizing the generalization error of proxy-based methods. We reformulate DML as a chance-constrained optimization problem and, through careful theoretical analysis, show that DML with better generalization guarantees can be achieved by iteratively minimizing a proxy-based loss and re-initializing the proxies with the embeddings of new samples. Second, we consider a critical desideratum for DML: generalization to unseen data. We analyze global average pooling (GAP), which is an effective architectural choice for aggregating information in DML. With theoretical and empirical support, we explain the effectiveness of GAP by considering each feature vector as representing a different semantic entity and GAP as a convex combination of them. Following this perspective, we generalize GAP and propose a learnable generalized sum pooling method (GSP) that improves on GAP with two distinct abilities: i) the ability to choose a subset of semantic entities, effectively learning to ignore nuisance information, and ii) the ability to learn weights corresponding to the importance of each entity. We further propose a zero-shot loss to ease the learning of GSP. We show the effectiveness of our contributions with extensive evaluations on 4 popular DML benchmarks.
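The view of GAP as a convex combination of feature vectors can be illustrated with a small sketch. This is not the thesis's GSP implementation: the function names `global_average_pooling` and `weighted_sum_pooling` are hypothetical, the feature values are made up, and the weights are fixed by hand, whereas GSP is described as learning them. The sketch only shows that GAP equals a convex combination with uniform weights, and that a zero weight drops a feature vector from the pooled result.

```python
import math

def global_average_pooling(features):
    """GAP: average n feature vectors, i.e. weight each one by 1/n."""
    n = len(features)
    dim = len(features[0])
    return [sum(f[d] for f in features) / n for d in range(dim)]

def weighted_sum_pooling(features, weights):
    """Convex combination of feature vectors (hypothetical helper).

    Weights must be non-negative and sum to 1. A zero weight removes
    the corresponding feature vector from the result.
    """
    if any(w < 0 for w in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]

# Three made-up 2-D feature vectors; the third plays the role of nuisance.
features = [[1.0, 0.0], [0.0, 1.0], [4.0, 4.0]]

gap = global_average_pooling(features)
uniform = weighted_sum_pooling(features, [1 / 3, 1 / 3, 1 / 3])
selective = weighted_sum_pooling(features, [0.5, 0.5, 0.0])
```

Here `uniform` matches `gap` up to floating-point rounding, while `selective` ignores the third vector entirely. In the abstract's terms, a learnable method would choose such weights from data instead of taking them as input.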

Citation Formats
Y. Z. Gürbüz, “Novel Optimization Models to Generalize Deep Metric Learning,” Ph.D. - Doctoral Program, Middle East Technical University, 2022.