Learning Multi-Modal Nonlinear Embeddings: Performance Bounds and an Algorithm

2021-01-01
Kaya, Semih
Vural, Elif
While many approaches exist in the literature to learn low-dimensional representations for data collections in multiple modalities, the generalizability of multi-modal nonlinear embeddings to previously unseen data is a rather overlooked subject. In this work, we first present a theoretical analysis of learning multi-modal nonlinear embeddings in a supervised setting. Our performance bounds indicate that for successful generalization in multi-modal classification and retrieval problems, the regularity of the interpolation functions extending the embedding to the whole data space is as important as the between-class separation and cross-modal alignment criteria. We then propose a multi-modal nonlinear representation learning algorithm that is motivated by these theoretical findings, where the embeddings of the training samples are optimized jointly with the Lipschitz regularity of the interpolators. Experimental comparison to recent multi-modal and single-modal learning algorithms suggests that the proposed method yields promising performance in multi-modal image classification and cross-modal image-text retrieval applications.
IEEE TRANSACTIONS ON IMAGE PROCESSING
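The abstract's point about the regularity of the interpolators can be illustrated with a minimal sketch. It assumes Gaussian RBF interpolators, which the abstract does not specify, and the names `fit_rbf`, `rbf_embed`, `lipschitz_bound`, the scale `sigma`, and the ridge term `reg` are hypothetical. The bound uses the fact that the gradient norm of exp(-r^2/sigma^2) is at most sqrt(2/e)/sigma, so each embedding coordinate is Lipschitz with constant at most sqrt(2/e)/sigma times the l1 norm of its coefficients. This is not the paper's algorithm, only an example of the kind of quantity such a method could control.

```python
import numpy as np

def _gauss_kernel(A, B, sigma):
    # Pairwise Gaussian kernel exp(-||a - b||^2 / sigma^2).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / sigma**2)

def fit_rbf(X, Y, sigma, reg=1e-8):
    # Solve (K + reg*I) C = Y so the interpolator maps each training
    # sample X[i] (approximately) to its embedding Y[i].
    K = _gauss_kernel(X, X, sigma)
    return np.linalg.solve(K + reg * np.eye(len(X)), Y)

def rbf_embed(X_new, X, C, sigma):
    # Extend the embedding to arbitrary points of the data space.
    return _gauss_kernel(X_new, X, sigma) @ C

def lipschitz_bound(C, sigma):
    # Upper bound on the Lipschitz constant of each embedding coordinate.
    return np.sqrt(2.0 / np.e) / sigma * np.abs(C).sum(axis=0)

# Toy data (assumed): 6 training samples in R^3 embedded into R^2.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(6, 3))
Y_train = rng.normal(size=(6, 2))
sigma = 1.0
C = fit_rbf(X_train, Y_train, sigma)
bound = lipschitz_bound(C, sigma)
```

A larger `sigma` or smaller coefficients give a smaller bound, that is, a smoother extension of the embedding to unseen samples.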

Suggestions

Comparison of feature-based and image registration-based retrieval of image data using multidimensional data access methods
Arslan, Serdar; Yazıcı, Adnan; Sacan, Ahmet; Toroslu, İsmail Hakkı; Acar, Esra (Elsevier BV, 2013-07-01)
In information retrieval, efficient similarity search in multimedia collections is a critical task. In this paper, we present a rigorous comparison of three different approaches to the image retrieval problem, including cluster-based indexing, distance-based indexing, and multidimensional scaling methods. The time and accuracy trade-offs for each of these methods are demonstrated on three different image data sets. Similarity of images is obtained either by a feature-based similarity measure using four MPEG-...
Learning semi-supervised nonlinear embeddings for domain-adaptive pattern recognition
Vural, Elif (2019-05-20)
We study the problem of learning nonlinear data embeddings in order to obtain representations for efficient and domain-invariant recognition of visual patterns. Given observations of a training set of patterns from different classes in two different domains, we propose a method to learn a nonlinear mapping of the data samples from different domains into a common domain. The nonlinear mapping is learnt such that the class means of different domains are mapped to nearby points in the common domain in order to...
Learning Smooth Pattern Transformation Manifolds
Vural, Elif (2013-04-01)
Manifold models provide low-dimensional representations that are useful for processing and analyzing data in a transformation-invariant way. In this paper, we study the problem of learning smooth pattern transformation manifolds from image sets that represent observations of geometrically transformed signals. To construct a manifold, we build a representative pattern whose transformations accurately fit various input images. We examine two objectives of the manifold-building problem, namely, approximation a...
TRACEMIN-Fiedler: A Parallel Algorithm for Computing the Fiedler Vector
Manguoğlu, Murat; Saied, Faisal; Sameh, Ahmed (2010-06-25)
The eigenvector corresponding to the second smallest eigenvalue of the Laplacian of a graph, known as the Fiedler vector, has a number of applications in areas that include matrix reordering, graph partitioning, protein analysis, data mining, machine learning, and web search. The computation of the Fiedler vector has been regarded as an expensive process as it involves solving a large eigenvalue problem. We present a novel and efficient parallel algorithm for computing the Fiedler vector of large graphs bas...
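As a small illustration of the definition in this entry, the sketch below computes the Fiedler vector of a toy graph with a dense eigendecomposition. It is not the parallel algorithm the entry describes; the helper name `fiedler_vector` and the 4-node path graph are assumptions chosen for the example, and a dense solver is only practical for small graphs.

```python
import numpy as np

def fiedler_vector(adjacency):
    # Graph Laplacian L = D - A for a symmetric adjacency matrix A.
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    # eigh returns eigenvalues in ascending order for symmetric matrices,
    # so index 1 is the second smallest eigenvalue and its eigenvector.
    eigvals, eigvecs = np.linalg.eigh(L)
    return eigvals[1], eigvecs[:, 1]

# Path graph 0 - 1 - 2 - 3 (assumed toy example).
A_path = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

lam2, v = fiedler_vector(A_path)
# The signs of the entries of v split the nodes into two groups,
# which is the basis of spectral graph partitioning.
partition = v > 0
```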
Similarity matrix framework for data from union of subspaces
Aldroubi, Akram; Sekmen, Ali; Koku, Ahmet Buğra; Cakmak, Ahmet Faruk (2018-09-01)
This paper presents a framework for finding similarity matrices for the segmentation of data $W = [w_1 \ldots w_N] \subset \mathbb{R}^D$ drawn from a union $U = \bigcup_{i=1}^{M} S_i$ of independent subspaces $\{S_i\}_{i=1}^{M}$ of dimensions $\{d_i\}_{i=1}^{M}$. It is shown that any factorization $W = BP$, where the columns of $B$ form a basis for the data $W$ and also come from $U$, can be used to produce a similarity matrix $\Xi_W$. In other words, $\Xi_W(i, j) \neq 0$ when the columns $w_i$ and $w_j$ of $W$ come from the same subspace, ...
Citation Formats
S. Kaya and E. Vural, “Learning Multi-Modal Nonlinear Embeddings: Performance Bounds and an Algorithm,” IEEE TRANSACTIONS ON IMAGE PROCESSING, pp. 4384–4394, 2021. [Online]. Available: https://hdl.handle.net/11511/90724.