A Study of the Classification of Low-Dimensional Data with Supervised Manifold Learning

2018-01-01
Supervised manifold learning methods learn data representations by preserving the geometric structure of data while enhancing the separation between data samples from different classes. In this work, we present a theoretical study of supervised manifold learning for classification. We consider nonlinear dimensionality reduction algorithms that yield linearly separable embeddings of training data and present generalization bounds for this type of algorithm. A necessary condition for satisfactory generalization performance is that the embedding allow the construction of a sufficiently regular interpolation function in relation to the separation margin of the embedding. We show that for supervised embeddings satisfying this condition, the classification error decays at an exponential rate with the number of training samples. Finally, we examine the separability of supervised nonlinear embeddings that aim to preserve the low-dimensional geometric structure of data based on graph representations. The proposed analysis is supported by experiments on several real data sets.
JOURNAL OF MACHINE LEARNING RESEARCH
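
The setting described in the abstract (a linearly separable embedding of the training data, extended to new points by a regular interpolation function) can be illustrated with a small sketch. This is not the paper's algorithm or its experiments. The synthetic data, the placeholder embedding that maps each training sample to its label in {-1, +1}, the Gaussian RBF interpolator, and every function name and parameter value below are assumptions chosen only for illustration.

```python
import numpy as np


def make_two_class_data(n, rng, dim=5, gap=3.0):
    """Hypothetical data: two Gaussian classes whose means differ by `gap` on axis 0."""
    labels = rng.integers(0, 2, size=n) * 2 - 1  # labels in {-1, +1}
    x = rng.normal(size=(n, dim))
    x[:, 0] += 0.5 * gap * labels
    return x, labels


def gaussian_kernel(a, b, scale):
    """Pairwise Gaussian RBF kernel between the rows of a and the rows of b."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * scale ** 2))


def fit_rbf_interpolator(x_train, y_embed, scale=2.0, ridge=1e-3):
    """Fit f(x) = sum_i c_i k(x, x_i) so that f(x_i) is close to y_embed[i].

    `scale` controls how smooth (regular) the interpolator is; `ridge` is a
    small regularizer that keeps the linear system well conditioned.
    """
    k = gaussian_kernel(x_train, x_train, scale)
    coeffs = np.linalg.solve(k + ridge * np.eye(len(x_train)), y_embed)
    return lambda x_new: gaussian_kernel(x_new, x_train, scale) @ coeffs


def misclassification_rate(n_train, seed=0, n_test=1000):
    """Embed the training set, extend the embedding to test points, classify linearly."""
    rng = np.random.default_rng(seed)
    x_tr, lab_tr = make_two_class_data(n_train, rng)
    x_te, lab_te = make_two_class_data(n_test, rng)
    # Placeholder supervised embedding (an assumption): one coordinate equal to
    # the label, which is linearly separable with margin 2.
    y_embed = lab_tr.astype(float)
    f = fit_rbf_interpolator(x_tr, y_embed)
    # Linear classifier in the 1-D embedding domain: threshold at zero.
    pred = np.where(f(x_te) >= 0.0, 1, -1)
    return float(np.mean(pred != lab_te))


if __name__ == "__main__":
    for n in (10, 40, 160, 640):
        print(n, misclassification_rate(n))
```

Running the script prints the empirical test error for a few training-set sizes. Changing `scale` relative to the margin of the embedding gives an informal feel for the trade-off between interpolator regularity and separation that the abstract refers to; it does not reproduce the paper's bounds.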

Suggestions

Out-of-Sample Generalizations for Supervised Manifold Learning for Classification
Vural, Elif (2016-03-01)
Supervised manifold learning methods for data classification map high-dimensional data samples to a lower dimensional domain in a structure-preserving way while increasing the separation between different classes. Most manifold learning methods compute the embedding only of the initially available data; however, the generalization of the embedding to novel points, i.e., the out-of-sample extension problem, becomes especially important in classification applications. In this paper, we propose a semi-supervis...
A Theoretical Analysis of Multi-Modal Representation Learning with Regular Functions
Vural, Elif (2021-01-07)
Multi-modal data analysis methods often learn representations that align different modalities in a new common domain, while preserving the within-class compactness and within-modality geometry and enhancing the between-class separation. In this study, we present a theoretical performance analysis for multi-modal representation learning methods. We consider a quite general family of algorithms learning a nonlinear embedding of the data space into a new space via regular functions. We derive sufficient condit...
A Trie-structured Bayesian Model for Unsupervised Morphological Segmentation
Kurfalı, Murathan; Ustun, Ahmet; CAN BUĞLALILAR, BURCU (2017-04-23)
In this paper, we introduce a trie-structured Bayesian model for unsupervised morphological segmentation. We adopt prior information from different sources in the model. We use neural word embeddings to discover words that are morphologically derived from each other and thereby that are semantically similar. We use letter successor variety counts obtained from tries that are built by neural word embeddings. Our results show that using different information sources such as neural word embeddings and letter s...
A neuro-fuzzy MAR algorithm for temporal rule-based systems
Sisman, NA; Alpaslan, Ferda Nur; Akman, V (1999-08-04)
This paper introduces a new neuro-fuzzy model for constructing a knowledge base of temporal fuzzy rules obtained by the Multivariate Autoregressive (MAR) algorithm. The model described contains two main parts, one for fuzzy-rule extraction and one for the storage of extracted rules. The fuzzy rules are obtained from time series data using the MAR algorithm. Time-series analysis basically deals with tabular data. It interprets the data obtained for making inferences about future behavior of the variables. Fu...
A survey on persistence landscape theory
Gürses, Selçuk; Pamuk, Semra; Department of Mathematics (2022-7)
Topological data analysis (TDA) consists of a growing collection of techniques that reveal the shape of data. These techniques may be especially useful for comprehending global features of high-dimensional data that are inaccessible via other methods. The usage of TDA has been constrained by the difficulties of merging the subject’s primary tool, the barcode or persistence diagram, with statistics and machine learning. The persistence landscape is a stable topological summary that is easily combinable with...
Citation Formats
E. Vural, “A Study of the Classification of Low-Dimensional Data with Supervised Manifold Learning,” JOURNAL OF MACHINE LEARNING RESEARCH, pp. 1–55, 2018, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/52793.