Show/Hide Menu
Hide/Show Apps
Logout
Türkçe
Türkçe
Search
Search
Login
Login
OpenMETU
OpenMETU
About
About
Open Science Policy
Open Science Policy
Open Access Guideline
Open Access Guideline
Postgraduate Thesis Guideline
Postgraduate Thesis Guideline
Communities & Collections
Communities & Collections
Help
Help
Frequently Asked Questions
Frequently Asked Questions
Guides
Guides
Thesis submission
Thesis submission
MS without thesis term project submission
MS without thesis term project submission
Publication submission with DOI
Publication submission with DOI
Publication submission
Publication submission
Supporting Information
Supporting Information
General Information
General Information
Copyright, Embargo and License
Copyright, Embargo and License
Contact us
Contact us
Dimension reduction for tree-structured data
Download
Dimension Reduction for Tree-Structured Data.pdf
Date
2021-9
Author
Durak, Erdinç
Metadata
Show full item record
This work is licensed under a
Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License
.
Item Usage Stats
342
views
122
downloads
Cite This
Statistical analysis of tree-structured data is one of the exciting research areas with expanding application areas. In classical data analysis, the data objects are points in the Euclidean space, whereas they are trees in the analysis of tree-structured data. The replacement of points in the Euclidean space with trees brings in additional complexity in data analysis and necessitates the use of dimension reduction techniques. In this study, we aim to develop dimension reduction techniques for tree-structured data where each tree is rooted and labeled. We consider two classical dimension reduction techniques; namely, the principal component analysis (PCA) and multidimensional scaling (MDS), and adapt them to tree-structured data. Unlike a previous study on the PCA for tree-structured data, the PCA techniques proposed in this thesis maximize a measure of variance. Computational experiments on randomly generated data and real life data show the superiority of the proposed PCA techniques over the existing one. In the literature, there is no study that performs MDS on tree-structured data to project them in the tree space. In this direction, we propose the first MDS method for tree-structured data, where the edges of the trees are considered as the dimensions. In the proposed MDS method, the aim is to keep the Hamming distances between pairs of trees proportionally similar. For this purpose, we propose a mixed-integer linear programming model which finds the edges to be kept in an optimal way and heuristic methods where the edges are greedily selected one by one. Computational experiments show that the proposed MDS methods are successful in keeping useful information as high clustering accuracy is achieved with only a fraction of the edges. To be able to perform the computational experiments in a systematic way, a random tree generator algorithm is developed. This algorithm is able to generate clusters of trees with different skewness and density parameters. By changing these parameters systematically, we are able to understand the strength and weaknesses of the proposed methods.
Subject Keywords
Tree-structured data
,
Dimension reduction
,
Statistical analysis on graphs
,
Principal component analysis
,
Multidimensional scaling
URI
https://hdl.handle.net/11511/93057
Collections
Graduate School of Natural and Applied Sciences, Thesis
Suggestions
OpenMETU
Core
Mask Combination of Multi-Layer Graphs for Global Structure Inference
Bayram, Eda; Thanou, Dorina; Vural, Elif; Frossard, Pascal (2020-01-01)
Structure inference is an important task for network data processing and analysis in data science. In recent years, quite a few approaches have been developed to learn the graph structure underlying a set of observations captured in a data space. Although real-world data is often acquired in settings where relationships are influenced by a priori known rules, such domain knowledge is still not well exploited in structure inference problems. In this paper, we identify the structure of signals defined in a da...
Dimension reduction using global and local pattern information-based maximum margin criterion
Sakarya, Ufuk (2016-07-01)
Dimension reduction is an important research area in pattern recognition when dealing with high-dimensional data. In this paper, a novel supervised dimension reduction approach is introduced for classification. Advantages of using not only global pattern information but also local pattern information are examined in the maximum margin criterion framework. Experimental comparative results in object recognition, handwritten digit recognition, and hyperspectral image classification are presented. According to ...
PROGRESSIVE CLUSTERING OF MANIFOLD-MODELED DATA BASED ON TANGENT SPACE VARIATIONS
Gokdogan, Gokhan; Vural, Elif (2017-09-28)
An important research topic of the recent years has been to understand and analyze manifold-modeled data for clustering and classification applications. Most clustering methods developed for data of non-linear and low-dimensional structure are based on local linearity assumptions. However, clustering algorithms based on locally linear representations can tolerate difficult sampling conditions only to some extent, and may fail for scarcely sampled data manifolds or at high-curvature regions. In this paper, w...
Uncertainty models for vector based functional curves and assessing the reliability of G-Band
Kurtar, Ahmet Kürşat; Düzgün, H. Şebnem; Department of Geodetic and Geographical Information Technologies (2006)
This study is about uncertainty medelling for vector features in geographic information systems (GIS). It has mainly two objectives which are about the band models used for uncertainty modelling . The first one is the assessment of accuracy of GBand model, which is the latest and the most complex uncertainty handling model for vector features. Some simulations and tests are applied to test the reliability of accuracy of G-Band with comparing Chrisman’s epsilon band model, which is the most frequently used b...
Integration of geophysical - geological data using geographic information systems
Şirinyıldız, Tunç; Toprak, Vedat; Department of Geodetic and Geographical Information Technologies (2003)
This study attempts to integrate geophysical data with other spatial data using Geographic Information Systems (GIS). The study is carried out in a part of Galatean Volcanic Province, north of Ankara. Gravity, magnetic, topographic, rock type and volcanic eruption center data are the data layers used in the study. All data layers are converted to raster format with a grid spacing of 100 m. The first step in the analysis is the pair-wise analyses of all data layers. For the geophysical data, different layers...
Citation Formats
IEEE
ACM
APA
CHICAGO
MLA
BibTeX
E. Durak, “Dimension reduction for tree-structured data,” M.S. - Master of Science, Middle East Technical University, 2021.