Tree-structured Data Clustering

2018-07-08
Traditional clustering techniques deal with point data. However, improving measurement capabilities and the need for deeper analyses result in collecting more complex datasets. In this study, we consider a clustering problem in which the data objects are rooted trees with unweighted or weighted edges. Such tree clustering problems arise inmany areas such as biology, neuroscience and computer or social networks. For the solution of the problem, we propose a k-means based algorithm which starts with initial centroid trees and repeats assignment and centroid update steps until convergence. In the assignment step, each data object is assigned to the most similar centroid determined by the Vertex Edge Overlap (VEO). VEO is based on the idea that if two trees share many vertices and edges, then they are similar. In the update step, each centroid is updated by considering the data objects assigned to it. For the unweighted edges case, we propose a Nonlinear Integer Programming formulation to find the centroid of a given cluster which is the tree maximizing the sum of VEOs between trees in the cluster and the centroid itself. We solve the formulation to optimality with a heuristic. For the weighted edges case, we provide a Nonlinear Programming formulation for which we have a heuristic not guaranteeing optimality. We experiment with randomly generated datasets and compare the results with those of the traditional k-modes and k-means algorithms. Preliminary results are promising.

Suggestions

Tree-structured Data Clustering
Dinler, Derya; Tural, Mustafa Kemal; Özdemirel, Nur Evin (null; 2018-07-08)
Traditional clustering techniques deal with point data. However, improving measurement capabilities and the need for deeper analyses result in collecting more complex datasets. In this study, we consider a clustering problem in which the data objects are rooted trees with unweighted or weighted edges. Such tree clustering problems arise inmany areas such as biology, neuroscience and computer or social networks. For the solution of the problem, we propose a k-means based algorithm which starts with initial c...
Investigation of Stationarity for Graph Time Series Data Sets
Güneyi, Eylem Tuğçe; Vural, Elif (2021-01-07)
Graphs permit the analysis of the relationships in complex data sets effectively. Stationarity is a feature that facilitates the analysis and processing of random time signals. Since graphs have an irregular structure, the definition of classical stationarity does not apply to graphs. In this study, we study how stationarity is defined for graph random processes and examine the validity of the stationarity assumption with experiments on synthetic and real data sets.
Semantic information-based alternative plan generation for multiple query optimization
Polat, Faruk; Alhajj, R (Elsevier BV, 2001-09-01)
This paper addresses the impact of semantic information about queries on alternative plan generation (APG) for multiple query optimization (MQO). MQO covers optimizing the execution of a set of queries together where each query in the set to be optimized has several alternative execution plans. A multiple query optimizer selects an alternative plan for each query to obtain an optimal global execution plan. Our approach uses information such as common relations, common possible joins and common conditions to...
Image Annotation With Semi-Supervised Clustering
Sayar, Ahmet; Yarman Vural, Fatoş Tunay (2009-09-16)
Methods developed for image annotation usually make use of region clustering algorithms. Visual codebooks are generated from the region clusters of low level features. These codebooks are then, matched with the words of the text document related to the image, in various ways. In this paper, we supervise the clustering process by using three types of side information. The first one is the topic probability information obtained from the text document associated with the image. The second is the orientation an...
Domain-Structured Chaos in a Hopfield Neural Network
Akhmet, Marat (World Scientific Pub Co Pte Lt, 2019-12-30)
In this paper, we provide a new method for constructing chaotic Hopfield neural networks. Our approach is based on structuring the domain to form a special set through the discrete evolution of the network state variables. In the chaotic regime, the formed set is invariant under the system governing the dynamics of the neural network. The approach can be viewed as an extension of the unimodality technique for one-dimensional map, thereby generating chaos from higher-dimensional systems. We show that the dis...
Citation Formats
D. Dinler, M. K. Tural, and N. E. Özdemirel, “Tree-structured Data Clustering,” 2018, Accessed: 00, 2021. [Online]. Available: https://hdl.handle.net/11511/70575.