Hide/Show Apps

Tree-structured Data Clustering

Dinler, Derya
Tural, Mustafa Kemal
Özdemirel, Nur Evin
Traditional clustering techniques deal with point data. However, improving measurement capabilities and the need for deeper analyses result in collecting more complex datasets. In this study, we consider a clustering problem in which the data objects are rooted trees with unweighted or weighted edges. Such tree clustering problems arise inmany areas such as biology, neuroscience and computer or social networks. For the solution of the problem, we propose a k-means based algorithm which starts with initial centroid trees and repeats assignment and centroid update steps until convergence. In the assignment step, each data object is assigned to the most similar centroid determined by the Vertex Edge Overlap (VEO). VEO is based on the idea that if two trees share many vertices and edges, then they are similar. In the update step, each centroid is updated by considering the data objects assigned to it. For the unweighted edges case, we propose a Nonlinear Integer Programming formulation to find the centroid of a given cluster which is the tree maximizing the sum of VEOs between trees in the cluster and the centroid itself. We solve the formulation to optimality with a heuristic. For the weighted edges case, we provide a Nonlinear Programming formulation for which we have a heuristic not guaranteeing optimality. We experiment with randomly generated datasets and compare the results with those of the traditional k-modes and k-means algorithms. Preliminary results are promising.