CLUSTER STABILITY ESTIMATION BASED ON A MINIMAL SPANNING TREES APPROACH
Date
2009-06-03
Author
Volkovich, Zeev (Vladimir)
Barzily, Zeev
Weber, Gerhard Wilhelm
Toledano-Kitai, Dvora
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Among the data and text mining techniques employed today in science, economics and technology, clustering serves as a preprocessing step in data analysis. However, many open questions still await theoretical and practical treatment; for example, the problem of determining the true number of clusters has not been satisfactorily solved. In the current paper, this problem is addressed by the cluster stability approach. For several candidate numbers of clusters, we estimate the stability of the partitions obtained by clustering samples. Partitions are considered consistent if their clusters are stable. Cluster validity is measured as the total number of edges in the clusters' minimal spanning trees that connect points from different samples; this is, in effect, the Friedman and Rafsky two-sample test statistic. Under the homogeneity hypothesis, that the samples are well mingled within the clusters, the statistic is asymptotically normally distributed. On this basis, the standard score of the cross-sample edge count is computed, and the quality of a partition is represented by its worst cluster, the one with the minimal standard score. It is natural to expect that the true number of clusters is characterized by the empirical distribution of this score that has the shortest left tail. The proposed methodology builds this distribution sequentially and estimates its left asymmetry. Numerical experiments presented in the paper demonstrate the ability of the approach to detect the true number of clusters.
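A minimal pure-Python sketch of the scoring step described in the abstract, not the authors' implementation. It assumes two samples labelled `0` and `1`, and cluster labels produced by whatever clustering algorithm is used (the abstract does not name one, so clustering and sampling are left to the caller). Each cluster's Euclidean minimal spanning tree is built with Prim's algorithm, cross-sample edges are counted, and the count is standardized with its exact mean and variance under random relabelling of the points (the homogeneity hypothesis). All function names are hypothetical, and `left_tail_length` (median minus a lower quantile) is only an illustrative stand-in for the paper's left-asymmetry estimator, which is not given here.

```python
import math
import statistics


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns N - 1 edges (i, j)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known connection cost to the growing tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the cheapest vertex not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def cross_edge_zscore(points, sample_ids):
    """Standard score of the MST cross-sample edge count (sample ids 0/1).

    Friedman and Rafsky's runs statistic equals this count plus one,
    so its standard score is the same. Returns None when undefined.
    """
    N = len(points)
    m = sum(1 for s in sample_ids if s == 0)
    n = N - m
    if N < 4 or m == 0 or n == 0:
        return None  # too few points, or only one sample present
    edges = mst_edges(points)
    cross = sum(1 for i, j in edges if sample_ids[i] != sample_ids[j])
    # Probabilities under random relabelling with m zeros and n ones:
    p = 2 * m * n / (N * (N - 1))  # a single edge is a cross edge
    p_adj = m * n / (N * (N - 1))  # two edges sharing a vertex are both cross edges
    p_dis = 4 * m * n * (m - 1) * (n - 1) / (N * (N - 1) * (N - 2) * (N - 3))  # disjoint pair
    degree = [0] * N
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    adj_pairs = sum(d * (d - 1) // 2 for d in degree)
    dis_pairs = (N - 1) * (N - 2) // 2 - adj_pairs
    mean = (N - 1) * p
    # Sum of edge variances plus covariances over ordered pairs of distinct edges.
    var = ((N - 1) * p * (1 - p)
           + 2 * adj_pairs * (p_adj - p * p)
           + 2 * dis_pairs * (p_dis - p * p))
    if var <= 0:
        return None
    return (cross - mean) / math.sqrt(var)


def worst_cluster_score(points, sample_ids, cluster_labels):
    """Partition quality: the minimal standard score over its clusters."""
    scores = []
    for c in set(cluster_labels):
        idx = [i for i, lab in enumerate(cluster_labels) if lab == c]
        z = cross_edge_zscore([points[i] for i in idx], [sample_ids[i] for i in idx])
        if z is not None:  # assumption of this sketch: undefined clusters are skipped
            scores.append(z)
    return min(scores) if scores else None


def left_tail_length(values, q=0.05):
    """Hypothetical left-tail measure: median minus the (lower) q-quantile."""
    s = sorted(values)
    return statistics.median(s) - s[int(q * (len(s) - 1))]


def choose_k(scores_by_k):
    """Pick the candidate k whose worst-cluster score distribution has the shortest left tail."""
    return min(scores_by_k, key=lambda k: left_tail_length(scores_by_k[k]))
```

In use, one would repeatedly draw samples, cluster them for each candidate k, append the resulting `worst_cluster_score` to `scores_by_k[k]`, and finally call `choose_k`. A cluster in which the two samples are poorly mixed has few cross edges and a strongly negative score, which is why the minimum over clusters is taken as the partition's quality.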
Subject Keywords
Clustering, Cluster Stability, Minimal Spanning Tree, Two Sample Test, Data Mining
URI
https://hdl.handle.net/11511/55487
Conference Name
2nd Global Conference on Power Control and Optimization
Collections
Graduate School of Applied Mathematics, Conference / Seminar
Suggestions
Cluster stability using minimal spanning trees
Barzily, Zeev; Volkovich, Zeev; Akteke-Oeztuerk, Basak; Weber, Gerhard Wilhelm (2008-05-23)
In this paper, a method for the study of cluster stability is proposed. We draw pairs of samples from the data, according to two sampling distributions. The first distribution corresponds to the high density zones of data-elements distribution. It is associated with the clusters cores. The second one, associated with the cluster margins, is related to the low density zones. The samples are clustered and the two obtained partitions are compared. The partitions are considered to be consistent if the obtained ...
An application of the minimal spanning tree approach to the cluster stability problem
Volkovich, Z.; Barzily, Z.; Weber, Gerhard Wilhelm; Toledano-Kitai, D.; Avros, R. (Springer Science and Business Media LLC, 2012-03-01)
Among the areas of data and text mining which are employed today in OR, science, economy and technology, clustering theory serves as a preprocessing step in data analysis. An important component of clustering theory is determination of the true number of clusters. This problem has not been satisfactorily solved. In our paper, this problem is addressed by the cluster stability approach. For several possible numbers of clusters, we estimate the stability of the partitions obtained from clustering of samp...
On a Minimal Spanning Tree Approach in the Cluster Validation Problem
Barzily, Zeev; Volkovich, Zeev; Öztürk, Başak; Weber, Gerhard Wilhelm (2009-01-01)
In this paper, a method for the study of cluster stability is proposed. We draw pairs of samples from the data, according to two sampling distributions. The first distribution corresponds to the high density zones of data-elements distribution. Thus it is associated with the clusters cores. The second one, associated with the cluster margins, is related to the low density zones. The samples are clustered and the two obtained partitions are compared. The partitions are considered to be consistent if the obt...
Clustering of manifold-modeled data based on tangent space variations
Gökdoğan, Gökhan; Vural, Elif; Department of Electrical and Electronics Engineering (2017)
An important research topic of the recent years has been to understand and analyze data collections for clustering and classification applications. In many data analysis problems, the data sets at hand have an intrinsically low-dimensional structure and admit a manifold model. Most state-of-the-art clustering methods developed for data of non-linear and low-dimensional structure are based on local linearity assumptions. However, clustering algorithms based on locally linear representations can tolerate diff...
Pattern extraction by using both spatial and temporal features on Turkish meteorological data
Goler, Işıl; Yazıcı, Adnan; Karagöz, Pınar; Department of Computer Engineering (2010)
With the growth in the size of datasets, data mining has been an important research topic and has been receiving substantial interest from both academia and industry for many years. Especially, spatio-temporal data mining, mining knowledge from large amounts of spatio-temporal data, is a highly demanding field because huge amounts of spatio-temporal data are collected in various applications. Therefore, spatio-temporal data mining requires the development of novel data mining algorithms and computational techniqu...
Citation (IEEE)
Z. (V.) Volkovich, Z. Barzily, G. W. Weber, and D. Toledano-Kitai, “CLUSTER STABILITY ESTIMATION BASED ON A MINIMAL SPANNING TREES APPROACH,” presented at the 2nd Global Conference on Power Control and Optimization, Bali, Indonesia, 2009, vol. 1159, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/55487.