Cluster validity analysis using subsampling

2013-09-02
Abul, Osman
Lo, A
Alhajj, Reda
Polat, Faruk
Cluster validity investigates whether generated clusters are true clusters or due to chance. This is usually done based on subsampling stability analysis. A related problem is estimating the true number of clusters in a given dataset. A number of methods described in the literature address both purposes. In this paper, we propose three methods for estimating confidence in the validity of a clustering result. The first method validates a clustering result by employing supervised classifiers: the dataset is divided into training and test sets, and the accuracy of the classifier is evaluated on the test set. This method computes confidence in the generalization capability of the clustering. The second method is based on the fact that if a clustering is valid, then each of its subsets should be valid as well. The third method is similar to the second but takes the dual approach, i.e., each cluster is expected to be stable and compact. Confidence is estimated by repeating the process a number of times on subsamples. Experimental results illustrate the effectiveness of the proposed methods.
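
The classifier-based idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the clustering algorithm, the classifier, or the subsample sizes, so this sketch assumes plain k-means, a 1-nearest-neighbour classifier, 80% subsamples, and a 70/30 train/test split. All function names (kmeans, knn_predict, classifier_confidence, two_blobs) are hypothetical.

```python
import random
from statistics import mean

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    return tuple(mean(coord) for coord in zip(*points))

def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's k-means (an assumed choice); returns one label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster happens to become empty.
            new_centers.append(centroid(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def knn_predict(train_x, train_y, x):
    """1-nearest-neighbour classifier, a stand-in for any supervised learner."""
    nearest = min(range(len(train_x)), key=lambda i: sq_dist(x, train_x[i]))
    return train_y[nearest]

def classifier_confidence(points, k, rounds=20, frac=0.8, test_frac=0.3, seed=0):
    """Mean test accuracy of a classifier trained on cluster labels,
    averaged over repeated random subsamples of the data."""
    rng = random.Random(seed)
    scores = []
    for _ in range(rounds):
        sub = rng.sample(points, int(frac * len(points)))
        labels = kmeans(sub, k, rng)          # cluster labels act as classes
        idx = list(range(len(sub)))
        rng.shuffle(idx)
        n_test = int(test_frac * len(sub))
        test, train = idx[:n_test], idx[n_test:]
        train_x = [sub[i] for i in train]
        train_y = [labels[i] for i in train]
        hits = sum(knn_predict(train_x, train_y, sub[i]) == labels[i]
                   for i in test)
        scores.append(hits / n_test)
    return mean(scores)

def two_blobs(n_per_blob, seed=0):
    """Synthetic demo data: two well-separated 2-D Gaussian blobs."""
    rng = random.Random(seed)
    def blob(cx, cy):
        return [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
                for _ in range(n_per_blob)]
    return blob(0.0, 0.0) + blob(10.0, 10.0)

print(classifier_confidence(two_blobs(50), k=2))
```

The returned value is the average held-out accuracy over the subsamples; under the assumptions above, a value close to 1 suggests that the cluster labels generalize to unseen points. How the paper turns such accuracies into a confidence estimate is not stated in the abstract.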

Citation Formats
O. Abul, A. Lo, R. Alhajj, and F. Polat, “Cluster validity analysis using subsampling,” 2013, Accessed: 2020. [Online]. Available: https://hdl.handle.net/11511/68886.