A Multi-objective approach to cluster ensemble selection problem

Aktaş, Dilay
Clustering is an unsupervised learning method that partitions a data set into groups. The aim is to assign similar points to the same cluster and dissimilar points to different clusters with respect to some notion of similarity. It is applicable to a wide range of areas such as recommender systems, anomaly detection, market research, and customer segmentation. With advances in computational power, a diverse set of clustering solutions can be obtained from a dataset by using different clustering algorithms, different parameter settings, and different features. Clustering ensembles have emerged as a powerful tool for combining the strengths of these multiple clustering solutions into a single consensus solution, improving the quality of clustering in terms of accuracy and robustness. In this study, we address the cluster ensemble selection problem and propose a multi-objective approach to generate a consensus clustering solution. Our proposed algorithm selects a representative subset of clustering solutions and produces a consensus clustering solution by combining these representatives. Unlike existing approaches, we design the representative selection based on three criteria: quality, diversity, and size of the representative set. Before the representative selection, we apply a preprocessing procedure that analyzes the characteristics of the clustering solutions in the library and eliminates those that may mislead the consensus function. We test the performance of the proposed approach on benchmark datasets. The results show that the proposed approach works well and that the resulting consensus solution is better than the clustering solutions in the library.
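
The abstract does not specify how quality and diversity are measured or how the consensus is formed, so the following is only a minimal sketch of the general pipeline (preprocess, select representatives, combine) under stated assumptions. The Rand index as the agreement measure, the greedy selection rule, the `min_quality` cutoff, and the co-association consensus are all illustrative choices, not the thesis's method; every function and parameter name here is hypothetical.

```python
from itertools import combinations


def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree
    (both put the pair together, or both keep it apart)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def select_representatives(library, k, min_quality=0.0):
    """Greedy selection of at most k solutions (assumed heuristic).

    Quality proxy: mean agreement of a solution with the whole library.
    Preprocessing: drop solutions whose quality is below min_quality.
    Diversity: penalize similarity to already selected solutions.
    Size: stop once k representatives are chosen.
    """
    n = len(library)
    sim = [[rand_index(library[p], library[q]) for q in range(n)]
           for p in range(n)]
    quality = [sum(row) / n for row in sim]
    candidates = [p for p in range(n) if quality[p] >= min_quality]
    if not candidates:
        return []
    chosen = [max(candidates, key=lambda p: quality[p])]
    while len(chosen) < min(k, len(candidates)):
        rest = [p for p in candidates if p not in chosen]
        # High quality is rewarded; closeness to a chosen solution is penalized.
        best = max(rest,
                   key=lambda p: quality[p] - max(sim[p][q] for q in chosen))
        chosen.append(best)
    return chosen


def consensus(solutions, threshold=0.5):
    """Co-association consensus: link two points when they share a cluster
    in more than `threshold` of the solutions, then label the connected
    components of the resulting graph."""
    n = len(solutions[0])
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1:
                    co = sum(s[i] == s[j] for s in solutions) / len(solutions)
                    if co > threshold:
                        labels[j] = current
                        stack.append(j)
        current += 1
    return labels


# Toy library of five clusterings of six points (made-up data).
library = [
    [0, 0, 0, 1, 1, 1],  # clean two-group solution
    [1, 1, 1, 0, 0, 0],  # same partition, labels swapped (redundant)
    [0, 0, 1, 1, 1, 1],  # one point misplaced
    [1, 0, 0, 1, 1, 1],  # a different point misplaced
    [0, 1, 0, 1, 0, 1],  # noisy solution
]
chosen = select_representatives(library, k=3, min_quality=0.6)
print(sorted(chosen), consensus([library[p] for p in chosen]))
```

On this toy library the noisy fifth solution falls below the cutoff, the relabeled duplicate is skipped because it adds no diversity, and the consensus of the three selected solutions groups the first three points and the last three points. A single-link consensus like this one is fragile when two selected solutions make the same mistake, which is one reason the choice of representatives matters.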


K-median clustering algorithms for time series data
Gökçem, Yiğit; İyigün, Cem; Department of Industrial Engineering (2021-3-10)
Clustering is an unsupervised learning method that groups the unlabeled data for gathering valuable information. Clustering can be applied on various types of data. In this study, we have focused on time series clustering. When the studies about time series clustering are reviewed in the literature, for the time series data, the centers of the formed clusters are selected from the existing time series samples in the clusters. In this study, we have changed that view and have proposed clustering algorithms based...
A Study of the Classification of Low-Dimensional Data with Supervised Manifold Learning
Vural, Elif (2018-01-01)
Supervised manifold learning methods learn data representations by preserving the geometric structure of data while enhancing the separation between data samples from different classes. In this work, we propose a theoretical study of supervised manifold learning for classification. We consider nonlinear dimensionality reduction algorithms that yield linearly separable embeddings of training data and present generalization bounds for this type of algorithms. A necessary condition for satisfactory generalizat...
A Formal Methods Approach to Pattern Recognition and Synthesis in Reaction Diffusion Networks
Bartocci, Ezio; Aydın Göl, Ebru; Haghighi, Iman; Belta, Calin (2018-03-01)
We introduce a formal framework for specifying, detecting, and generating spatial patterns in reaction diffusion networks. Our approach is based on a novel spatial superposition logic, whose semantics is defined over the quad-tree representation of a partitioned image. We demonstrate how to use rule-based classifiers to efficiently learn spatial superposition logic formulas for several types of patterns from positive and negative examples. We implement pattern detection as a model-checking algorithm and we ...
Analysis of Face Recognition Algorithms for Online and Automatic Annotation of Personal Videos
Yılmaztürk, Mehmet; Ulusoy Parnas, İlkay; Çiçekli, Fehime Nihan (Springer, Dordrecht; 2010-05-08)
Different from previous automatic but offline annotation systems, this paper studies automatic and online face annotation for personal videos/episodes of TV series considering Nearest Neighbourhood, LDA and SVM classification with Local Binary Patterns, Discrete Cosine Transform and Histogram of Oriented Gradients feature extraction methods in terms of their recognition accuracies and execution times. The best performing feature extraction method and the classifier pair is found out to be SVM classification...
A binomial noised model for cluster validation
Toledano-Kitai, Dvora; Avros, Renata; Volkovich, Zeev; Weber, Gerhard Wilhelm; Yahalom, Orly (IOS Press, 2013-01-01)
Cluster validation is the task of estimating the quality of a given partition of a data set into clusters of similar objects. Normally, a clustering algorithm requires a desired number of clusters as a parameter. We consider the cluster validation problem of determining the optimal ("true") number of clusters. We adopt the stability testing approach, according to which, repeated applications of a given clustering algorithm provide similar results when the specified number of clusters is correct. To implemen...
Citation Formats
D. Aktaş, “A Multi-objective approach to cluster ensemble selection problem,” Thesis (M.S.) -- Graduate School of Informatics. Operational Research, Middle East Technical University, 2019.