A Multi-objective approach to cluster ensemble selection problem

Download
2019
Aktaş, Dilay
Clustering is an unsupervised learning method that partitions a data set into groups. The aim is to assign similar points to the same cluster and dissimilar points to different clusters with respect to some notion of similarity. It is applicable to a wide range of areas such as recommender systems, anomaly detection, market research, and customer segmentation. With the advances in the computational power, a diverse set of clustering solutions can be obtained from a dataset using different clustering algorithms, different parameter settings and different features. Clustering ensemble has emerged as a powerful tool for combining the strengths of these multiple clustering solutions and generating a consensus solution. It improves the quality of clustering in terms of accuracy and robustness. In this study, we address the cluster ensemble selection problem, and propose a multi-objective approach to generate a consensus clustering solution. Our proposed algorithm selects a representative subset of clustering solutions, and produces a consensus clustering solution by combining these representatives. Different from the existing approaches, we design the representative selection approach based on three criteria: quality, diversity, and size of the representative set. Before the representative selection, we apply a preprocessing procedure to analyze the characteristics of the clustering solutions in the library and eliminate the ones that may mislead the consensus function. We test the performance of the proposed approach on the benchmark datasets. The results show that the proposed approach works well, and the resulting consensus solution is better than the clustering solutions in the library.

Suggestions

A memetic algorithm for clustering with cluster based feature selection
Şener, İlyas Alper; İyigün, Cem; Department of Operational Research (2022-8)
Clustering is a well known unsupervised learning method which aims to group the similar data points and separate the dissimilar ones. Data sets that are subject to clustering are mostly high dimensional and these dimensions include relevant and redundant features. Therefore, selection of related features is a significant problem to obtain successful clusters. In this study, it is considered that relevant features for each cluster can be varied as each cluster in a data set is grouped by different set of fe...
K-median clustering algorithms for time series data
Gökçem, Yiğit; İyigün, Cem; Department of Industrial Engineering (2021-3-10)
Clustering is an unsupervised learning method, that groups the unlabeled data forgathering valuable information. Clustering can be applied on various types of data. Inthis study, we have focused on time series clustering. When the studies about timeseries clustering are reviewed in the literature, for the time series data, the centers ofthe formed clusters are selected from the existing time series samples in the clusters.In this study, we have changed that view and have proposed clustering algorithmsbased...
A Study of the Classification of Low-Dimensional Data with Supervised Manifold Learning
Vural, Elif (2018-01-01)
Supervised manifold learning methods learn data representations by preserving the geometric structure of data while enhancing the separation between data samples from different classes. In this work, we propose a theoretical study of supervised manifold learning for classification. We consider nonlinear dimensionality reduction algorithms that yield linearly separable embeddings of training data and present generalization bounds for this type of algorithms. A necessary condition for satisfactory generalizat...
A Bayesian Approach to Learning Scoring Systems
Ertekin Bolelli, Şeyda (2015-12-01)
We present a Bayesian method for building scoring systems, which are linear models with coefficients that have very few significant digits. Usually the construction of scoring systems involve manual efforthumans invent the full scoring system without using data, or they choose how logistic regression coefficients should be scaled and rounded to produce a scoring system. These kinds of heuristics lead to suboptimal solutions. Our approach is different in that humans need only specify the prior over what the ...
A Formal Methods Approach to Pattern Recognition and Synthesis in Reaction Diffusion Networks
Bartocci, Ezio; Aydın Göl, Ebru; Haghighi, Iman; Belta, Calin (2018-03-01)
We introduce a formal framework for specifying, detecting, and generating spatial patterns in reaction diffusion networks. Our approach is based on a novel spatial superposition logic, whose semantics is defined over the quad-tree representation of a partitioned image. We demonstrate how to use rule-based classifiers to efficiently learn spatial superposition logic formulas for several types of patterns from positive and negative examples. We implement pattern detection as a model-checking algorithm and we ...
Citation Formats
D. Aktaş, “A Multi-objective approach to cluster ensemble selection problem,” Thesis (M.S.) -- Graduate School of Informatics. Operational Research., Middle East Technical University, 2019.