A Multi-objective approach to cluster ensemble selection problem

Aktaş, Dilay
Clustering is an unsupervised learning method that partitions a data set into groups. The aim is to assign similar points to the same cluster and dissimilar points to different clusters with respect to some notion of similarity. It is applicable to a wide range of areas such as recommender systems, anomaly detection, market research, and customer segmentation. With advances in computational power, a diverse set of clustering solutions can be obtained from a data set by using different clustering algorithms, parameter settings, and features. Clustering ensembles have emerged as a powerful tool for combining the strengths of these multiple clustering solutions into a single consensus solution, which improves clustering quality in terms of accuracy and robustness. In this study, we address the cluster ensemble selection problem and propose a multi-objective approach to generate a consensus clustering solution. The proposed algorithm selects a representative subset of clustering solutions and produces a consensus clustering solution by combining these representatives. Unlike existing approaches, we design the representative selection around three criteria: the quality, diversity, and size of the representative set. Before representative selection, we apply a preprocessing procedure that analyzes the characteristics of the clustering solutions in the library and eliminates those that may mislead the consensus function. We test the performance of the proposed approach on benchmark data sets. The results show that the proposed approach performs well and that the resulting consensus solution is better than the clustering solutions in the library.
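The pipeline described above (preprocess the library, select representatives, combine them) can be illustrated with a minimal Python sketch. It is not the algorithm proposed in this study, whose measures and optimization method are not given in the abstract. Everything below is an assumption made for illustration: the Rand index as the agreement measure, mean agreement with the rest of the library as a quality proxy, a fixed `min_quality` cutoff as the preprocessing step, a weighted-sum greedy rule in place of a true multi-objective search, a hard cap `max_size` on the set size, and a thresholded co-association matrix as the consensus function.

```python
from itertools import combinations


def rand_index(a, b):
    """Fraction of point pairs that two labelings treat alike (both together or both apart)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def select_representatives(library, max_size, min_quality=0.5, weight=0.5):
    """Greedy, weighted-sum stand-in for a multi-objective selection (assumed, not the study's)."""
    n = len(library)
    sim = [[rand_index(library[p], library[q]) for q in range(n)] for p in range(n)]
    # Assumed quality proxy: mean agreement with the other solutions in the library.
    quality = [(sum(row) - 1.0) / (n - 1) for row in sim]
    # Assumed preprocessing: drop solutions that disagree with most of the library.
    kept = [p for p in range(n) if quality[p] >= min_quality]
    selected = [max(kept, key=lambda p: quality[p])]
    while len(selected) < min(max_size, len(kept)):
        candidates = [p for p in kept if p not in selected]

        def score(p):
            # Diversity: distance to the most similar solution already selected.
            diversity = 1.0 - max(sim[p][q] for q in selected)
            return weight * quality[p] + (1.0 - weight) * diversity

        selected.append(max(candidates, key=score))
    return selected


def consensus(library, selected, threshold=0.5):
    """Link points that co-cluster in more than `threshold` of the selected
    solutions, then return the connected components as consensus clusters."""
    n_points = len(library[0])
    labels = [-1] * n_points
    current = 0
    for start in range(n_points):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n_points):
                if labels[j] != -1:
                    continue
                together = sum(library[s][i] == library[s][j] for s in selected)
                if together / len(selected) > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Toy library of five clustering solutions over six points (made-up data).
library = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same partition as the first, relabelled
    [0, 0, 0, 1, 1, 2],
    [0, 0, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1],  # disagrees with the rest of the library
]
selected = select_representatives(library, max_size=3)
labels = consensus(library, selected)
print(selected)  # [0, 3, 2]
print(labels)    # [0, 0, 0, 1, 1, 1]
```

In this toy run the cutoff removes the last solution, the relabelled duplicate is never picked because it adds no diversity, and the consensus recovers the two groups that most of the selected solutions agree on.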