Neighborhood construction-based multi-objective evolutionary clustering algorithm with feature selection

Alakuş, Cansu
In this study, we address the clustering problem with an unknown number of clusters that may have arbitrary shapes and intracluster and/or intercluster density differences, in data without outliers or noise. The data set may be high-dimensional and contain a number of redundant features. The study consists of two parts. In the first part, we propose a multi-objective evolutionary clustering algorithm, MOCNC, built on three fundamental objectives of the clustering problem: compactness, separation, and connectivity. We use the multi-objective framework and the nondominated sorting of the well-known evolutionary algorithm NSGA-II to optimize the compactness and separation objectives simultaneously. To handle the connectivity objective, a special Neighborhood Construction (NC) algorithm is used as a preprocessor. In the second part, we extend MOCNC to MOCNC-F for the feature selection problem, where the data sets may contain an unknown number of redundant features. In this algorithm, each solution selects a subset of features, and clustering is performed using only the selected features. The output of MOCNC-F is a set of nondominated clustering solutions, each with different compactness and separation values and possibly a different feature subset. Our algorithms are unique in that they solve the feature selection and clustering problems simultaneously while explicitly using the three fundamental objectives of compactness, separation, and connectivity. The proposed algorithms do not require any user-defined problem parameters. We tested the algorithms on generated and benchmark data sets and obtained promising results based on selected performance criteria.
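The idea of returning a set of nondominated solutions over compactness and separation can be illustrated with a minimal Python sketch. It is not the MOCNC or MOCNC-F implementation: the abstract does not define the objective formulas, so the sketch assumes compactness is the mean distance of points to their own cluster centroid (lower is better) and separation is the smallest distance between two cluster centroids (higher is better). The function names `objectives`, `dominates`, and `nondominated` are hypothetical, and the NC preprocessing step and the NSGA-II search itself are omitted.

```python
import math
from typing import Dict, List, Sequence, Tuple


def centroid(points: List[List[float]]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def objectives(
    data: Sequence[Sequence[float]],
    labels: Sequence[int],
    features: Sequence[int],
) -> Tuple[float, float]:
    """Return (compactness, separation) using only the selected features.

    Assumed definitions (not taken from the source):
      compactness = mean distance of each point to its cluster centroid
      separation  = minimum distance between any two cluster centroids
    """
    # Project every point onto the selected feature subset.
    projected = [[p[f] for f in features] for p in data]

    clusters: Dict[int, List[List[float]]] = {}
    for point, label in zip(projected, labels):
        clusters.setdefault(label, []).append(point)
    centers = {label: centroid(pts) for label, pts in clusters.items()}

    compactness = sum(
        math.dist(point, centers[label])
        for point, label in zip(projected, labels)
    ) / len(projected)

    keys = list(centers)
    pair_distances = [
        math.dist(centers[a], centers[b])
        for i, a in enumerate(keys)
        for b in keys[i + 1:]
    ]
    # A single-cluster solution has no centroid pair; treat it as 0 separation.
    separation = min(pair_distances) if pair_distances else 0.0
    return compactness, separation


def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """True if a is no worse than b on both objectives and better on one.

    Compactness (index 0) is minimized; separation (index 1) is maximized.
    """
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def nondominated(scores: List[Tuple[float, float]]) -> List[int]:
    """Indices of solutions that no other solution dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    candidates = [
        ([0, 0, 1, 1], [0, 1]),  # groups the two nearby pairs
        ([0, 1, 0, 1], [0, 1]),  # mixes the pairs
    ]
    scores = [objectives(data, labels, feats) for labels, feats in candidates]
    print(scores, nondominated(scores))
```

Under these assumed definitions, a solution that uses fewer features tends to obtain smaller distances, so comparing raw distances across different feature subsets is biased; a real method would need some normalization, which this sketch does not attempt.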