Mathematical programming based exact and heuristic solution approaches for a clustering problem with localized feature selection

2024-8-22
Büyük Habacı, Gözdenur
Clustering is an unsupervised machine learning problem that is widely studied in different contexts of business and science. The complexity of real-world data, often characterized by high dimensionality, poses significant challenges to traditional clustering methods. Feature selection is the most widely used technique for coping with high dimensionality in clustering problems. Most feature selection methods select a common set of features to define all clusters, which is called global feature selection. Localized feature selection methods instead allow the relevant set of features to differ across clusters and select a set of features for each cluster separately. In this thesis, we address a clustering problem that aims to group data points and to select a cluster center and a set of relevant features for each cluster. The objective is to minimize the sum of Euclidean distances between data points and their cluster centers, where each distance is computed over the relevant feature set of the corresponding cluster. We propose two Mixed-Integer Second-Order Cone Programming formulations, a matheuristic method, and an iterative heuristic method for the problem. We present the computational performance of the proposed methods on generated data sets.
Citation Formats
G. Büyük Habacı, “Mathematical programming based exact and heuristic solution approaches for a clustering problem with localized feature selection,” M.S. - Master of Science, Middle East Technical University, 2024.