A hybrid swarm intelligence algorithm for simultaneous feature selection and clustering

Geren, Hasan
In this study, we address the feature selection and clustering problems using a hybrid swarm intelligence approach. We assume that the number of clusters is known and that clusters can have any shape and different densities, but that the data contain no outliers or noise. The data set may be high-dimensional and contain redundant features. We propose a swarm intelligence algorithm, ACOVNS, which hybridizes Ant Colony Optimization (ACO) and Variable Neighborhood Search (VNS). We use the ACO mechanisms for exploration and strengthen exploitation by combining them with VNS. In addition to pheromone values, we use heuristic information to further improve the performance of the algorithm. In the first part of the study, we apply the algorithm with an objective function based on the sum of Euclidean distances to solve the clustering problem. In the second part, we modify ACOVNS into F-ACOVNS, which performs feature selection and clustering simultaneously. We propose a novel form of heuristic information that employs the Laplacian Score (LS), together with a second pheromone matrix for feature selection. The algorithm thus selects features during clustering by using distinct pheromone matrices and heuristic information. Our proposed algorithms are unique in that ACOVNS is the first hybridization of ACO and VNS for clustering and F-ACOVNS is the first algorithm that uses LS as heuristic information. We compared the performance of ACOVNS with some well-known algorithms on nine real-world data sets. For simultaneous feature selection and clustering, we compared F-ACOVNS with known single- and multi-objective algorithms using both real and synthetic data sets.
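As a rough illustration of the Laplacian Score mentioned in the abstract, the sketch below computes the score for each feature using the standard LS definition. It is not the thesis implementation: the function name `laplacian_scores`, the kernel width `t`, and the use of a fully connected heat-kernel graph (instead of the k-nearest-neighbour graph of the original LS formulation) are assumptions made to keep the example short.

```python
import math

def laplacian_scores(X, t=1.0):
    """Return one Laplacian Score per feature (lower = better locality preservation).

    X is a list of samples, each a list of numeric feature values.
    Assumption: fully connected graph with heat-kernel weights
    S_ij = exp(-||x_i - x_j||^2 / t), rather than a k-NN graph.
    """
    n, m = len(X), len(X[0])
    # Pairwise heat-kernel similarities, no self-loops.
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            S[i][j] = S[j][i] = math.exp(-d2 / t)
    deg = [sum(row) for row in S]  # diagonal of the degree matrix D
    total = sum(deg)
    scores = []
    for r in range(m):
        f = [x[r] for x in X]
        # Centre the feature by its degree-weighted mean, as in the LS definition.
        mean = sum(fi * di for fi, di in zip(f, deg)) / total
        f = [fi - mean for fi in f]
        # Numerator f^T L f with L = D - S, written as a sum over pairs i < j.
        num = sum(S[i][j] * (f[i] - f[j]) ** 2
                  for i in range(n) for j in range(i + 1, n))
        # Denominator f^T D f (degree-weighted variance).
        den = sum(di * fi * fi for fi, di in zip(f, deg))
        # Near-constant features are degenerate; give them the worst score.
        scores.append(num / den if den > 1e-12 else float("inf"))
    return scores

# Toy data: feature 0 separates two groups, feature 1 is unstructured.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1],
     [5.0, 0.8], [5.1, 0.2], [5.2, 0.6]]
scores = laplacian_scores(X)
```

On this toy data the first feature receives the lower (better) score. How F-ACOVNS turns such scores into heuristic values for its feature-selection step is not stated in the abstract, so that mapping is left out here.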


A new framework of multi-objective evolutionary algorithms for feature selection and multi-label classification of video data
Karagoz, Gizem Nur; Yazıcı, Adnan; Dokeroglu, Tansel; Coşar, Ahmet (2020-06-01)
Few studies in the literature address multi-objective multi-label feature selection for the classification of video data using evolutionary algorithms. Selecting the most appropriate subset of features while maintaining or improving the accuracy of the prediction results is a significant problem. This study proposes a framework of parallel multi-objective Non-dominated Sorting Genetic Algorithms (NSGA-II) for exploring a Pareto set of non-dominated solutions. The subsets of non-dominated featu...
A memetic algorithm for clustering with cluster based feature selection
Şener, İlyas Alper; İyigün, Cem; Department of Operational Research (2022-8)
Clustering is a well-known unsupervised learning method that aims to group similar data points and separate dissimilar ones. Data sets subject to clustering are mostly high-dimensional, and their dimensions include both relevant and redundant features. Therefore, selecting the relevant features is a significant problem in obtaining successful clusters. In this study, it is considered that the relevant features can vary from cluster to cluster, as each cluster in a data set is grouped by a different set of fe...
An intelligent multimedia information system for multimodal content extraction and querying
Yazıcı, Adnan; Yilmaz, Turgay; Sattari, Saeid; Sert, Mustafa; Gulen, Elvan (2018-01-01)
This paper introduces an intelligent multimedia information system, which exploits machine learning and database technologies. The system extracts semantic contents of videos automatically by using the visual, auditory and textual modalities, then, stores the extracted contents in an appropriate format to retrieve them efficiently in subsequent requests for information. The semantic contents are extracted from these three modalities of data separately. Afterwards, the outputs from these modalities are fused...
A matheuristic for binary classification of data sets using hyperboxes
Akbulut, Derya; İyigün, Cem; Özdemirel, Nur Evin (2018-07-08)
In this study, an optimization approach is proposed for the binary classification problem. A Mixed Integer Programming (MIP) model formulation is used to construct hyperboxes as classifiers, minimizing the number of misclassified and unclassified samples as well as overlapping of hyperboxes. The hyperboxes are determined by some lower and upper bounds on the feature values, and overlapping of these hyperboxes is allowed to keep a balance between misclassification and overfitting. A matheuristic, namely Iter...
A multitasking knowledge-based system for control applications
Tolun, Mehmet; Baykal, Nazife; Abu-Shaar, S. (1999)
Knowledge-based systems can provide several intelligent features for control applications, which decrease their dependency on human operators. As industrial systems become more complex, the response time and the amount of thinking required to control a large number of instruments far surpass the capability of humans. This paper describes a knowledge-based tool architecture that is supported by a multitasking inference engine and an interfacing hardware for data acquisition. The tool features a high-level...
Citation Formats
H. Geren, “A hybrid swarm intelligence algorithm for simultaneous feature selection and clustering,” M.S. - Master of Science, Middle East Technical University, 2022.