Multiobjective evolutionary feature subset selection algorithm for binary classification

2016
Deniz Kızılöz, Firdevsi Ayça
This thesis investigates the performance of multiobjective feature subset selection (FSS) algorithms combined with state-of-the-art machine learning techniques for the binary classification problem. Recent studies try to improve the accuracy of classification by including all of the features in the dataset, neglecting to determine the best performing subset of features. However, for some problems, the number of features may reach thousands, which consumes excessive computation during the feature evaluation and classification phases and may also reduce the accuracy of the results. Therefore, selecting the minimum number of features while keeping the accuracy of the results high becomes an important issue for achieving fast and accurate binary classification. The multiobjective algorithms implemented in this thesis include two phases: selecting feature subsets and applying supervised/unsupervised machine learning techniques to the selected subsets. For the FSS part of the algorithms, a brute-force approach is implemented first. Because exhaustively investigating all of the feature subsets is infeasible when the number of features is larger than 20, a greedy algorithm is implemented second to find good-enough feature subsets. Finally, in order to select the most appropriate feature subsets intelligently, a genetic algorithm is proposed for the FSS part of the algorithms. Crossover and mutation operators are used to improve a population of individuals (each representing a selected feature subset) and obtain (near-)optimal solutions through generations. In the second phase of the algorithms, the performance of the selected feature subsets is evaluated using five different machine learning techniques: Logistic Regression, Support Vector Machines, Extreme Learning Machine, K-means, and Affinity Propagation.
The best performing multiobjective evolutionary algorithm is selected after comprehensive experiments and compared with state-of-the-art algorithms in the literature: Particle Swarm Optimization, Greedy Search, Tabu Search, and Scatter Search. Eleven datasets, mostly obtained from the University of California, Irvine (UCI) Machine Learning Repository, are used for the performance evaluation of the implemented algorithms. Experimental results show that classification accuracy increases significantly with the most suitable subset of features and that execution time decreases greatly after applying the proposed algorithm to the datasets.
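
The two-objective genetic search described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. Several parts are assumptions made for illustration: the function names (`make_data`, `accuracy`, `evaluate`, `dominates`, `pareto_front`, `evolve`) are hypothetical, a nearest-centroid classifier scored on the training data stands in for the five learners listed above, the data are synthetic, and survivors are chosen by a simple count of dominating individuals rather than by whatever selection scheme the thesis uses. Each individual is a bit mask over the features, and the two objectives are to minimize the number of selected features and to maximize accuracy.

```python
import random

def make_data(rng, n=200, n_features=12, informative=(0, 3, 7)):
    """Synthetic binary data: only the `informative` columns depend on the label."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 2.0 * label
        X.append(row)
        y.append(label)
    return X, y

def accuracy(mask, X, y):
    """Training-set accuracy of a nearest-centroid classifier (a stand-in
    learner, assumed here) restricted to the features selected by `mask`."""
    cols = [j for j, bit in enumerate(mask) if bit]
    cent = {}
    for c in (0, 1):
        rows = [x for x, t in zip(X, y) if t == c]
        cent[c] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X, y):
        d = {c: sum((x[j] - m) ** 2 for j, m in zip(cols, cent[c])) for c in (0, 1)}
        correct += (0 if d[0] <= d[1] else 1) == t
    return correct / len(y)

def evaluate(mask, X, y):
    """Objective pair (number of features, accuracy). The empty subset is
    given the worst possible pair so that every other subset dominates it."""
    if not any(mask):
        return (len(mask) + 1, 0.0)
    return (sum(mask), accuracy(mask, X, y))

def dominates(a, b):
    """True if `a` has no more features and no lower accuracy than `b`, and differs."""
    return a[0] <= b[0] and a[1] >= b[1] and a != b

def pareto_front(scored):
    """Keep the (mask, objectives) pairs that no other pair dominates."""
    return [(m, s) for m, s in scored if not any(dominates(t, s) for _, t in scored)]

def evolve(X, y, rng, pop_size=20, generations=25, p_mut=0.1):
    n = len(X[0])
    pop = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(pop_size)]
    cache = {}

    def score(m):
        if m not in cache:
            cache[m] = evaluate(m, X, y)
        return cache[m]

    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            p1, p2 = rng.sample(pop, 2)
            cut = rng.randrange(1, n)                 # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = tuple(b ^ (rng.random() < p_mut) for b in child)  # bit-flip mutation
            children.append(child)
        merged = sorted(set(pop + children))          # deduplicate, deterministic order
        # Simplified survivor selection: fewest dominating individuals first.
        merged.sort(key=lambda m: sum(dominates(score(o), score(m)) for o in merged))
        pop = merged[:pop_size]
    return pareto_front([(m, score(m)) for m in pop])

if __name__ == "__main__":
    rng = random.Random(0)
    X, y = make_data(rng)
    for mask, (k, acc) in sorted(evolve(X, y, rng), key=lambda p: p[1]):
        print(k, round(acc, 3), mask)
```

The result is a set of trade-off solutions rather than a single subset, which is what makes the search multiobjective: one entry may use a single feature at moderate accuracy, another a few more features at higher accuracy. For scale, a brute-force search over 20 features would have to evaluate 2^20 = 1,048,576 subsets, whereas this sketch evaluates at most a few hundred.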

Suggestions

Robust multiobjective evolutionary feature subset selection algorithm for binary classification using machine learning techniques
Deniz, Ayca; Kiziloz, Hakan Ezgi; Dokeroglu, Tansel; Coşar, Ahmet (2017-06-07)
This study investigates the success of a multiobjective genetic algorithm (GA) combined with state-of-the-art machine learning (ML) techniques for feature subset selection (FSS) in the binary classification problem (BCP). Recent studies have focused on improving the accuracy of BCP by including all of the features, neglecting to determine the best performing subset of features. However, for some problems, the number of features may reach thousands, which will cause too much computation power to be consumed ...
MODELLING OF KERNEL MACHINES BY INFINITE AND SEMI-INFINITE PROGRAMMING
Ozogur-Akyuz, S.; Weber, Gerhard Wilhelm (2009-06-03)
In Machine Learning (ML) algorithms, one of the crucial issues is the representation of the data. As the data become heterogeneous and large-scale, single kernel methods become insufficient to classify nonlinear data. The finite combinations of kernels are limited up to a finite choice. In order to overcome this discrepancy, we propose a novel method of "infinite" kernel combinations for learning problems with the help of infinite and semi-infinite programming regarding all elements in kernel space. Looking...
Clustering of manifold-modeled data based on tangent space variations
Gökdoğan, Gökhan; Vural, Elif; Department of Electrical and Electronics Engineering (2017)
An important research topic in recent years has been to understand and analyze data collections for clustering and classification applications. In many data analysis problems, the data sets at hand have an intrinsically low-dimensional structure and admit a manifold model. Most state-of-the-art clustering methods developed for data of non-linear and low-dimensional structure are based on local linearity assumptions. However, clustering algorithms based on locally linear representations can tolerate diff...
Feature weighting problem in k-Nearest neighbor classifier
Güleç, Nurullah; İyigün, Cem; Department of Operational Research (2017)
The k-Nearest Neighbor (k-NN) algorithm is one of the best-known and most commonly used algorithms for classification problems. In this study, we have focused on feature weighted k-NN problems. Two different problems are studied. In the first problem, the k value and the weights of each feature are optimized to maximize the classification accuracy. The objective function of the problem is nonconvex and nonsmooth. As a solution approach, Forest Optimization Algorithm (FOA), which is a newly introduced evolutionary...
Interactive evolutionary approaches to multi-objective feature selection
Özmen, Müberra; Köksalan, Murat; Karakaya, Gülşah; Department of Industrial Engineering (2016)
In feature selection problems, the aim is to select a subset of features to characterize an output of interest. In characterizing an output, we may want to consider multiple objectives such as maximizing classification performance, minimizing number of selected features or cost, etc. We develop a preference-based approach for multi-objective feature selection problems. Finding all Pareto optimal subsets may turn out to be a computationally demanding problem and we still would need to select a solution event...
Citation Formats
F. A. Deniz Kızılöz, “Multiobjective evolutionary feature subset selection algorithm for binary classification,” M.S. - Master of Science, Middle East Technical University, 2016.