Multiobjective evolutionary feature subset selection algorithm for binary classification

Date

2016

Author

Deniz Kızılöz, Firdevsi Ayça

This thesis investigates the performance of multiobjective feature subset selection (FSS) algorithms combined with state-of-the-art machine learning techniques for the binary classification problem. Recent studies try to improve classification accuracy by including all of the features in the dataset, neglecting to determine the best-performing subset of features. However, for some problems the number of features may reach the thousands, which consumes excessive computation power during the feature evaluation and classification phases and may also reduce the accuracy of the results. Therefore, selecting the minimum number of features while keeping accuracy high becomes an important issue for achieving fast and accurate binary classification. The multiobjective algorithms implemented in this thesis consist of two phases: selecting feature subsets, and applying supervised/unsupervised machine learning techniques to the selected subsets. For the FSS phase, a brute-force approach is implemented first. Since exhaustively investigating all feature subsets is infeasible when the number of features is larger than 20, a greedy algorithm is implemented second to find good-enough feature subsets. Finally, in order to select the most appropriate feature subsets intelligently, a genetic algorithm is proposed for the FSS phase. Crossover and mutation operators are used to improve a population of individuals (each representing a selected feature subset) and obtain (near-)optimal solutions through generations. In the second phase, the performance of the selected feature subsets is evaluated using five different machine learning techniques: Logistic Regression, Support Vector Machines, Extreme Learning Machine, K-means, and Affinity Propagation.
The best-performing multiobjective evolutionary algorithm is selected after comprehensive experiments and compared with state-of-the-art algorithms in the literature: Particle Swarm Optimization, Greedy Search, Tabu Search, and Scatter Search. Eleven different datasets, mostly obtained from the well-known University of California, Irvine (UCI) Machine Learning Repository, are used for the performance evaluation of the implemented algorithms. Experimental results show that classification accuracy increases significantly with the most suitable subset of features and that execution time is greatly reduced after applying the proposed algorithm to the datasets.
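
The two-phase idea described in the abstract can be illustrated with a small sketch. This is not the thesis's implementation: the synthetic data, the nearest-centroid classifier standing in for the five listed learners, the domination-count survivor selection, and all names (`make_data`, `accuracy`, `dominates`, `evolve`) and parameter values are assumptions chosen only to keep the example short and dependency-free. Each individual is a bit mask over the features, and the two objectives are higher accuracy and fewer selected features. With more than 20 features there are already over a million (2^20) subsets, which is why a search heuristic is used instead of enumeration.

```python
import random

def make_data(n_samples=200, n_features=12, n_informative=3, seed=0):
    """Synthetic binary data (assumption): only the first features carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        label = rng.randint(0, 1)
        X.append([rng.gauss(2.0 * label if j < n_informative else 0.0, 1.0)
                  for j in range(n_features)])
        y.append(label)
    return X, y

def accuracy(mask, X, y):
    """Hold-out accuracy of a nearest-centroid classifier on the masked features."""
    idx = [j for j, bit in enumerate(mask) if bit]
    half = len(X) // 2  # first half trains, second half tests
    centroids = {}
    for c in (0, 1):
        rows = [X[i] for i in range(half) if y[i] == c]
        centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in idx]
    hits = 0
    for i in range(half, len(X)):
        dist = {c: sum((X[i][j] - m) ** 2 for j, m in zip(idx, centroids[c]))
                for c in (0, 1)}
        hits += min(dist, key=dist.get) == y[i]
    return hits / (len(X) - half)

def evaluate(mask, X, y):
    """Objective pair: (accuracy to maximize, feature count to minimize)."""
    return (accuracy(mask, X, y), sum(mask))

def dominates(a, b):
    """True if a is no worse than b on both objectives and differs from it."""
    return a[0] >= b[0] and a[1] <= b[1] and a != b

def repair(mask, rng):
    """Ensure at least one feature is selected."""
    if not any(mask):
        mask[rng.randrange(len(mask))] = 1
    return mask

def evolve(X, y, pop_size=20, generations=15, p_mut=0.1, seed=1):
    rng = random.Random(seed)
    n = len(X[0])
    pop = [repair([rng.randint(0, 1) for _ in range(n)], rng)
           for _ in range(pop_size)]
    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            p1, p2 = rng.sample(pop, 2)
            cut = rng.randrange(1, n)                # single-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - b if rng.random() < p_mut else b for b in child]  # bit-flip mutation
            children.append(repair(child, rng))
        merged = pop + children
        scores = [evaluate(m, X, y) for m in merged]
        # Survivor selection: keep the individuals dominated by the fewest others.
        rank = [sum(dominates(t, s) for t in scores) for s in scores]
        order = sorted(range(len(merged)), key=lambda i: rank[i])
        pop = [merged[i] for i in order[:pop_size]]
    scores = [evaluate(m, X, y) for m in pop]
    front = {(tuple(m), s) for m, s in zip(pop, scores)
             if not any(dominates(t, s) for t in scores)}
    return sorted(front, key=lambda ms: ms[1][1])

if __name__ == "__main__":
    X, y = make_data()
    for mask, (acc, k) in evolve(X, y):
        print(k, "features, accuracy", round(acc, 3), mask)
```

The function returns the non-dominated subsets of the final population, so the caller can trade accuracy against subset size rather than receiving a single answer. Swapping `accuracy` for a cross-validated score from any real classifier would leave the rest of the loop unchanged.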

Subject Keywords

Machine learning, Genetic algorithms, Computer algorithms, Evolutionary computation

URI

http://etd.lib.metu.edu.tr/upload/12620272/index.pdf
https://hdl.handle.net/11511/25876

Collections

Graduate School of Natural and Applied Sciences, Thesis

Suggestions

Robust multiobjective evolutionary feature subset selection algorithm for binary classification using machine learning techniques Deniz, Ayca; Kiziloz, Hakan Ezgi; Dokeroglu, Tansel; Coşar, Ahmet (2017-06-07) This study investigates the success of a multiobjective genetic algorithm (GA) combined with state-of-the-art machine learning (ML) techniques for the feature subset selection (FSS) in binary classification problem (BCP). Recent studies have focused on improving the accuracy of BCP by including all of the features, neglecting to determine the best performing subset of features. However, for some problems, the number of features may reach thousands, which will cause too much computation power to be consumed ...
MODELLING OF KERNEL MACHINES BY INFINITE AND SEMI-INFINITE PROGRAMMING Ozogur-Akyuz, S.; Weber, Gerhard Wilhelm (2009-06-03) In Machine Learning (ML) algorithms, one of the crucial issues is the representation of the data. As the data become heterogeneous and large-scale, single kernel methods become insufficient to classify nonlinear data. The finite combinations of kernels are limited up to a finite choice. In order to overcome this discrepancy, we propose a novel method of "infinite" kernel combinations for learning problems with the help of infinite and semi-infinite programming regarding all elements in kernel space. Looking...
Feature weighting problem in k-Nearest neighbor classifier Güleç, Nurullah; İyigün, Cem; Department of Operational Research (2017) The k-Nearest Neighbor (k-NN) algorithm is one of the best-known and most commonly used algorithms for classification problems. In this study, we have focused on feature weighted k-NN problems. Two different problems are studied. In the first problem, the k value and the weights of each feature are optimized to maximize the classification accuracy. The objective function of the problem is nonconvex and nonsmooth. As a solution approach, Forest Optimization Algorithm (FOA), which is a newly introduced evolutionary...
GMDH2: Binary Classification via GMDH-Type Neural Network Algorithms-R Package and Web-Based Tool DAĞ, OSMAN; KARABULUT, ERDEM; Alpar, Reha (Atlantis Press, 2019-01-01) Group method of data handling (GMDH)-type neural network algorithms are the self-organizing algorithms for modeling complex systems. GMDH algorithms are used for different objectives; examples include regression, classification, clustering, forecasting, and so on. In this paper, we present GMDH2 package to perform binary classification via GMDH-type neural network algorithms. The package offers two main algorithms: GMDH algorithm and diverse classifiers ensemble based on GMDH (dce-GMDH) algorithm. GMDH algo...
Computational representation of protein sequences for homology detection and classification Oğul, Hasan; Mumcuoğlu, Ünal Erkan; Department of Information Systems (2006) Machine learning techniques have been widely used for classification problems in computational biology. They require that the input must be a collection of fixed-length feature vectors. Since proteins are of varying lengths, there is a need for a means of representing protein sequences by a fixed number of features. This thesis introduces three novel methods for this purpose: n-peptide compositions with reduced alphabets, pairwise similarity scores by maximal unique matches, and pairwise similarity scores by...

Citation Formats

F. A. Deniz Kızılöz, “Multiobjective evolutionary feature subset selection algorithm for binary classification,” M.S. - Master of Science, Middle East Technical University, 2016.