Identifying (Quasi) Equally Informative Subsets in Feature Selection Problems for Classification: A Max-Relevance Min-Redundancy Approach

2016-06-01
Karakaya, Gülşah
AHİPAŞAOĞLU, Selin Damla
TAORMİNA, Riccardo
An emerging trend in feature selection is the development of two-objective algorithms that analyze the tradeoff between the number of features and the classification performance of the model built with these features. Because these two objectives conflict, the typical result is a set of Pareto-efficient subsets, each having a different cardinality and a corresponding discriminating power. However, this approach overlooks the fact that, for a given cardinality, there can be several subsets with similar information content. The study reported here addresses this problem and introduces a novel multiobjective feature selection approach conceived to identify: 1) a subset that maximizes the performance of a given classifier and 2) a set of subsets that are quasi equally informative, i.e., that have almost the same classification performance as the performance-maximizing subset. The approach consists of a wrapper [Wrapper for Quasi Equally Informative Subset Selection (W-QEISS)] built on the formulation of a four-objective optimization problem, aimed at maximizing the accuracy of a classifier, minimizing the number of features, and optimizing two entropy-based measures of relevance and redundancy. This enlarges the search space, enabling the wrapper to generate a large number of Pareto-efficient solutions. The algorithm is compared against the mRMR algorithm, a two-objective wrapper, and a computationally efficient filter [Filter for Quasi Equally Informative Subset Selection (F-QEISS)] on 24 University of California, Irvine (UCI) datasets covering both binary and multiclass classification. Experimental results show that W-QEISS can evolve a rich and diverse set of Pareto-efficient solutions, and that their availability helps in: 1) studying the tradeoff between multiple measures of classification performance and 2) understanding the relative importance of each feature.
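The four-objective formulation can be illustrated with a minimal sketch of the two entropy-based measures, a Pareto-dominance filter, and the selection of quasi equally informative subsets. The abstract does not give the exact definitions, so this sketch assumes mRMR-style measures on discrete features: relevance is the mean mutual information between each selected feature and the class, and redundancy is the mean pairwise mutual information among the selected features. The paper's own normalization may differ, and the tolerance `delta` and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(xs):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())


def mutual_info(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two aligned discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def relevance(data, labels, subset):
    """Assumed relevance: mean I(feature; class) over a non-empty subset (maximize)."""
    return sum(mutual_info(data[f], labels) for f in subset) / len(subset)


def redundancy(data, subset):
    """Assumed redundancy: mean pairwise I(feature_a; feature_b) (minimize)."""
    pairs = list(combinations(subset, 2))
    if not pairs:  # a single feature has no redundancy
        return 0.0
    return sum(mutual_info(data[a], data[b]) for a, b in pairs) / len(pairs)


def dominates(u, v):
    """True if objective vector u Pareto-dominates v (all objectives minimized)."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def pareto_front(points):
    """Keep the objective vectors not dominated by any other vector."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def quasi_equally_informative(accuracy, delta):
    """Subsets whose accuracy lies within a hypothetical tolerance delta of the best."""
    best = max(accuracy.values())
    return [s for s, acc in accuracy.items() if acc >= best - delta]


# Toy discrete data: feature 0 copies the class, feature 1 duplicates feature 0,
# and feature 2 is independent of the class.
labels = [0, 0, 1, 1]
data = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]]
print(relevance(data, labels, [0, 2]))   # 0.5
print(redundancy(data, [0, 1]))          # 1.0
```

To compare candidate subsets, each one could be mapped to a vector of minimized objectives such as `(len(subset), -relevance, redundancy, -accuracy)` and passed to `pareto_front`; negating the maximized objectives lets a single dominance test handle all four.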
Thanks to the adoption of the Borg Multiobjective Evolutionary Algorithm and the Extreme Learning Machine as the global optimization and learning algorithms, respectively, the quasi equally informative subsets are identified at the cost of only a marginal increase in computational time.
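A wrapper must train a classifier for every candidate subset, which is why a fast learner matters. A minimal Extreme Learning Machine sketch shows where the speed comes from: the hidden layer is random and fixed, and only the output weights are solved in closed form by least squares. The hidden-layer size, sigmoid activation, Gaussian initialization, and the helper `subset_accuracy` are assumptions, not settings taken from the paper; the Borg search that proposes the candidate subsets is not shown.

```python
import numpy as np


class ELMClassifier:
    """Minimal single-hidden-layer Extreme Learning Machine (illustrative sketch)."""

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Fixed random projection followed by a sigmoid; these weights are never trained.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        self.classes_, idx = np.unique(y, return_inverse=True)
        T = np.eye(len(self.classes_))[idx]  # one-hot target matrix
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        # Only the output weights are learned, in closed form via the pseudoinverse.
        self.beta = np.linalg.pinv(self._hidden(X)) @ T
        return self

    def predict(self, X):
        scores = self._hidden(np.asarray(X, dtype=float)) @ self.beta
        return self.classes_[scores.argmax(axis=1)]


def subset_accuracy(X_train, y_train, X_test, y_test, subset, n_hidden=50):
    """Hypothetical wrapper objective: held-out accuracy of an ELM on the chosen columns."""
    cols = list(subset)
    model = ELMClassifier(n_hidden).fit(X_train[:, cols], y_train)
    return float(np.mean(model.predict(X_test[:, cols]) == np.asarray(y_test)))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(300, 3))
    y = (X[:, 0] > 0).astype(int)  # only feature 0 carries class information
    print(subset_accuracy(X[:200], y[:200], X[200:], y[200:], (0,)))
    print(subset_accuracy(X[:200], y[:200], X[200:], y[200:], (2,)))
```

Because fitting reduces to one pseudoinverse, evaluating thousands of subsets during the evolutionary search stays affordable; in this toy run the subset containing only the informative feature should score clearly higher than the subset containing only the noise feature.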
IEEE TRANSACTIONS ON CYBERNETICS

Citation
G. Karakaya, S. D. Ahipaşaoğlu, and R. Taormina, “Identifying (Quasi) Equally Informative Subsets in Feature Selection Problems for Classification: A Max-Relevance Min-Redundancy Approach,” IEEE Transactions on Cybernetics, pp. 1424–1437, 2016. [Online]. Available: https://hdl.handle.net/11511/46278.