Feature Selection for Graph Kernels

Date

2010-12-21

Author

TAN, MEHMET
Polat, Faruk
Alhajj, Reda

Metadata

Show full item record

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Item Usage Stats

139
views

0
downloads

Graph classification is important for different scientific applications; it can be exploited in various problems related to bioinformatics and cheminformatics. Given their graphs, there is increasing need for classifying small molecules to predict their properties such as activity, toxicity or mutagenicity. Using subtrees as feature set for graph classification in kernel methods has been shown to perform well in classifying small molecules. It is also well-known that feature selection can improve the performance of classifiers. However, most of the graph kernels are not selective in choosing which subtrees to include in the set of features. Instead, they use all subtrees of a certain property as their feature set. We argue that not all the latter features are needed for effective classification. In this paper, we investigate the effect of selecting subset of the subtrees as features for graph kernels, i.e., we try to identify and keep useful features; all the remaining subtrees are eliminated. A masking procedure, which boils down to feature selection, is proposed for classifying graphs. We conducted experiments on several molecule classification datasets; the results demonstrate the applicability and effectiveness of the proposed feature selection process.

Subject Keywords

Feature Selection, Classification, Graph Kernels, Bioinformatics, Cheminformatics

URI

https://hdl.handle.net/11511/55641

Conference Name

IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Collections

Department of Computer Engineering, Conference / Seminar

Suggestions

OpenMETU
Core

A new framework of multi-objective evolutionary algorithms for feature selection and multi-label classification of video data Karagoz, Gizem Nur; Yazıcı, Adnan; Dokeroglu, Tansel; Coşar, Ahmet (2020-06-01) There are few studies in the literature to address the multi-objective multi-label feature selection for the classification of video data using evolutionary algorithms. Selecting the most appropriate subset of features is a significant problem while maintaining/improving the accuracy of the prediction results. This study proposes a framework of parallel multi-objective Non-dominated Sorting Genetic Algorithms (NSGA-II) for exploring a Pareto set of non-dominated solutions. The subsets of non-dominated featu...
Robust multiobjective evolutionary feature subset selection algorithm for binary classification using machine learning techniques Deniz, Ayca; Kiziloz, Hakan Ezgi; Dokeroglu, Tansel; Coşar, Ahmet (2017-06-07) This study investigates the success of a multiobjective genetic algorithm (GA) combined with state-of-the-art machine learning (ML) techniques for the feature subset selection (FSS) in binary classification problem (BCP). Recent studies have focused on improving the accuracy of BCP by including all of the features, neglecting to determine the best performing subset of features. However, for some problems, the number of features may reach thousands, which will cause too much computation power to be consumed ...
Learning Graph Signal Representations with Narrowband Spectral Kernels Kar, Osman Furkan; Turhan, Gülce; Vural, Elif (2022-01-01) In this work, we study the problem of learning graph dictionary models from partially observed graph signals. We represent graph signals in terms of atoms generated by narrowband graph kernels. We formulate an optimization problem where the kernel parameters are learnt jointly with the signal representations under a triple regularization scheme: While the first regularization term aims to control the spectrum of the narrowband kernels, the second term encourages the reconstructed graph signals to vary smoot...
Transformations of Data in Deterministic Modelling of Biological Networks Agraz, Melih; Purutçuoğlu Gazi, Vilda (2015-05-21) The Gaussian graphical model (GGM) is a probabilistic modelling approach used in the system biology to represent the relationship between genes with an undirected graph. In graphical models, the genes and their interactions are denoted by nodes and the edges between nodes. Hereby, in this model, it is assumed that the structure of the system can be described by the inverse of the covariance matrix, Theta, which is also called as the precision, when the observations are formulated via a lasso regression unde...
Estimation of Time-Varying Graph Signals by Learning Graph Dictionaries Zamanda Deǧişen Graf Sinyallerinin Kestirimi için Graflarda Sözlük Öǧrenme Acar, Abdullah Burak; Vural, Elif (2022-01-01) We study the problem of estimating time-varying graph signals from missing observations. We propose a method based on learning graph dictionaries specified by a set of time-vertex kernels in the joint spectral domain. The parameters of the time-vertex kernels are optimized jointly with the sparse representation coefficients of the signals, so that the learnt representation fits well to the available observations of the time-vertex signals at hand. The missing observations of the signals are then estimated b...

Citation Formats

M. TAN, F. Polat, and R. Alhajj, “Feature Selection for Graph Kernels,” presented at the IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Hong Kong, HONG KONG, 2010, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/55641.