SVM-based detection of distant protein structural relationships using pairwise probabilistic suffix trees

Date

2006-08-01

Author

Ogul, Hasan
Mumcuoğlu, Ünal Erkan

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

A new method based on probabilistic suffix trees (PSTs) is defined for pairwise comparison of distantly related protein sequences. The new definition is adopted in a discriminative framework for protein classification that uses pairwise sequence similarity scores for feature encoding. The framework uses support vector machines (SVMs) to separate structurally similar and dissimilar examples. The new discriminative system, which we call SVM-PST, has been tested on the SCOP family classification task and compared with the existing discriminative methods SVM-BLAST and SVM-Pairwise, which use BLAST similarity scores and dynamic-programming-based alignment scores, respectively. The results show that SVM-PST is more accurate than SVM-BLAST and competitive with SVM-Pairwise. PST-based comparison is also much more computationally efficient than dynamic-programming-based alignment. We also compared our results with the original family-based PST approach that inspired this work. The present method provides a significantly better solution for protein classification than the family-based PST model.
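
The feature encoding described in the abstract can be illustrated with a minimal sketch. It is not the authors' definition of the pairwise PST score: it assumes a simplified variable-order context model in place of a full probabilistic suffix tree, a symmetric averaged log-likelihood as the similarity score, and hypothetical names (`build_context_counts`, `log_likelihood`, `pairwise_score`, `feature_vector`) and parameters (`max_order`, `pseudocount`) chosen only for illustration.

```python
import math
from collections import defaultdict

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_context_counts(seq, max_order=2):
    """Count next-symbol occurrences after every context of length 0..max_order.

    This is a simplified stand-in for a probabilistic suffix tree (assumption).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for i, symbol in enumerate(seq):
        for k in range(max_order + 1):
            if i - k < 0:
                break
            counts[seq[i - k:i]][symbol] += 1
    # Freeze into plain dicts so lookups never create new entries.
    return {ctx: dict(nxt) for ctx, nxt in counts.items()}


def log_likelihood(model, seq, max_order=2, pseudocount=1.0):
    """Average per-residue log-probability of seq under model.

    For each position, the longest context present in the model is used,
    falling back to shorter contexts (down to the empty context).
    Symbols outside ALPHABET still get a small smoothed probability.
    """
    if not seq:
        return 0.0
    total = 0.0
    for i, symbol in enumerate(seq):
        context = ""
        for k in range(min(max_order, i), -1, -1):
            context = seq[i - k:i]
            if context in model:
                break
        nxt = model.get(context, {})
        numerator = nxt.get(symbol, 0) + pseudocount
        denominator = sum(nxt.values()) + pseudocount * len(ALPHABET)
        total += math.log(numerator / denominator)
    return total / len(seq)


def pairwise_score(a, b, max_order=2):
    """Symmetric similarity: score each sequence under the other's model (assumption)."""
    model_a = build_context_counts(a, max_order)
    model_b = build_context_counts(b, max_order)
    return 0.5 * (log_likelihood(model_a, b, max_order)
                  + log_likelihood(model_b, a, max_order))


def feature_vector(query, training_seqs, max_order=2):
    """Encode a sequence as its similarity scores against every training sequence."""
    return [pairwise_score(query, t, max_order) for t in training_seqs]
```

Under these assumptions, each protein becomes a fixed-length vector whose dimension equals the number of training sequences. Such vectors, together with family membership labels, could then be passed to any SVM implementation (for example `sklearn.svm.SVC`) to learn the separation between structurally similar and dissimilar examples.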

Subject Keywords

Family classification, Probabilistic suffix tree, Sequence similarity, Support vector machine

URI

https://hdl.handle.net/11511/32243

Journal

COMPUTATIONAL BIOLOGY AND CHEMISTRY

DOI

https://doi.org/10.1016/j.compbiolchem.2006.05.001

Collections

Graduate School of Informatics, Article

Citation Formats

H. Ogul and Ü. E. Mumcuoğlu, “SVM-based detection of distant protein structural relationships using pairwise probabilistic suffix trees,” COMPUTATIONAL BIOLOGY AND CHEMISTRY, pp. 292–299, 2006. [Online]. Available: https://hdl.handle.net/11511/32243.