SVM-based detection of distant protein structural relationships using pairwise probabilistic suffix trees

A new method based on probabilistic suffix trees (PSTs) is defined for pairwise comparison of distantly related protein sequences. The new definition is adopted in a discriminative framework for protein classification that uses pairwise sequence similarity scores for feature encoding. The framework uses support vector machines (SVMs) to separate structurally similar and dissimilar examples. The new discriminative system, which we call SVM-PST, has been tested on the SCOP family classification task and compared with the existing discriminative methods SVM-BLAST and SVM-Pairwise, which use BLAST similarity scores and dynamic-programming-based alignment scores, respectively. The results show that SVM-PST is more accurate than SVM-BLAST and competitive with SVM-Pairwise. In terms of computational efficiency, PST-based comparison is considerably faster than dynamic-programming-based alignment. We also compared our results with the original family-based PST approach that inspired this work. The present method provides a significantly better solution for protein classification than the family-based PST model.
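
The feature-encoding idea described in the abstract (representing a protein by its similarity scores against a set of reference sequences) can be illustrated with a minimal sketch. This is not the authors' PST implementation: it assumes a simplified stand-in, a bounded-depth context model with longest-suffix backoff and additive smoothing, and every name in it (`train_context_model`, `log_likelihood`, `pairwise_features`, `max_depth`, `pseudocount`) is hypothetical.

```python
import math
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # standard 20-letter alphabet


def train_context_model(seq, max_depth=2):
    """Count next-symbol occurrences after every context of length 0..max_depth.

    A simplified stand-in for a probabilistic suffix tree (assumption):
    no pruning and no probability thresholds are applied.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for i, symbol in enumerate(seq):
        for depth in range(min(i, max_depth) + 1):
            context = seq[i - depth:i]
            counts[context][symbol] += 1
    return counts


def log_likelihood(model, seq, max_depth=2, pseudocount=1.0):
    """Average per-residue log-probability of seq under the model."""
    if not seq:
        return 0.0
    total = 0.0
    for i, symbol in enumerate(seq):
        # Back off from the longest context to shorter ones until one
        # was seen in training; the empty context is the final fallback.
        context = ""
        for depth in range(min(i, max_depth), -1, -1):
            context = seq[i - depth:i]
            if context in model:
                break
        next_counts = model.get(context, {})
        numerator = next_counts.get(symbol, 0) + pseudocount
        denominator = sum(next_counts.values()) + pseudocount * len(AMINO_ACIDS)
        total += math.log(numerator / denominator)
    return total / len(seq)


def pairwise_features(query, references, max_depth=2):
    """Encode query as a vector of scores, one per reference sequence."""
    return [
        log_likelihood(train_context_model(ref, max_depth), query, max_depth)
        for ref in references
    ]


if __name__ == "__main__":
    refs = ["ACDACDACD", "WWWWYYYY"]  # toy sequences, not real proteins
    print(pairwise_features("ACDACD", refs))
```

Each resulting vector would then serve as the input representation of one protein for a standard SVM classifier, with the labeled training sequences acting as the references. The classifier itself is omitted here to keep the sketch dependency-free.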


TRAINER: A General-Purpose Trainable Short Biosequence Classifier
Oğul, Hasan; Kalkan, Alper T.; Umu, Sinan U.; Akkaya, Mahinur (2013-10-01)
Classifying sequences is one of the central problems in computational biosciences. Several tools have been released to map an unknown molecular entity to one of the known classes using solely its sequence data. However, all of the existing tools are problem-specific and restricted to an alphabet constrained by relevant biological structure. Here, we introduce TRAINER, a new online tool designed to serve as a generic sequence classification platform to enable users to provide their own training data with any al...
FGX: a frequentist gene expression index for Affymetrix arrays
Purutçuoğlu Gazi, Vilda (Oxford University Press (OUP), 2007-04-01)
We consider a new frequentist gene expression index for Affymetrix oligonucleotide DNA arrays, using a similar probe intensity model as suggested by Hein and others (2005), called the Bayesian gene expression index (BGX). According to this model, the perfect match and mismatch values are assumed to be correlated as a result of sharing a common gene expression signal. Rather than a Bayesian approach, we develop a maximum likelihood algorithm for estimating the underlying common signal. In this way, estimatio...
Correlation distribution of a sequence family generalizing some sequences of Trachtenberg
Özbudak, Ferruh (2021-08-01)
In this paper, we give a classification of a sequence family, over arbitrary characteristic, adding linear trace terms to the function $g(x) = \mathrm{Tr}(x^d)$, where $d = p^{2k} - p^k + 1$, first introduced by Trachtenberg. The family has $p^n + 1$ cyclically distinct sequences with period $p^n - 1$. We compute the exact correlation distribution of the function $g(x)$ with linear m-sequences and amongst themselves. The cross-correlation values are obtained as $C_{i,j}(\tau) \in \{-1, -1 \pm p^{(n+e)/2}, -1 + p^n\}$.
ROMP-polymers in asymmetric catalysis: The role of the polymer backbone
Bolm, C; Tanyeli, Cihangir; Grenz, A; Dinter, CL (Wiley, 2002-08-01)
Ring-opening metathesis polymerization (ROMP) is utilized for the synthesis of highly functionalized polymers with covalently bound chiral prolinol units. The linear macromolecules act as multifunctional ligands in homogeneous asymmetric catalysis. The solubility of the polymers and their catalytic performance can be tuned by random copolymerization with achiral units in a simple and flexible manner. Use of norbornenes with additional well-defined stereogenic centers in the polymerizable core of the monomer...
Subcellular localization prediction with new protein encoding schemes
Ogul, Hasan; Mumcuoğlu, Ünal Erkan (2007-04-01)
Subcellular localization is one of the key properties in functional annotation of proteins. Support vector machines (SVMs) have been widely used for automated prediction of subcellular localizations. Existing methods differ in the protein encoding schemes used. In this study, we present two methods for protein encoding to be used for SVM-based subcellular localization prediction: n-peptide compositions with reduced amino acid alphabets for larger values of n and pairwise sequence similarity scores based on ...
Citation Formats
H. Ogul and Ü. E. Mumcuoğlu, “SVM-based detection of distant protein structural relationships using pairwise probabilistic suffix trees,” Computational Biology and Chemistry, pp. 292–299, 2006.