SVM-based detection of distant protein structural relationships using pairwise probabilistic suffix trees
Date
2006-08-01
Author
Ogul, Hasan
Mumcuoğlu, Ünal Erkan
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
A new method based on probabilistic suffix trees (PSTs) is defined for pairwise comparison of distantly related protein sequences. The new definition is adopted in a discriminative framework for protein classification that uses pairwise sequence similarity scores for feature encoding. The framework uses support vector machines (SVMs) to separate structurally similar and dissimilar examples. The new discriminative system, which we call SVM-PST, has been tested on the SCOP family classification task and compared with the existing discriminative methods SVM-BLAST and SVM-Pairwise, which use BLAST similarity scores and dynamic-programming-based alignment scores, respectively. The results show that SVM-PST is more accurate than SVM-BLAST and competitive with SVM-Pairwise. In terms of computational efficiency, PST-based comparison is much more efficient than dynamic-programming-based alignment. We also compared our results with the original family-based PST approach that inspired this work. The present method provides a significantly better solution for protein classification than the family-based PST model.
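The feature-encoding idea in the abstract (represent each protein by its similarity scores against a set of reference sequences, then pass that vector to an SVM) can be sketched as follows. This is a minimal illustration only: the abstract does not give the paper's PST construction or its pairwise score, so the sketch substitutes a simple back-off context model as a stand-in for a PST, and the names `train_context_model`, `log_likelihood`, `pairwise_features`, and the parameters `max_order` and `pseudocount` are all hypothetical.

```python
import math
from collections import defaultdict

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def train_context_model(seq, max_order=2):
    """Count next-residue frequencies for every context of length 0..max_order.

    Assumption: a simplified stand-in for a probabilistic suffix tree,
    not the construction used in the paper.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for i, symbol in enumerate(seq):
        for k in range(max_order + 1):
            if i - k < 0:
                break
            counts[seq[i - k:i]][symbol] += 1
    return counts


def log_likelihood(model, seq, max_order=2, pseudocount=1.0):
    """Average per-residue log-probability of seq under the model.

    For each position, back off from the longest context (up to max_order)
    to shorter ones until a context seen in training is found.
    """
    total = 0.0
    for i, symbol in enumerate(seq):
        context = ""
        for k in range(min(max_order, i), -1, -1):
            context = seq[i - k:i]
            if context in model:
                break
        next_counts = model.get(context, {})
        n = sum(next_counts.values())
        # Additive smoothing so unseen residues keep a non-zero probability.
        p = (next_counts.get(symbol, 0) + pseudocount) / (
            n + pseudocount * len(ALPHABET)
        )
        total += math.log(p)
    return total / max(len(seq), 1)


def pairwise_features(query, references, max_order=2):
    """Encode query as a vector of scores, one per reference sequence."""
    models = [train_context_model(r, max_order) for r in references]
    return [log_likelihood(m, query, max_order) for m in models]


# Toy sequences, invented for illustration only.
features = pairwise_features("ACDEFG", ["ACDEFGACDEFG", "WWWWYYYY"])
```

Each resulting vector would serve as one training example for a binary SVM (family member versus non-member). The score here is one-directional (the query scored under a model of the reference); whether the paper's pairwise score is symmetric is not stated in the abstract.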
Subject Keywords
Family classification, Probabilistic suffix tree, Sequence similarity, Support vector machine
URI
https://hdl.handle.net/11511/32243
Journal
COMPUTATIONAL BIOLOGY AND CHEMISTRY
DOI
https://doi.org/10.1016/j.compbiolchem.2006.05.001
Collections
Graduate School of Informatics, Article
Citation Formats
IEEE
H. Ogul and Ü. E. Mumcuoğlu, “SVM-based detection of distant protein structural relationships using pairwise probabilistic suffix trees,” COMPUTATIONAL BIOLOGY AND CHEMISTRY, pp. 292–299, 2006, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/32243.