Subcellular localization prediction with new protein encoding schemes

Date

2007-04-01

Author

Ogul, Hasan
Mumcuoğlu, Ünal Erkan

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Subcellular localization is one of the key properties in the functional annotation of proteins. Support vector machines (SVMs) have been widely used for automated prediction of subcellular localization. Existing methods differ in the protein encoding schemes they use. In this study, we present two protein encoding methods for SVM-based subcellular localization prediction: n-peptide compositions with reduced amino acid alphabets for larger values of n, and pairwise sequence similarity scores based on the whole sequence and the N-terminal sequence. We tested the methods on a common benchmark data set that consists of 2,427 eukaryotic proteins with four localization sites. In 5-fold cross-validation tests, the encoding with n-peptide compositions achieved accuracies of 84.5, 88.9, 66.3, and 94.3 percent for cytoplasmic, extracellular, mitochondrial, and nuclear proteins, respectively, with an overall accuracy of 87.1 percent. The second method achieved accuracies of 83.6, 87.7, 87.9, and 90.5 percent for the individual locations and an overall accuracy of 87.8 percent. A hybrid system, which we call PredLOC, makes a final decision based on the results of the two presented methods; it achieved an overall accuracy of 91.3 percent, which is better than that of many existing methods. The new system also outperformed recent methods in experiments conducted on a new, unique SWISSPROT test set.
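
As an illustration of the first encoding described in the abstract, the following is a minimal Python sketch of an n-peptide composition over a reduced amino acid alphabet. The four-group reduction, the group labels, and the function name `npeptide_composition` are assumptions made for this example; the abstract does not specify the alphabets or the values of n used in the paper, and this sketch is not the authors' implementation.

```python
from collections import Counter
from itertools import product

# Hypothetical 4-group reduction of the 20 standard amino acids.
# The groupings actually used in the paper are not given in the abstract.
REDUCED_GROUPS = {
    "h": "AVLIMFWC",  # hydrophobic
    "p": "STNQYGP",   # polar or small
    "b": "KRH",       # basic
    "a": "DE",        # acidic
}
RESIDUE_TO_GROUP = {
    residue: label
    for label, members in REDUCED_GROUPS.items()
    for residue in members
}


def npeptide_composition(sequence, n=3):
    """Return the normalized frequency of every n-peptide over the reduced alphabet.

    The vector has len(REDUCED_GROUPS) ** n entries, ordered lexicographically
    by n-peptide. Residues outside the 20 standard amino acids are skipped,
    which joins their neighbors (a simplification of this sketch).
    """
    reduced = "".join(
        RESIDUE_TO_GROUP[r] for r in sequence.upper() if r in RESIDUE_TO_GROUP
    )
    alphabet = sorted(REDUCED_GROUPS)
    keys = ["".join(p) for p in product(alphabet, repeat=n)]
    total = max(len(reduced) - n + 1, 0)  # number of overlapping windows
    counts = Counter(reduced[i:i + n] for i in range(total))
    return [counts[k] / total if total else 0.0 for k in keys]


if __name__ == "__main__":
    features = npeptide_composition("MKTAYIAKQRQISFVKSHFSRQ", n=3)
    print(len(features))  # 4 ** 3 entries for this 4-letter alphabet
```

Reducing the alphabet keeps the feature vector tractable as n grows: with the full 20-letter alphabet a 5-peptide composition would need 20^5 = 3,200,000 entries, while a 4-letter alphabet needs only 4^5 = 1,024. A vector of this kind could then be passed to any SVM implementation as the feature representation of a protein.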

Subject Keywords

N-peptide composition, probabilistic suffix tree, subcellular localization, support vector machines

URI

https://hdl.handle.net/11511/30953

Journal

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS

DOI

https://doi.org/10.1109/tcbb.2007.070209

Collections

Graduate School of Informatics, Article

Suggestions

SVM-based detection of distant protein structural relationships using pairwise probabilistic suffix trees Ogul, Hasan; Mumcuoğlu, Ünal Erkan (2006-08-01) A new method based on probabilistic suffix trees (PSTs) is defined for pairwise comparison of distantly related protein sequences. The new definition is adopted in a discriminative framework for protein classification that uses pairwise sequence similarity scores for feature encoding. The framework uses support vector machines (SVMs) to separate structurally similar and dissimilar examples. The new discriminative system, which we call SVM-PST, has been tested on the SCOP family classification task, and compared ...
A discriminative method for remote homology detection based on n-peptide compositions with reduced amino acid alphabets Ogul, Hasan; Mumcuoğlu, Ünal Erkan (2007-01-01) In this study, n-peptide compositions are utilized for protein vectorization in a discriminative remote homology detection framework based on support vector machines (SVMs). The size of the amino acid alphabet is gradually reduced for increasing values of n so that the method fits within the memory resources of conventional workstations. A hash structure is implemented for accelerated search of n-peptides. The method is tested to see its ability to classify proteins into families on a subset of SCOP famil...
DUDMap: 3D RGB-D mapping for dense, unstructured, and dynamic environment Hastürk, Özgür; Erkmen, Aydan Müşerref (2021-01-01) The simultaneous localization and mapping (SLAM) problem has been extensively studied by researchers in the field of robotics; however, conventional approaches to mapping assume a static environment. The static assumption is valid only in a small region, and it limits the application of visual SLAM in dynamic environments. The recently proposed state-of-the-art SLAM solutions for dynamic environments use different semantic segmentation methods such as Mask R-CNN and SegNet; however, these frameworks are based o...
Discriminative remote homology detection using maximal unique sequence matches Ogul, H; Mumcuoğlu, Ünal Erkan (2005-01-01) We define a new pairwise sequence comparison scheme for distantly related proteins and report its performance on the remote homology detection task. The new scheme compares two protein sequences by using the maximal unique matches (MUMs) between them. Once identified, the length of all nonoverlapping MUMs is used to define the similarity between two sequences. To detect the homology of a protein to a protein family, we utilize the feature vectors containing all pairwise similarity scores between the test protei...
Multi-view subcellular localization prediction of human proteins Özsarı, Gökhan; Atalay, M. Volkan; Department of Computer Engineering (2019) Determining the subcellular localization of proteins is crucial for understanding the functions of proteins, drug targeting, systems biology, and proteomics research. Experimental validation of subcellular localization is an expensive and challenging process. There exist several computational methods for automated prediction of protein subcellular localization; however, there is still room for better performance. Here, we propose a multi-view SVM-based approach that provides predictions for human proteins. ...

Citation Formats

H. Ogul and Ü. E. Mumcuoğlu, “Subcellular localization prediction with new protein encoding schemes,” IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, pp. 227–232, 2007. [Online]. Available: https://hdl.handle.net/11511/30953.