Discriminative remote homology detection using maximal unique sequence matches

Date

2005-01-01

Author

OGUL, H
Mumcuoğlu, Ünal Erkan

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

We define a new pairwise sequence comparison scheme for distantly related proteins and report its performance on the remote homology detection task. The new scheme compares two protein sequences by using the maximal unique matches (MUMs) between them. Once the MUMs are identified, the length of all nonoverlapping MUMs is used to define the similarity between the two sequences. To detect the homology of a protein to a protein family, we use feature vectors containing all pairwise similarity scores between the test protein and the proteins in the training set. Support vector machines are employed for binary classification, as in recent work. The new method is shown to be more accurate than recent methods including SVM-Fisher and SVM-BLAST, and competitive with SVM-Pairwise. In terms of computational efficiency, the new method performs much better than SVM-Pairwise.
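The scoring idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `min_len` threshold, the brute-force search, the greedy longest-first choice of nonoverlapping MUMs, and the use of summed lengths as the score are all assumptions made for illustration. A MUM is taken here to be an exact match that occurs exactly once in each sequence and cannot be extended in either direction.

```python
def count_occurrences(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def maximal_unique_matches(a, b, min_len=2):
    """Return (start_a, start_b, length) for every MUM of at least min_len.

    Brute force, suitable only for short sequences (hypothetical helper).
    """
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            # Skip matches that could be extended to the left.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend to the right as far as the sequences agree.
            length = 0
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1
            if length < min_len:
                continue
            word = a[i:i + length]
            # Keep the match only if it is unique in both sequences.
            if count_occurrences(a, word) == 1 and count_occurrences(b, word) == 1:
                mums.append((i, j, length))
    return mums


def mum_similarity(a, b, min_len=2):
    """Sum the lengths of greedily chosen nonoverlapping MUMs (assumed score)."""
    chosen = []
    total = 0
    # Assumption: prefer longer MUMs when two candidates overlap.
    for i, j, length in sorted(maximal_unique_matches(a, b, min_len),
                               key=lambda m: -m[2]):
        overlaps = any(
            (i < ci + cl and ci < i + length) or (j < cj + cl and cj < j + length)
            for ci, cj, cl in chosen
        )
        if not overlaps:
            chosen.append((i, j, length))
            total += length
    return total


def feature_vector(query, training_set, min_len=2):
    """One similarity score per training sequence, in training-set order."""
    return [mum_similarity(query, t, min_len) for t in training_set]


if __name__ == "__main__":
    train = ["XXDEFGXHIK", "ACDEFGHIK", "WWWW"]  # toy sequences, not real data
    print(feature_vector("ACDEFGHIK", train))
```

Vectors built this way would then be passed to a binary classifier such as a support vector machine, one classifier per protein family, as the abstract describes. A practical implementation would likely replace the brute-force search with a suffix-based index, since the nested loops here scale poorly with sequence length.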

Subject Keywords

Protein, Database

URI

https://hdl.handle.net/11511/54565

Journal

ADVANCES IN INTELLIGENT DATA ANALYSIS VI, PROCEEDINGS

Collections

Graduate School of Informatics, Article

Citation Formats

H. OGUL and Ü. E. Mumcuoğlu, “Discriminative remote homology detection using maximal unique sequence matches,” ADVANCES IN INTELLIGENT DATA ANALYSIS VI, PROCEEDINGS, pp. 283–292, 2005, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/54565.