Discriminative remote homology detection using maximal unique sequence matches

We define a new pairwise sequence comparison scheme for distantly related proteins and report its performance on the remote homology detection task. The new scheme compares two protein sequences using the maximal unique matches (MUMs) between them. Once the MUMs are identified, the length of all nonoverlapping MUMs is used to define the similarity between the two sequences. To detect the homology of a protein to a protein family, we use feature vectors containing all pairwise similarity scores between the test protein and the proteins in the training set. Support vector machines are employed for binary classification, as in recent work. The new method is shown to be more accurate than recent methods including SVM-Fisher and SVM-BLAST, and competitive with SVM-Pairwise. In terms of computational efficiency, the new method performs much better than SVM-Pairwise.
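The scheme described in the abstract can be illustrated with a small sketch. The abstract does not give the exact scoring function, the minimum match length, or the rule for resolving overlapping MUMs, so everything below is an assumption made for illustration: the function names (`find_mums`, `mum_similarity`, `feature_vector`), the `min_len` parameter, the greedy longest-first overlap resolution, and the choice to sum the lengths of the selected MUMs. The search is a naive scan chosen for readability; practical tools typically rely on suffix trees or suffix arrays instead.

```python
from typing import List, Tuple

Match = Tuple[int, int, int]  # (start in a, start in b, length)


def count_occurrences(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text, allowing overlaps."""
    count, pos = 0, text.find(pattern)
    while pos != -1:
        count += 1
        pos = text.find(pattern, pos + 1)
    return count


def find_mums(a: str, b: str, min_len: int = 2) -> List[Match]:
    """Naive search for maximal unique matches between a and b.

    A match is kept if it cannot be extended to the left or right
    and its substring occurs exactly once in each sequence.
    min_len is a hypothetical threshold, not taken from the paper.
    """
    mums: List[Match] = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            # Skip starts that could be extended to the left.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend to the right as far as the residues agree.
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k < min_len:
                continue
            s = a[i:i + k]
            if count_occurrences(a, s) == 1 and count_occurrences(b, s) == 1:
                mums.append((i, j, k))
    return mums


def mum_similarity(a: str, b: str, min_len: int = 2) -> int:
    """Sum the lengths of a nonoverlapping subset of MUMs.

    Assumption: overlaps are resolved greedily, longest match first.
    """
    chosen: List[Match] = []
    for i, j, k in sorted(find_mums(a, b, min_len), key=lambda m: -m[2]):
        free_in_a = all(i + k <= ci or ci + ck <= i for ci, _, ck in chosen)
        free_in_b = all(j + k <= cj or cj + ck <= j for _, cj, ck in chosen)
        if free_in_a and free_in_b:
            chosen.append((i, j, k))
    return sum(k for _, _, k in chosen)


def feature_vector(query: str, training: List[str], min_len: int = 2) -> List[int]:
    """Pairwise similarity scores of the query against every training sequence."""
    return [mum_similarity(query, t, min_len) for t in training]
```

Under these assumptions, a vector such as `feature_vector(test_protein, training_proteins)` would serve as the input to a binary SVM classifier trained per protein family; the classifier itself is omitted here.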


Using fuzzy Petri nets for static analysis of rule-bases
Bostan-Korpeoglu, B; Yazıcı, Adnan (2004-01-01)
We use a Fuzzy Petri Net (FPN) structure to represent knowledge and model the behavior in our intelligent object-oriented database environment, which integrates fuzzy, active and deductive rules with database objects. However, the behavior of a system can be unpredictable due to the rules triggering or untriggering each other (non-termination). Intermediate and final database states may also differ according to the order of rule executions (non-confluence). In order to foresee and solve problematic behavior...
Prediction of protein subcellular localization using global protein sequence feature
Bozkurt, Burçin; Atalay, Mehmet Volkan; Department of Computer Engineering (2003)
The problem of identifying genes in eukaryotic genomic sequences by computational methods has attracted considerable research attention in recent years. Many early approaches to the problem focused on prediction of individual functional elements and compositional properties of coding and non coding deoxyribonucleic acid (DNA) in entire eukaryotic gene structures. More recently, a number of approaches has been developed which integrate multiple types of information including structure, function and genetic p...
An attempt to classify Turkish district data : K-Means and Self-Organizing Map (SOM) algorithms
Aksoy, Ece; Işık, Oğuz; Department of Geodetic and Geographical Information Technologies (2004)
There is no universally applicable clustering technique in discovering the variety of structures displayed in data sets. Also, a single algorithm or approach is not adequate to solve every clustering problem. There are many methods available, the criteria used differ and hence different classifications may be obtained for the same data. While larger and larger amounts of data are collected and stored in databases, there is an increasing need for efficient and effective analysis methods. Grouping or classific...
Prediction of protein subcellular localization based on primary sequence data
Özarar, Mert; Atalay, Mehmet Volkan; Department of Computer Engineering (2003)
Subcellular localization is crucial for determining the functions of proteins. A system called prediction of protein subcellular localization (P2SL) that predicts the subcellular localization of proteins in eukaryotic organisms based on the amino acid content of primary sequences using amino acid order is designed. The approach for prediction is to find the most frequent motifs for each protein in a given class based on clustering via self organizing maps and then to use these most frequent motifs as features...
A discriminative method for remote homology detection based on n-peptide compositions with reduced amino acid alphabets
OĞUL, Hasan; Mumcuoğlu, Ünal Erkan (2007-01-01)
In this study, n-peptide compositions are utilized for protein vectorization over a discriminative remote homology detection framework based on support vector machines (SVMs). The size of the amino acid alphabet is gradually reduced for increasing values of n so that the method fits within the memory resources of conventional workstations. A hash structure is implemented for accelerated search of n-peptides. The method is tested to see its ability to classify proteins into families on a subset of SCOP famil...
Citation Formats
H. OGUL and Ü. E. Mumcuoğlu, “Discriminative remote homology detection using maximal unique sequence matches,” ADVANCES IN INTELLIGENT DATA ANALYSIS VI, PROCEEDINGS, pp. 283–292, 2005, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/54565.