Identification of functionally orthologous protein groups in different species based on protein network alignment

Yaveroğlu, Ömer Nebil
In this study, an algorithm named ClustOrth is proposed for determining and matching functionally orthologous protein clusters in different species. The algorithm requires protein interaction networks of the organisms to be compared and GO terms of the proteins in these interaction networks as prior information. After determining the functionally related protein groups using the Repeated Random Walks algorithm, the method maps the identified protein groups according to the similarity metric defined. In order to evaluate the similarities of protein groups, graph theoretical information is used together with the context information about the proteins. The clusters are aligned using GO-Term-based protein similarity measures defined in previous studies. These alignments are used to evaluate cluster similarities by defining a cluster similarity metric from protein similarities. The top scoring cluster alignments are considered as orthologous. Several data sources providing orthology information have shown that the defined cluster similarity metric can be used to make inferences about the orthological relevance of protein groups. Comparison with a protein orthology prediction algorithm named ISORANK also showed that the ClustOrth algorithm is successful in determining orthologies between proteins. However, the cluster similarity metric is too strict and many cluster matches are not able to produce high scores for this metric. For this reason, the number of predictions performed is low. This problem can be overcomed with the introduction of different sources of information related to proteins in the clusters for the evaluation of the clusters. The ClustOrth algorithm also outperformed the NetworkBLAST algorithm which aims to find orthologous protein clusters using protein sequence information directly for determining orthologies. It can be concluded that this study is one of the leading studies addressing the protein cluster matching problem for identifying orthologous functional modules of protein interaction networks computationally.


Computational representation of protein sequences for homology detection and classification
Oğul, Hasan; Mumcuoğlu, Ünal Erkan; Department of Information Systems (2006)
Machine learning techniques have been widely used for classification problems in computational biology. They require that the input must be a collection of fixedlength feature vectors. Since proteins are of varying lengths, there is a need for a means of representing protein sequences by a fixed-number of features. This thesis introduces three novel methods for this purpose: n-peptide compositions with reduced alphabets, pairwise similarity scores by maximal unique matches, and pairwise similarity scores by...
A pattern classification approach boosted with genetic algorithms
Yalabık, İsmet; Yarman Vural, Fatoş Tunay; Department of Computer Engineering (2007)
Ensemble learning is a multiple-classier machine learning approach which combines, produces collections and ensembles statistical classiers to build up more accurate classier than the individual classiers. Bagging, boosting and voting methods are the basic examples of ensemble learning. In this thesis, a novel boosting technique targeting to solve partial problems of AdaBoost, a well-known boosting algorithm, is proposed. The proposed systems nd an elegant way of boosting a bunch of classiers successively t...
Modeling of various biological networks via LCMARS
AYYILDIZ DEMİRCİ, EZGİ; Purutçuoğlu Gazi, Vilda (Elsevier BV, 2018-09-01)
In system biology, the interactions between components such as genes, proteins, can be represented by a network. To understand the molecular mechanism of complex biological systems, construction of their networks plays a crucial role. However, estimation of these biological networks is a challenging problem because of their high dimensional and sparse structures. Several statistical methods are proposed to overcome this issue. The Conic Multivariate Adaptive Regression Splines (CMARS) is one of the recent n...
An image encryption algorithm robust to post-encryption bitrate conversion
Akdağ, Sadık Bahaettin; Candan, Çağatay; Department of Electrical and Electronics Engineering (2006)
In this study, a new method is proposed to protect JPEG still images through encryption by employing integer-to-integer transforms and frequency domain scrambling in DCT channels. Different from existing methods in the literature, the encrypted image can be further compressed, i.e. transcoded, after the encryption. The method provides selective encryption/security level with the adjustment of its parameters. The encryption method is tested with various images and compared with the methods in the literature ...
A clustering method for the problem of protein subcellular localization
Bezek, Perit; Atalay, Mehmet Volkan; Department of Computer Engineering (2006)
In this study, the focus is on predicting the subcellular localization of a protein, since subcellular localization is helpful in understanding a protein’s functions. Function of a protein may be estimated from its sequence. Motifs or conserved subsequences are strong indicators of function. In a given sample set of protein sequences known to perform the same function, a certain subsequence or group of subsequences should be common; that is, occurrence (frequency) of common subsequences should be high. Our ...
Citation Formats
Ö. N. Yaveroğlu, “Identification of functionally orthologous protein groups in different species based on protein network alignment,” M.S. - Master of Science, Middle East Technical University, 2010.