Identification of functionally orthologous protein groups in different species based on protein network alignment

2010
Yaveroğlu, Ömer Nebil
In this study, an algorithm named ClustOrth is proposed for determining and matching functionally orthologous protein clusters in different species. As prior information, the algorithm requires the protein interaction networks of the organisms to be compared and the GO terms of the proteins in these networks. After determining the functionally related protein groups using the Repeated Random Walks algorithm, the method maps the identified protein groups to one another according to a defined similarity metric. To evaluate the similarity of protein groups, graph-theoretical information is used together with context information about the proteins. The clusters are aligned using GO-term-based protein similarity measures defined in previous studies. These alignments are used to evaluate cluster similarities by defining a cluster similarity metric from protein similarities. The top-scoring cluster alignments are considered orthologous. Validation against several data sources providing orthology information showed that the defined cluster similarity metric can be used to make inferences about the orthology of protein groups. Comparison with a protein orthology prediction algorithm named ISORANK also showed that the ClustOrth algorithm is successful in determining orthologies between proteins. However, the cluster similarity metric is too strict, and many cluster matches fail to score highly on it. For this reason, the number of predictions made is low. This problem can be overcome by introducing additional sources of information about the proteins in the clusters when evaluating them. The ClustOrth algorithm also outperformed the NetworkBLAST algorithm, which aims to find orthologous protein clusters by using protein sequence information directly.
It can be concluded that this study is one of the leading studies addressing the protein cluster matching problem for identifying orthologous functional modules of protein interaction networks computationally.
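The abstract describes scoring a pair of clusters by aligning their proteins with a GO-term-based similarity measure and deriving a cluster similarity from the aligned protein similarities. The Python sketch below illustrates that general idea only. The record does not give the thesis's actual measures, so everything here is an assumption: Jaccard overlap of GO-term sets stands in for the protein similarity, a greedy one-to-one matching stands in for the alignment, and the score is normalized by the size of the larger cluster. The function names, protein names, and GO identifiers are hypothetical placeholders.

```python
from itertools import product


def go_similarity(terms_a, terms_b):
    """Jaccard overlap of two GO-term sets (an assumed stand-in measure)."""
    if not terms_a or not terms_b:
        return 0.0
    return len(terms_a & terms_b) / len(terms_a | terms_b)


def cluster_similarity(cluster_a, cluster_b, go_terms):
    """Greedily align two clusters one-to-one and score the alignment.

    Returns (score, alignment), where score is the sum of matched protein
    similarities divided by the size of the larger cluster (an assumption,
    not the thesis's metric).
    """
    # Score every cross-species protein pair, best pairs first.
    scored_pairs = sorted(
        (
            (go_similarity(go_terms.get(p, set()), go_terms.get(q, set())), p, q)
            for p, q in product(cluster_a, cluster_b)
        ),
        key=lambda item: item[0],
        reverse=True,
    )

    used_a, used_b, alignment = set(), set(), []
    for score, p, q in scored_pairs:
        # Each protein may be matched at most once; skip zero-similarity pairs.
        if score > 0 and p not in used_a and q not in used_b:
            used_a.add(p)
            used_b.add(q)
            alignment.append((p, q, score))

    larger = max(len(cluster_a), len(cluster_b))
    total = sum(score for _, _, score in alignment)
    return (total / larger if larger else 0.0), alignment


# Made-up proteins and GO identifiers, for illustration only.
go_terms = {
    "yeastA": {"GO:0000001", "GO:0000002"},
    "yeastB": {"GO:0000003"},
    "humanA": {"GO:0000001", "GO:0000002"},
    "humanB": {"GO:0000003", "GO:0000004"},
}
score, alignment = cluster_similarity(
    ["yeastA", "yeastB"], ["humanA", "humanB"], go_terms
)
print(score, alignment)
```

Dividing by the larger cluster size penalizes unmatched proteins, which gives one plausible picture of why a strict cluster metric yields few high-scoring matches, as the abstract reports.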

Suggestions

A clustering method for the problem of protein subcellular localization
Bezek, Perit; Atalay, Mehmet Volkan; Department of Computer Engineering (2006)
In this study, the focus is on predicting the subcellular localization of a protein, since subcellular localization is helpful in understanding a protein’s functions. The function of a protein may be estimated from its sequence. Motifs or conserved subsequences are strong indicators of function. In a given sample set of protein sequences known to perform the same function, a certain subsequence or group of subsequences should be common; that is, the occurrence (frequency) of common subsequences should be high. Our ...
A classification system for the problem of protein subcellular localization
Alay, Gökçen; Atalay, Mehmet Volkan; Department of Computer Engineering (2007)
The focus of this study is on predicting the subcellular localization of a protein. Subcellular localization information is important for protein function annotation, which is a fundamental problem in computational biology. For this problem, a classification system is built that has two main parts: a predictor that is based on a feature mapping technique to extract biologically meaningful information from protein sequences and a client/server architecture for searching and predicting subcellular localization...
Comparison of rough multi layer perceptron and rough radial basis function networks using fuzzy attributes
Vural, Hülya; Alpaslan, Ferda Nur; Department of Computer Engineering (2004)
The hybridization of the soft computing methods of Radial Basis Function (RBF) neural networks, Multi Layer Perceptron (MLP) neural networks with back-propagation learning, fuzzy sets, and rough sets is studied in the scope of this thesis. Conventional MLP, conventional RBF, fuzzy MLP, fuzzy RBF, rough fuzzy MLP, and rough fuzzy RBF networks are compared. In the fuzzy neural networks implemented in this thesis, the input data and the desired outputs are given fuzzy membership values as the fuzzy properties “low...
Computational representation of protein sequences for homology detection and classification
Oğul, Hasan; Mumcuoğlu, Ünal Erkan; Department of Information Systems (2006)
Machine learning techniques have been widely used for classification problems in computational biology. They require that the input be a collection of fixed-length feature vectors. Since proteins are of varying lengths, there is a need for a means of representing protein sequences by a fixed number of features. This thesis introduces three novel methods for this purpose: n-peptide compositions with reduced alphabets, pairwise similarity scores by maximal unique matches, and pairwise similarity scores by...
Modeling of various biological networks via LCMARS
AYYILDIZ DEMİRCİ, EZGİ; Purutçuoğlu Gazi, Vilda (Elsevier BV, 2018-09-01)
In systems biology, the interactions between components such as genes and proteins can be represented by a network. To understand the molecular mechanism of complex biological systems, construction of their networks plays a crucial role. However, estimation of these biological networks is a challenging problem because of their high-dimensional and sparse structures. Several statistical methods have been proposed to overcome this issue. The Conic Multivariate Adaptive Regression Splines (CMARS) is one of the recent n...
Citation Formats
Ö. N. Yaveroğlu, “Identification of functionally orthologous protein groups in different species based on protein network alignment,” M.S. - Master of Science, Middle East Technical University, 2010.