Parallelization of functional flow to predict protein functions

Akkoyun, Emrah
Protein-protein interaction networks provide important information about what the biological function of proteins whose roles are unknown might be in a cell. These interaction networks were analyzed by a variety of approaches by running them on a single computer and the roles of the proteins identified were used to predict the function of the proteins unidentified. The functional flow is an approach that takes the network connectivity, distance effect, topology of the network with local and global views into account. With these advantages, that the functional flow produces more accurate results on the prediction of protein functions was presented by the previos conducted researches. However, the application implemented for this approach could not be practically applied on the large and complex network produced for the complex species because of memory limitation. The purpose of this thesis is to provide a new application be implemented on the high computing performance where the application can be scaled on the large data sets. Therefore, Hadoop, one of the open source map/reduce environments, was installed on 18 hosts each of which has eight cores. Method; the first map/reduce job distributes the protein interaction network as a format which allows parallel distributed computing to all the worker nodes, the other map/reduce job generates flows for each known protein function and the role of the proteins unidentified are predicted by accumulating all of these generated flows. It has been observed in the experiments we performed that the application requiring high performance computing can be decomposed into worker nodes efficiently and the application can provide better performance as the resources increase.


Prediction of protein subcellular localization based on primary sequence data
Özarar, Mert; Atalay, Mehmet Volkan; Department of Computer Engineering (2003)
Subcellular localization is crucial for determining the functions of proteins. A system called prediction of protein subcellular localization (P2SL) that predicts the subcellular localization of proteins in eukaryotic organisms based on the amino acid content of primary sequences using amino acid order is designed. The approach for prediction is to nd the most frequent motifs for each protein in a given class based on clustering via self organizing maps and then to use these most frequent motifs as features...
Multi-view subcellular localization prediction of human proteins
Özsarı, Gökhan; Atalay, M. Volkan.; Department of Computer Engineering (2019)
Determining the subcellular localization of proteins is crucial for Understanding the functions of proteins, drug targeting, systems biology, and proteomics research. Experimental validation of subcellular localization is an expensive and challenging process. There exist several computational methods for automated prediction of protein subcellular localization; however, there is still room for better performance. Here, we propose a multi-view SVM-based approach that provides predictions for human proteins. ...
Prediction of protein subcellular localization using global protein sequence feature
Bozkurt, Burçin; Atalay, Mehmet Volkan; Department of Computer Engineering (2003)
The problem of identifying genes in eukaryotic genomic sequences by computational methods has attracted considerable research attention in recent years. Many early approaches to the problem focused on prediction of individual functional elements and compositional properties of coding and non coding deoxyribonucleic acid (DNA) in entire eukaryotic gene structures. More recently, a number of approaches has been developed which integrate multiple types of information including structure, function and genetic p...
Visualizing the protein-protein interactions network in virtual reality and mixed reality environments
Şenderin, Büşra; Sürer, Elif; Tunçbağ, Nurcan; Department of Modeling and Simulation (2021-8)
Protein-protein interactions (PPI) define the physical contact of two or more protein structures. When these interactions are combined, the protein-protein interaction network (PPIN) is formed. The interactions between protein structures are distinct interactions—they happen in specific binding locations on proteins, and they have a specific biological function that they take on. With these networks, the processes within a cell or a living organism when healthy or diseased can be studied. In this thesis, a ...
Architectures and functional coverage of protein-protein interfaces
Tunçbağ, Nurcan; Guney, Emre; NUSSINOV, Ruth; Keskin, Ozlem (2008-09-05)
The diverse range of cellular functions is performed by a limited number of protein folds existing in nature. One may similarly expect that cellular functional diversity would be covered by a limited number of protein-protein interface architectures. Here, we present 8205 interface clusters, each representing a unique interface architecture. This data set of protein-protein interfaces is analyzed and compared with older data sets. We observe that the number of both biological and crystal interfaces increase...
Citation Formats
E. Akkoyun, “Parallelization of functional flow to predict protein functions,” M.S. - Master of Science, Middle East Technical University, 2011.