PREDICTING MULTIPLE TYPES OF BIOLOGICAL RELATIONSHIPS WITH INTEGRATIVE NON-NEGATIVE MATRIX FACTORIZATION

2022-05-09
KARTLI, Onur Savaş
Integrative research on multi-modal biological data is difficult because of the data's complexity and diverse structure. A critical issue in bioinformatics and computational biology is that many of the associations between biological components and concepts (e.g., genes, proteins, drugs, and diseases) are still unknown, owing to the high cost and long duration of the wet-lab experiments that uncover them. This thesis aims to predict unknown relationships in biological data by leveraging documented protein-protein, drug-target, gene-disease, and drug-side effect associations. First, biological datasets are obtained from the UniProt, STRING, STITCH, SIDER, DrugBank, DrugCentral, DisGeNET, and KEGG databases, and their relationships are extracted and reformatted as multiple pairwise relationship matrices. Some of these matrices contain continuous values that serve as association weights. The resulting matrices are highly sparse, mainly because of the large amount of missing data in biological databases. Second, missing relationships are predicted via integrative matrix factorization, using the non-negative matrix tri-factorization algorithm, which has been shown in the literature to solve similar problems successfully. For this, a prediction model is trained and evaluated using both classification- and regression-based metrics. Subsequently, large-scale prediction of pairwise relationships between proteins, drugs, diseases, and side effects is carried out using the optimized model. We obtained new predictions for drug-side effect, drug-disease, drug-target protein, and gene/protein-disease interactions. We evaluated the top 250 predictions with the highest scores and validated selected ones against the literature. We hope that the results of this thesis will help life scientists plan experimental work by providing preliminary sets of biological associations.
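To make the factorization step concrete, the following is a minimal sketch of non-negative matrix tri-factorization on a single relationship matrix, written with NumPy. It is not the thesis implementation: the toy drug-target matrix, the function name `nmtf`, the ranks `k1` and `k2`, the iteration count, and the plain multiplicative update rules (unconstrained Frobenius-norm objective, R ≈ G1 · S · G2ᵀ) are all assumptions chosen for illustration. The thesis factorizes several matrices jointly, which this single-matrix example does not attempt.

```python
import numpy as np


def nmtf(R, k1, k2, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix R (n x m) as G1 @ S @ G2.T.

    Illustrative multiplicative updates for the unconstrained
    Frobenius-norm objective; hyperparameters are assumptions.
    """
    rng = np.random.default_rng(seed)
    n, m = R.shape
    # Strictly positive random initialization keeps all factors non-negative.
    G1 = rng.random((n, k1)) + eps
    S = rng.random((k1, k2)) + eps
    G2 = rng.random((m, k2)) + eps
    errors = []
    for _ in range(n_iter):
        G1 *= (R @ G2 @ S.T) / (G1 @ S @ G2.T @ G2 @ S.T + eps)
        G2 *= (R.T @ G1 @ S) / (G2 @ S.T @ G1.T @ G1 @ S + eps)
        S *= (G1.T @ R @ G2) / (G1.T @ G1 @ S @ G2.T @ G2 + eps)
        # Track the reconstruction error after each full update sweep.
        errors.append(np.linalg.norm(R - G1 @ S @ G2.T))
    return G1, S, G2, errors


# Hypothetical toy drug x target matrix (1 = documented association).
R = np.array([
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0],
    [0, 0, 1, 1, 1],
], dtype=float)

G1, S, G2, errors = nmtf(R, k1=2, k2=2)
scores = G1 @ S @ G2.T

# Rank undocumented pairs (zeros in R) by reconstructed score, highest first.
candidates = sorted(
    ((scores[i, j], i, j) for i, j in zip(*np.where(R == 0))),
    reverse=True,
)
top_candidates = candidates[:3]
```

Under these assumptions, the zero entries of `R` that receive the highest reconstructed scores play the role of candidate predictions, analogous to ranking the top-scoring pairs described in the abstract.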

Suggestions

Analyzing the Information Distribution in the fMRI Measurements by Estimating the Degree of Locality
Onal, Itir; Ozay, Mete; Firat, Orhan; GİLLAM, İLKE; Yarman Vural, Fatoş Tunay (2013-07-07)
In this study, we propose a new method for analyzing and representing the distribution of discriminative information for data acquired via functional Magnetic Resonance Imaging (fMRI). For this purpose, we form a spatially local mesh with varying size, around each voxel, called the seed voxel. The relationship among each seed voxel and its neighbors is estimated using a linear regression model by minimizing the square error. Then, we estimate the optimal mesh size that represents the connections among each ...
Predicting protein-protein interactions on a proteome scale by matching evolutionary and structural similarities at interfaces using PRISM
Gursoy, Attila; Tunçbağ, Nurcan; NUSSINOV, Ruth; Keskin, Ozlem (2011-09-01)
Prediction of protein-protein interactions at the structural level on the proteome scale is important because it allows prediction of protein function, helps drug discovery and takes steps toward genome-wide structural systems biology. We provide a protocol (termed PRISM, protein interactions by structural matching) for large-scale prediction of protein-protein interactions and assembly of protein complex structures. The method consists of two components: rigid-body structural comparisons of target proteins...
Discovering functional interaction patterns in protein-protein interaction networks
Turanalp, Mehmet E.; Can, Tolga (Springer Science and Business Media LLC, 2008-06-11)
Background: In recent years, a considerable amount of research effort has been directed to the analysis of biological networks with the availability of genome-scale networks of genes and/or proteins of an increasing number of organisms. A protein-protein interaction (PPI) network is a particular biological network which represents physical interactions between pairs of proteins of an organism. Major research on PPI networks has focused on understanding the topological organization of PPI networks, evolution...
Selection of representative SNP sets for genome-wide association studies: a metaheuristic approach
Ustunkar, Gurkan; AKYÜZ, SÜREYYA; Weber, Gerhard W.; Friedrich, Christoph M.; Aydın Son, Yeşim (2012-08-01)
After the completion of Human Genome Project in 2003, it is now possible to associate genetic variations in the human genome with common and complex diseases. The current challenge now is to utilize the genomic data efficiently and to develop tools to improve our understanding of etiology of complex diseases. Many of the algorithms needed to deal with this task were originally developed in management science and operations research (OR). One application is to select a subset of the Single Nucleotide Polymor...
Combining Multiple Types of Biological Data in Constraint-Based Learning of Gene Regulatory Networks
Tan, Mehmet; AlShalalfa, Mohammed; Alhajj, Reda; Polat, Faruk (2008-09-17)
Due to the complex structure and scale of gene regulatory networks, we support the argument that combination of multiple types of biological data to derive satisfactory network structures is necessary to understand the regulatory mechanisms of cellular systems. In this paper, we propose a simple but effective method of combining two types of biological data, namely microarray and transcription factor (TF) binding data, to construct gene regulatory networks. The proposed algorithm is based on and extends the...
Citation Formats
O. S. KARTLI, “PREDICTING MULTIPLE TYPES OF BIOLOGICAL RELATIONSHIPS WITH INTEGRATIVE NON-NEGATIVE MATRIX FACTORIZATION,” M.S. - Master of Science, Middle East Technical University, 2022.