PREDICTING MULTIPLE TYPES OF BIOLOGICAL RELATIONSHIPS WITH INTEGRATIVE NON-NEGATIVE MATRIX FACTORIZATION

2022-05-09
KARTLI, Onur Savaş
Integrative research on multi-modal biological data is difficult because the data are complex and diversely structured. A critical issue in bioinformatics and computational biology is that many associations between biological components and concepts (e.g., genes, proteins, drugs, and diseases) remain unknown, because the wet-lab experiments that uncover them are costly and time-consuming. This thesis aims to predict unknown relationships in biological data by leveraging documented protein-protein, drug-target, gene-disease, and drug-side effect associations. First, biological datasets were obtained from the UniProt, STRING, STITCH, SIDER, DrugBank, DrugCentral, DisGeNET, and KEGG databases, and their relationships were extracted and reformatted as multiple pairwise relationship matrices. Some of these matrices contain continuous values that serve as association weights. The resulting matrices are highly sparse, mainly because of the large amount of missing data in biological databases. Second, missing relationships were predicted via integrative matrix factorization, using the non-negative matrix tri-factorization algorithm, which has been shown in the literature to solve similar problems successfully. A prediction model was trained and evaluated using both classification- and regression-based metrics. Subsequently, large-scale prediction of pairwise relationships between proteins, drugs, diseases, and side effects was carried out using the optimized model. We obtained new predictions for drug-side effect, drug-disease, drug-target protein, and gene/protein-disease interactions. We evaluated the 250 highest-scoring predictions and validated selected ones against the literature. We hope that the results of this thesis will help life scientists plan experimental work by providing preliminary sets of biological associations.
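As a rough illustration of the factorization step described in the abstract, the sketch below factorizes a single non-negative relation matrix R as F S Gᵀ using standard multiplicative updates, then ranks the pairs that are zero in R by their reconstructed score. It is a minimal sketch under stated assumptions, not the thesis's implementation: the thesis factorizes several relation matrices jointly, whereas this example uses one toy matrix, and the function names (`nmtf`, `top_k_unknown`), the ranks `k1` and `k2`, the iteration count, and the random data are all hypothetical.

```python
import numpy as np

def nmtf(R, k1, k2, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix R (n x m) as F @ S @ G.T.

    Uses unconstrained multiplicative updates for the squared Frobenius
    error; every factor stays non-negative because each update only
    multiplies by a non-negative ratio.
    """
    rng = np.random.default_rng(seed)
    n, m = R.shape
    F = rng.random((n, k1)) + eps   # row-entity factors (e.g., drugs)
    S = rng.random((k1, k2)) + eps  # compressed interaction matrix
    G = rng.random((m, k2)) + eps   # column-entity factors (e.g., side effects)
    errors = []
    for _ in range(n_iter):
        F *= (R @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (R.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ R @ G) / (F.T @ F @ S @ G.T @ G + eps)
        errors.append(np.linalg.norm(R - F @ S @ G.T))
    return F, S, G, errors

def top_k_unknown(R, R_hat, k):
    """Return (row, col, score) for the k highest-scoring pairs that are zero in R."""
    scores = np.where(R == 0, R_hat, -np.inf)  # mask out documented pairs
    flat = np.argsort(scores, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(flat, R.shape)
    return [(int(r), int(c), float(scores[r, c])) for r, c in zip(rows, cols)]

# Hypothetical sparse binary relation matrix (30 drugs x 20 side effects).
rng = np.random.default_rng(1)
R = (rng.random((30, 20)) < 0.1).astype(float)

F, S, G, errors = nmtf(R, k1=5, k2=4)
R_hat = F @ S @ G.T                      # reconstructed association scores
predictions = top_k_unknown(R, R_hat, k=10)
```

In an integrative setting, matrices that share an entity type (for example, drug-target and drug-side effect) would typically share the corresponding factor, which is what lets evidence from one relation inform predictions in another; that coupling is omitted here for brevity.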

Citation Formats
O. S. KARTLI, “PREDICTING MULTIPLE TYPES OF BIOLOGICAL RELATIONSHIPS WITH INTEGRATIVE NON-NEGATIVE MATRIX FACTORIZATION,” M.S. - Master of Science, Middle East Technical University, 2022.