OpenMETU
Distance matrices as protein representations
Download
MehmetDinc_TezSon.pdf
Date
2022-09-02
Author
Dinç, Mehmet
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Representing protein sequences is a crucial problem in bioinformatics, since the performance of any data-driven model is limited by the information contained in its input features. A protein's biological function is dictated by its structure, and knowing that structure can help predict the protein's interactions with drug candidates or its Gene Ontology (GO) terms. Yet off-the-shelf protein representations do not contain structural information: experimentally determining a structure is costly, so only a small fraction of the billions of known protein sequences have experimentally determined structures. AlphaFold, a recently introduced neural network-based structure prediction model, claims to predict protein structures with high accuracy. In this study, two-dimensional distance matrices generated from AlphaFold structure predictions are used as input features for two bioinformatics problems: drug-target interaction (DTI) prediction and GO term prediction. For DTI prediction, a state-of-the-art model that already uses two-dimensional protein features is employed as a baseline, and the effect of the distance matrices is observed through ablation studies. The same model is then adapted to the GO term prediction problem, and its performance is compared with that obtained using off-the-shelf protein representations.
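As an illustration of the input feature described above, the following is a minimal sketch of how a two-dimensional distance matrix can be derived from the three-dimensional coordinates of a predicted structure. It is not the thesis's implementation: the abstract does not say which atoms are used or how the matrices are preprocessed, so the use of one C-alpha coordinate per residue, the function name `distance_matrix`, and the toy coordinates are all assumptions made for this example.

```python
import numpy as np


def distance_matrix(coords: np.ndarray) -> np.ndarray:
    """Return the (N, N) matrix of pairwise Euclidean distances.

    `coords` is an (N, 3) array with one 3D point per residue
    (assumed here to be the C-alpha atom; this is an assumption,
    not something stated in the abstract).
    """
    coords = np.asarray(coords, dtype=float)
    # Broadcasting yields an (N, N, 3) array of coordinate differences.
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Hypothetical C-alpha coordinates (in angstroms) for a 4-residue toy chain.
ca_coords = np.array([
    [0.0, 0.0, 0.0],
    [3.8, 0.0, 0.0],
    [3.8, 3.8, 0.0],
    [0.0, 3.8, 0.0],
])

dist = distance_matrix(ca_coords)
print(dist.shape)  # (4, 4)
```

The resulting matrix is symmetric with a zero diagonal, and it does not change when the structure is rotated or translated, which is one reason such matrices are a convenient two-dimensional summary of a three-dimensional structure.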
Subject Keywords
Protein distance matrices, Drug-target interaction prediction, Gene ontology prediction
URI
https://hdl.handle.net/11511/98770
Collections
Graduate School of Natural and Applied Sciences, Thesis
Suggestions
Distance-based discretization of parametric signal manifolds
Vural, Elif (2010-06-28)
The characterization of signals and images in manifolds often leads to efficient dimensionality reduction algorithms based on manifold distance computation for analysis or classification tasks. In this paper, we propose a method for the discretization of signal manifolds given in a parametric form. We present an iterative algorithm for the selection of samples on the manifold that minimizes the average error in the manifold distance computation. Experimental results with image appearance manifolds d...
Large-scale automated function prediction of protein sequences and an experimental case study validation on PTEN transcript variants
Rifaioğlu, Ahmet Süreyya; Sarac, Omer Sinan; ERSAHİN, Tulin; Saidi, Rabie; Atalay, Mehmet Volkan; Atalay, Rengül (2018-02-01)
Recent advances in computing power and machine learning empower functional annotation of protein sequences and their transcript variations. Here, we present an automated prediction system, UniGOPred, for GO annotations and a database of GO term predictions for proteomes of several organisms in UniProt Knowledgebase (UniProtKB). UniGOPred provides function predictions for 514 molecular function (MF), 2909 biological process (BP), and 438 cellular component (CC) GO terms for each protein sequence. UniGOPred co...
Short Time Series Microarray Data Analysis and Biological Annotation
Sökmen, Zerrin; Atalay, Mehmet Volkan; Atalay, Rengül (2008-01-01)
The significant gene list that results from microarray data analysis should be explained in terms of biological functions. The aim of this study is to extract biologically related gene clusters from short time series microarray gene data by applying unsupervised methods and to automatically perform biological annotation of those clusters. In the first step of the study, short time series microarray expression data is clustered according to similar expression profiles. After that, several biological d...
Deep learning for prediction of drug-target interaction space and protein functions
Rifaioğlu, Ahmet Süreyya; Atalay, Mehmet Volkan; Department of Computer Engineering (2020)
With the advancement of sequencing and high-throughput screening technologies, large amounts of sequence and compound data have been accumulated in biological and chemical databases. However, only a small number of proteins and compounds have been annotated by wet-lab experiments due to the huge compound and chemical space. Therefore, computational methods have been developed to annotate protein and compound space. In this thesis, we describe the design and implementation of several methods for accurate drug-t...
Multi-task Deep Neural Networks in Protein Function Prediction
Rifaioğlu, Ahmet Süreyya; Doğan, Tunca; Martin, Maria Jesus; Atalay, Rengül; Atalay, Mehmet Volkan (2017-05-01)
In recent years, deep learning algorithms have outperformed state-of-the-art methods in several areas thanks to efficient methods for training and for preventing overfitting, advances in computer hardware, and the availability of vast amounts of data. The high performance of multi-task deep neural networks in drug discovery has drawn attention to deep learning algorithms in bioinformatics. Here, we proposed a hierarchical multi-task deep neural network architecture based on Gene Ontology (GO...
Citation Formats
M. Dinç, “Distance matrices as protein representations,” M.S. thesis, Middle East Technical University, 2022.