Deep learning for prediction of drug-target interaction space and protein functions

2020
Rifaioğlu, Ahmet Süreyya
With the advancement of sequencing and high-throughput screening technologies, large amounts of sequence and compound data have accumulated in biological and chemical databases. However, only a small number of proteins and compounds have been annotated by wet-lab experiments, owing to the vast size of the protein and chemical spaces. Therefore, computational methods have been developed to annotate the protein and compound spaces. In this thesis, we describe the design and implementation of several methods for accurate drug-target interaction prediction and functional annotation of proteins within the framework of the Comprehensive Resource of Biomedical Relations with Deep Learning and Network Representations (CROssBAR) project, whose aim is to integrate biological and chemical data scattered across different sources and to build deep-learning-based prediction methods for drug discovery. The first method, DEEPred, is a sequence-based automated protein function prediction method that employs stacked multi-task deep neural networks organized according to the Gene Ontology (GO) directed acyclic graph hierarchy. The performance of DEEPred was compared with state-of-the-art methods, and its source code is available at https://github.com/cansyl/deepred. The second method, DEEPScreen, is a binary drug-target interaction prediction method. The idea behind DEEPScreen is to learn compound features automatically from 2-D images of compound structures using convolutional neural networks. DEEPScreen was trained for 704 target proteins, and it predicts input compounds as active or inactive against the trained targets. The performance of DEEPScreen was compared with state-of-the-art methods on different benchmarking datasets. The source code is available at https://github.com/cansyl/DEEPScreen. The third method, MDeePred, is a binding affinity prediction method. MDeePred is a chemogenomic method in which both protein and compound features are fed to a hybrid pairwise deep neural network.
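The way DEEPred builds on the GO hierarchy can be made more concrete with a small sketch. This is an illustration under stated assumptions, not DEEPred's code: it uses a hypothetical toy DAG with placeholder term names instead of real GO identifiers, applies the GO true-path rule (an annotation to a term implies annotations to all of its ancestors), and buckets terms by their longest-path depth from the root so that each bucket could serve as the output layer of one multi-task network. The level definition and the helper names (`ancestors`, `propagate`, `group_by_level`, `label_vector`) are assumptions chosen for illustration; the thesis may define its levels differently.

```python
from collections import defaultdict

# Hypothetical toy GO-like DAG: child term -> list of parent terms.
# Real GO terms have IDs such as GO:0003674; these names are placeholders.
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalytic": ["root"],
    "protein_binding": ["binding"],
    "kinase": ["catalytic"],
    "protein_kinase": ["kinase", "protein_binding"],
}


def ancestors(term, parents):
    """Return every ancestor of `term`, excluding the term itself."""
    seen = set()
    stack = list(parents[term])
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents[t])
    return seen


def propagate(annotations, parents):
    """True-path rule: a protein annotated with a term is also
    annotated with all of that term's ancestors."""
    full = set(annotations)
    for term in annotations:
        full |= ancestors(term, parents)
    return full


def depth(term, parents, cache):
    """Length of the longest path from `term` up to a root term (memoized)."""
    if term not in cache:
        ps = parents[term]
        cache[term] = 0 if not ps else 1 + max(depth(p, parents, cache) for p in ps)
    return cache[term]


def group_by_level(parents):
    """Bucket terms by DAG depth. Assumption: each bucket would feed one
    multi-task network whose output units are the terms of that level."""
    levels = defaultdict(list)
    cache = {}
    for term in parents:
        levels[depth(term, parents, cache)].append(term)
    return {lvl: sorted(terms) for lvl, terms in sorted(levels.items())}


def label_vector(annotations, level_terms, parents):
    """0/1 multi-task target vector for one level, after propagation."""
    full = propagate(annotations, parents)
    return [1 if t in full else 0 for t in level_terms]


if __name__ == "__main__":
    levels = group_by_level(PARENTS)
    print(levels)
    print(label_vector({"kinase"}, levels[2], PARENTS))
```

Propagating labels before training keeps the targets of different levels consistent with each other: a protein labeled positive for a deep term is never a negative example for one of its ancestors.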
In terms of features, the main difference between MDeePred and DEEPScreen is that MDeePred uses compound-target feature pairs, whereas DEEPScreen uses only compound features. The main novelty of MDeePred is the proposed multi-channel featurization approach for protein sequences, in which each channel represents a different property of the input protein sequence. MDeePred was evaluated on multiple benchmarking datasets, and its performance was compared with that of state-of-the-art methods. The source code for MDeePred is available at https://github.com/cansyl/MDeePred. The fourth method, iBioProVis, is an online interactive visualization tool for chemical space. The main purpose of iBioProVis is to embed compound features in 2-D space and visualize them. It relies on the assumption that topologically and chemically similar compounds have similar bioactivity profiles. The inputs to iBioProVis are target protein identifiers and, optionally, SMILES strings of user-provided compounds. The tool generates circular fingerprints for the active compounds of the input targets and for the user-provided compounds, and then uses t-distributed Stochastic Neighbor Embedding (t-SNE) to embed the compounds in 2-D space. The tool also provides cross-references to well-known databases for the input targets and compounds. iBioProVis is available at https://ibioprovis.kansil.org/.
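The multi-channel protein featurization described for MDeePred can be illustrated with a short sketch. This is not MDeePred's implementation: the two property tables (the Kyte-Doolittle hydropathy scale and a simplified side-chain charge), the product used to combine the values of residues i and j, the fixed length `max_len`, and the function names `pairwise_channel` and `featurize` are all assumptions made for illustration. The sketch turns a sequence into a stack of max_len x max_len grids, one per property, which is the kind of image-like tensor a convolutional branch of a pairwise network could consume alongside compound features.

```python
# Kyte-Doolittle hydropathy scale (one value per standard amino acid).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Simplified side-chain charge near neutral pH (assumption: His treated as 0).
CHARGE = {aa: 0.0 for aa in KD}
CHARGE.update({"D": -1.0, "E": -1.0, "K": 1.0, "R": 1.0})


def pairwise_channel(seq, table, max_len, pad=0.0):
    """One channel: a max_len x max_len grid whose (i, j) entry is the product
    of the property values of residues i and j (an illustrative choice).
    Longer sequences are truncated; shorter ones are zero-padded.
    Unknown residue letters get the value 0.0."""
    vals = [table.get(aa, 0.0) for aa in seq[:max_len]]
    n = len(vals)
    return [
        [vals[i] * vals[j] if i < n and j < n else pad for j in range(max_len)]
        for i in range(max_len)
    ]


def featurize(seq, tables, max_len):
    """Stack one channel per property table: shape (channels, max_len, max_len)."""
    seq = seq.upper()
    return [pairwise_channel(seq, t, max_len) for t in tables]


if __name__ == "__main__":
    feats = featurize("KDA", [KD, CHARGE], max_len=4)
    print(len(feats), len(feats[0]), len(feats[0][0]))
    print(feats[1][0][1])  # charge channel, K-D pair
```

For the toy call above, the result has two 4 x 4 channels; in the charge channel the K-D entry is -1.0, and the last row and column are padding. Encoding each property as its own channel, rather than concatenating the properties into one vector per residue, lets a 2-D convolution see residue-pair relationships for each property separately.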

Citation
A. S. Rifaioğlu, “Deep learning for prediction of drug-target interaction space and protein functions,” Thesis (Ph.D.) -- Graduate School of Natural and Applied Sciences, Computer Engineering, Middle East Technical University, 2020.