Large-scale automated function prediction of protein sequences and an experimental case study validation on PTEN transcript variants

2018-02-01
Rifaioğlu, Ahmet Süreyya
Sarac, Omer Sinan
ERSAHİN, Tulin
Saidi, Rabie
Atalay, Mehmet Volkan
Atalay, Rengül
Recent advances in computing power and machine learning empower functional annotation of protein sequences and their transcript variations. Here, we present UniGOPred, an automated prediction system for Gene Ontology (GO) annotations, together with a database of GO term predictions for the proteomes of several organisms in the UniProt Knowledgebase (UniProtKB). UniGOPred provides function predictions for 514 molecular function (MF), 2909 biological process (BP), and 438 cellular component (CC) GO terms for each protein sequence. UniGOPred covers nearly the whole functionality spectrum of the Gene Ontology system and can predict both generic and specific GO terms. UniGOPred was run on the CAFA2 challenge target protein sequences and ranked among the top 10 best-performing methods in the molecular function category. In addition, UniGOPred outperforms the baseline BLAST classifier in all GO categories. UniGOPred predictions are also compared with UniProtKB/TrEMBL database annotations. Furthermore, the tool's ability to predict negatively associated GO terms, which define the functions that a protein does not possess, is discussed. UniGOPred annotations were also validated experimentally in a case study on PTEN protein variants and against the literature in a case study on CHD8 protein variants. The UniGOPred protein functional annotation system is available as an open access tool.
PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS

Suggestions

Distance matrices as protein representations
Dinç, Mehmet; Atalay, Mehmet Volkan; Department of Computer Engineering (2022-9-02)
Representing protein sequences is a crucial problem in the field of bioinformatics since any data-driven model's performance is limited by the information contained in its input features. A protein's biological function is dictated by its structure and knowing a protein's structure can potentially help predict its interactions with drug candidates or predict its Gene Ontology (GO) term. Yet, off-the-shelf protein representations do not contain such information since only a small fraction of the billions of ...
Deep Learning-Enabled Technologies for Bioimage Analysis
Rabbi, Fazle; Dabbagh, Sajjad Rahmani; Angın, Pelin; Yetisen, Ali Kemal; Tasoglu, Savas (2022-02-01)
Deep learning (DL) is a subfield of machine learning (ML), which has recently demonstrated its potency to significantly improve the quantification and classification workflows in biomedical and clinical applications. Among the end applications profoundly benefitting from DL, cellular morphology quantification is one of the pioneers. Here, we first briefly explain fundamental concepts in DL and then we review some of the emerging DL-enabled applications in cell morphology quantification in the fields of em...
A computational model of social dynamics of musical agreement
Öztürel, İsmet Adnan; Bozşahin, Hüseyin Cem; Department of Cognitive Sciences (2011)
Semiotic dynamics and computational evolutionary musicology literature investigate emergence and evolution of linguistic and musical conventions by using computational multi-agent complex adaptive system models. This thesis proposes a new computational evolutionary musicology model, by altering previous models of familiarity based musical interactions that try to capture evolution of songs as a co-evolutionary process through mate selection. The proposed modified familiarity game models a closed community o...
A temporal neural network model for constructing connectionist expert system knowledge bases
Alpaslan, Ferda Nur (Elsevier BV, 1996-04-01)
This paper introduces a temporal feedforward neural network model that can be applied to a number of neural network application areas, including connectionist expert systems. The neural network model has a multi-layer structure, i.e. the number of layers is not limited. Also, the model has the flexibility of defining output nodes in any layer. This is especially important for connectionist expert system applications.
The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens
Zhou, N; et al. (2019-11-19)
Background: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. Results: Here, we report on the results of the third CAFA challenge, CAFA3, that featured an expanded analysis over the previous CAFA rounds, both in terms of volume of data analyzed and the types of analysis performed. In a novel and major new development, computational predictions and assessment goals drove some of the ex...
Citation Formats
A. S. Rifaioğlu, O. S. Sarac, T. ERSAHİN, R. Saidi, M. V. Atalay, and R. Atalay, “Large-scale automated function prediction of protein sequences and an experimental case study validation on PTEN transcript variants,” PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS, pp. 135–151, 2018, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/32670.