OpenMETU
A DEEP LEARNING BASED PROTEIN REPRESENTATION MODEL FOR LOW-DATA PROTEIN FUNCTION PREDICTION
Download
serbulent_unsal_tez.pdf
Date
2023-03-27
Author
Ünsal, Serbülent
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Protein science is a broad discipline that studies proteins at the individual and proteome levels through both experimental and computational methods. Protein informatics is the branch of protein science that focuses on the computational and data-centric aspects of protein analysis, including the modeling of proteins' quantitative properties. The functional characterization of proteins is a critical aspect of protein science because it is necessary for the development of new biomedical strategies and biotechnological products. However, the experimental and manual methods typically used for functional characterization are time-consuming and costly; as a result, only a small fraction of the millions of protein entries in databases like UniProt have been manually reviewed and annotated by experts. To address this gap, in silico approaches such as protein function prediction (PFP) are used to predict protein functions computationally. PFP applies machine learning, natural language processing, and other techniques to predict protein functions from various types of data, including protein sequence, structure, and interactome information. Developing accurate and reusable PFP methods is an important goal in protein science, as it has the potential to improve our understanding of protein function and advance molecular biology. However, PFP remains an open problem: current methods do not consistently achieve high accuracy in predicting protein functions. One area that has received relatively little attention in the literature is low-data PFP, that is, the prediction of protein functions from a low number of positive training samples. To address this challenge, we developed a reusable benchmarking framework called Protein RepresentatiOn BEnchmark (PROBE) for evaluating different PFP methods. This framework allows different PFP approaches to be compared along several dimensions, including data abundance and predicted term specificity. We also developed novel methods specifically designed for low-data PFP and evaluated them using PROBE. Our results show that the PROBE framework and the novel low-data PFP methods represent a significant contribution to the field of PFP and have the potential to shape future research efforts, particularly in contexts where data availability is limited. Overall, we hope that this study will benefit researchers working in the PFP domain and contribute to ongoing efforts to improve our understanding of protein function.
Subject Keywords
Protein Informatics, Machine Learning, Protein Representation Learning, Multimodal Learning, Low Data Learning
URI
https://hdl.handle.net/11511/102816
Collections
Graduate School of Informatics, Thesis
Citation (IEEE)
S. Ünsal, “A DEEP LEARNING BASED PROTEIN REPRESENTATION MODEL FOR LOW-DATA PROTEIN FUNCTION PREDICTION,” Ph.D. - Doctoral Program, Middle East Technical University, 2023.