ProFAB-open protein functional annotation benchmark

2023-02-01
Ozdilek, A. Samet
Atakan, Ahmet
Ozsari, Goekhan
Acar, Aybar
Atalay, Mustafa Ümit
DOĞAN, TUNCA
RİFAİOĞLU, AHMET SÜREYYA
As the number of protein sequences increases in biological databases, computational methods are required to provide accurate functional annotation with high coverage. Although several machine learning methods have been proposed for this purpose, there are still two main issues: (i) construction of reliable positive and negative training and validation datasets, and (ii) fair evaluation of their performances based on predefined experimental settings. To address these issues, we have developed ProFAB: Open Protein Functional Annotation Benchmark, which is a platform providing an infrastructure for a fair comparison of protein function prediction methods. ProFAB provides filtered and preprocessed protein annotation datasets and enables the training and evaluation of function prediction methods via several options. We believe that ProFAB will be useful for both computational and experimental researchers by enabling the utilization of ready-to-use datasets and machine learning algorithms for protein function prediction based on Gene Ontology terms and Enzyme Commission numbers. ProFAB is available at and .
BRIEFINGS IN BIOINFORMATICS

Suggestions

SCITUNA: A Network Alignment Approach for Integrating Multiple Single-Cell RNA-Seq Datasets
Dogan, Onur; Erten, Burak Onur; Erten, Cesim; Houdjedj, Aissa; Kazan, Hilal; Krichen, Mohamed; Marouf, Yacine (Orta Doğu Teknik Üniversitesi Enformatik Enstitüsü; 2022-10)
The throughput and cost of single-cell RNA sequencing (scRNA-seq) are in continuous improvement, and so is the demand for larger-scale scRNA-seq data, which could require integrating multiple datasets from different sequencing experiments. The integration of different scRNA-seq datasets could be challenging due to batch effect, a phenomenon that could occur when the experiments are run in different laboratories, at different time periods, or when using different instruments and technologies. Batch effect co...
Distance matrices as protein representations
Dinç, Mehmet; Atalay, Mehmet Volkan; Department of Computer Engineering (2022-09-02)
Representing protein sequences is a crucial problem in the field of bioinformatics since any data-driven model's performance is limited by the information contained in its input features. A protein's biological function is dictated by its structure and knowing a protein's structure can potentially help predict its interactions with drug candidates or predict its Gene Ontology (GO) term. Yet, off-the-shelf protein representations do not contain such information since only a small fraction of the billions of ...
CAP-RNAseq: An online tool for Clustering, Annotation and Prioritization of RNAseq data
Vural Özdeniz, Merve; Çalışır, Kübra; Arıcı, Burçin Irem; Acar, Rana; Targen, Seniye; Dalgıç, Ertuğrul; Konu, Özlen (Orta Doğu Teknik Üniversitesi Enformatik Enstitüsü; 2022-10)
Transcriptome analysis has been an effective high throughput method for examining the regulation and function of genomes. However, analysis of RNAseq experiments with more than two groups/factors is complex and can benefit from clustering followed by annotation and prioritization of genes. While a few advanced analysis pipelines are available for such data sets, none of them make the gene selection process automated and hence easier for validation experiments. To fill this gap, we have developed a web tool ...
Effective gene expression data generation framework based on multi-model approach
Sirin, Utku; Erdogdu, Utku; Polat, Faruk; TAN, MEHMET; Alhajj, Reda (Elsevier BV, 2016-06-01)
Objective: Overcome the shortage of samples in gene expression data sets, which contain thousands of genes but only a small number of samples, a limitation that challenges the computational methods that use them.
Subsequence feature maps for protein function annotation
Saraç, Ömer Sinan; Atalay, Mehmet Volkan; Department of Computer Engineering (2008)
With the advances in sequencing technologies, the number of protein sequences with unknown function increases rapidly. Hence, computational methods for functional annotation of these protein sequences become of the utmost importance. In this thesis, we first defined a feature space mapping of protein primary sequences to fixed dimensional numerical vectors. This mapping, which is called the Subsequence Profile Map (SPMap), takes into account the models of the subsequences of protein sequences. The resulting...
Citation Formats
A. S. Ozdilek et al., “ProFAB-open protein functional annotation benchmark,” BRIEFINGS IN BIOINFORMATICS, 2023. [Online]. Available: https://hdl.handle.net/11511/102307.