Subsequence-based feature map for protein function classification
Date
2008-04-01
Author
Sarac, Omer Sinan
Guersoy-Yuezueguellue, Oezge
Atalay, Rengül
Atalay, Mehmet Volkan
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Automated classification of proteins is indispensable for guiding further in vivo investigation of the very large number of uncharacterized sequences generated by large-scale molecular biology techniques. This study describes a discriminative system based on feature space mapping, called the subsequence profile map (SPMap), for functional classification of protein sequences. SPMap takes into account the information carried by the subsequences of a protein. A group of protein sequences that belong to the same level of classification is decomposed into fixed-length subsequences, which are then clustered to obtain a representative feature space mapping. The mapping of a protein sequence is defined as the distribution of its subsequences over these clusters. The resulting feature space representation is used to train discriminative classifiers for functional families. The aim of this approach is to incorporate information from important subregions that are conserved across a family of proteins while avoiding the difficult task of explicit motif identification. The performance of the method was assessed through tests on various protein classification tasks. The results showed that SPMap achieves high classification accuracy in most of these tasks. Furthermore, SPMap is fast and scalable enough to handle large datasets.
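The mapping described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the clustering algorithm or the similarity measure, so the greedy clustering, the position-wise identity score, the `threshold` parameter, and all function names below are assumptions chosen for illustration. Training the discriminative classifier on the resulting vectors is not shown.

```python
from collections import Counter


def subsequences(seq, k):
    """Return all overlapping length-k windows of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def identity(a, b):
    """Fraction of positions at which two equal-length strings agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def build_clusters(sequences, k, threshold):
    """Greedy clustering (an assumption, not the paper's method):
    a window that matches no existing centre at or above the
    threshold becomes the centre of a new cluster."""
    centres = []
    for seq in sequences:
        for sub in subsequences(seq, k):
            if not any(identity(sub, c) >= threshold for c in centres):
                centres.append(sub)
    return centres


def feature_vector(seq, centres, k):
    """Map a sequence to the distribution of its windows over the
    clusters, assigning each window to its most similar centre."""
    subs = subsequences(seq, k)
    if not subs or not centres:
        return [0.0] * len(centres)
    counts = Counter()
    for sub in subs:
        best = max(range(len(centres)),
                   key=lambda j: identity(sub, centres[j]))
        counts[best] += 1
    return [counts[j] / len(subs) for j in range(len(centres))]


# Toy example with made-up sequences (not data from the study).
family = ["ACDEFG", "ACDEFH"]
centres = build_clusters(family, k=3, threshold=0.6)
vector = feature_vector("ACDEFH", centres, k=3)
print(centres)
print(vector)
```

Every sequence, whatever its length, is mapped to a vector with one entry per cluster, which is what allows a standard discriminative classifier to be trained on the result.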
Subject Keywords
Protein function prediction, Subsequence distribution, Function classification
URI
https://hdl.handle.net/11511/40175
Journal
Computational Biology and Chemistry
DOI
https://doi.org/10.1016/j.compbiolchem.2007.11.004
Collections
Department of Computer Engineering, Article
Citation Formats
O. S. Sarac, O. Guersoy-Yuezueguellue, R. Atalay, and M. V. Atalay, “Subsequence-based feature map for protein function classification,” Computational Biology and Chemistry, pp. 122–130, 2008. [Online]. Available: https://hdl.handle.net/11511/40175.