Subsequence feature maps for protein function annotation

2008
Saraç, Ömer Sinan
With the advances in sequencing technologies, the number of protein sequences with unknown function is increasing rapidly. Hence, computational methods for the functional annotation of these protein sequences have become of the utmost importance. In this thesis, we first defined a feature space mapping of protein primary sequences to fixed-dimensional numerical vectors. This mapping, called the Subsequence Profile Map (SPMap), takes into account models of the subsequences of protein sequences. The resulting vectors were used as input to support vector machines (SVM) for the functional classification of proteins. Second, we defined protein functional annotation as a classification problem and constructed a classification framework based on Gene Ontology (GO) terms. Different classification methods, as well as their combinations, were assessed on this framework, which is built on 300 GO molecular function terms. The results showed that combining methods improves classification accuracy. The resulting system is publicly available as an online function annotation tool.
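The abstract only summarizes SPMap, so the sketch below illustrates the general idea of turning variable-length sequences into fixed-dimensional vectors through subsequence models. It is a simplified illustration, not the thesis's exact procedure. It assumes that clusters of equal-length subsequences are already available, turns each cluster into a position-specific amino-acid probability profile (with a hypothetical pseudocount of 1), and sets one vector component per profile to the best score of any window of the protein. The clustering step and all names (`build_profile`, `profile_score`, `feature_vector`) are hypothetical.

```python
from typing import Dict, List, Sequence

# The 20 standard amino acids; other symbols get probability 0 in a profile.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical profile representation: one amino-acid distribution per position.
Profile = List[Dict[str, float]]


def build_profile(cluster: Sequence[str], pseudocount: float = 1.0) -> Profile:
    """Position-specific probabilities from a cluster of equal-length subsequences.

    Assumption: clusters are given; how SPMap forms them is not shown here.
    """
    length = len(cluster[0])
    profile: Profile = []
    for pos in range(length):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for sub in cluster:
            if sub[pos] in counts:
                counts[sub[pos]] += 1
        total = sum(counts.values())
        profile.append({aa: c / total for aa, c in counts.items()})
    return profile


def profile_score(profile: Profile, window: str) -> float:
    """Probability of a window under the profile, treating positions as independent."""
    score = 1.0
    for pos, aa in enumerate(window):
        score *= profile[pos].get(aa, 0.0)
    return score


def feature_vector(sequence: str, profiles: Sequence[Profile]) -> List[float]:
    """One dimension per profile: the score of the best-matching window.

    Sequences shorter than a profile get 0.0 for that dimension.
    """
    vector = []
    for profile in profiles:
        k = len(profile)
        windows = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
        vector.append(max((profile_score(profile, w) for w in windows), default=0.0))
    return vector
```

Because the vector always has one entry per profile, proteins of any length map to vectors of the same dimension, which is what allows a standard SVM to use them as input.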

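The abstract reports that combining classification methods improves accuracy but does not say how the combination is done. One common option, shown here purely as an assumption, is score fusion. Each method outputs a score per GO term, the scores are averaged, and every term at or above a threshold is predicted. The function names, the 0.5 threshold, and the example scores are hypothetical. GO:0003824 (catalytic activity) and GO:0005215 (transporter activity) serve only as example molecular function terms.

```python
from statistics import mean
from typing import Dict, List


def combine_scores(method_scores: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each GO term's score across methods (hypothetical fusion rule).

    A term missing from one method's output counts as a score of 0 for that method.
    """
    terms = set().union(*method_scores)
    return {t: mean(s.get(t, 0.0) for s in method_scores) for t in terms}


def annotate(method_scores: List[Dict[str, float]], threshold: float = 0.5) -> List[str]:
    """Return the GO terms whose combined score reaches the threshold, sorted."""
    combined = combine_scores(method_scores)
    return sorted(t for t, s in combined.items() if s >= threshold)


# Example: two methods scoring the same protein on two GO terms.
scores = [
    {"GO:0003824": 0.9, "GO:0005215": 0.2},
    {"GO:0003824": 0.7, "GO:0005215": 0.6},
]
print(annotate(scores))
```

Averaging is only the simplest choice. Weighted averages or voting are other common ways to combine classifiers.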
Citation Formats
Ö. S. Saraç, “Subsequence feature maps for protein function annotation,” Ph.D. - Doctoral Program, Middle East Technical University, 2008.