Subsequence feature maps for protein function annotation

Saraç, Ömer Sinan
With advances in sequencing technologies, the number of protein sequences with unknown function is increasing rapidly. Hence, computational methods for the functional annotation of these protein sequences have become of the utmost importance. In this thesis, we first defined a feature space mapping of protein primary sequences to fixed-dimensional numerical vectors. This mapping, which is called the Subsequence Profile Map (SPMap), takes into account the models of the subsequences of protein sequences. The resulting vectors were used as input to support vector machines (SVM) for functional classification of proteins. Second, we defined the protein functional annotation problem as a classification problem and constructed a classification framework defined on Gene Ontology (GO) terms. Different classification methods, as well as their combinations, were assessed on this framework, which is based on 300 GO molecular function terms. The results showed that combination enhances the classification accuracy. The resultant system has been made publicly available as an online function annotation tool.
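As a rough illustration of mapping a variable-length sequence to a fixed-dimensional vector through subsequence models, the sketch below scores every window of a sequence against a small set of position-specific profiles and keeps the best score per profile. This is not the thesis implementation: the abstract does not specify how the subsequence models are built or scored, so the profile representation, the max-over-windows rule, the `floor` probability, and all names (`windows`, `log_score`, `feature_vector`, `PROFILES`) are assumptions made for illustration, and the toy profile values are invented.

```python
import math

def windows(seq, k):
    """Yield every length-k contiguous subsequence of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def log_score(sub, profile, floor=1e-3):
    """Log-probability of sub under a position-specific profile.

    profile is a list of dicts (one per position) mapping residue -> probability.
    Residues missing from a position get the small floor probability (assumed).
    """
    return sum(math.log(pos.get(res, floor)) for res, pos in zip(sub, profile))

def feature_vector(seq, profiles):
    """Map seq to one number per profile: the best window score for that profile."""
    vec = []
    for profile in profiles:
        k = len(profile)
        scores = [log_score(w, profile) for w in windows(seq, k)]
        # A sequence shorter than the profile has no window; use a sentinel score.
        vec.append(max(scores) if scores else float("-inf"))
    return vec

# Two toy profiles of length 3 (hypothetical values, for illustration only).
PROFILES = [
    [{"A": 0.9, "G": 0.1}, {"C": 0.8, "S": 0.2}, {"D": 1.0}],
    [{"K": 0.5, "R": 0.5}, {"K": 0.5, "R": 0.5}, {"L": 1.0}],
]

# Sequences of different lengths map to vectors of the same length.
v1 = feature_vector("MACDKRL", PROFILES)
v2 = feature_vector("GSDAAAAAAAKKL", PROFILES)
```

The vector length equals the number of profiles regardless of sequence length, which is the property that lets such vectors be passed to a standard classifier such as an SVM.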
Citation Formats
Ö. S. Saraç, “Subsequence feature maps for protein function annotation,” Ph.D. - Doctoral Program, Middle East Technical University, 2008.