Prediction of enzyme classes in a hierarchical approach by using SPMap

Date

2009

Author

Yaman, Ayşe Gül

Enzymes are proteins that act as catalysts in biochemical reactions. The International Enzyme Commission (EC) classifies them by the reactions they catalyze, using a hierarchical scheme expressed as a four-level tree in which each enzyme class is assigned a unique number. The six major classes at the top level correspond to the general type of reaction catalyzed, and the sub-classes at the lower levels represent increasingly specific reactions within these classes. The aim of this thesis is to build a three-level classification model based on this hierarchical structure of EC classes. The ENZYME database is used to extract the EC class information and to assign enzymes to these classes, and features are extracted from the primary sequences of these enzymes, obtained from the UniProtKB/Swiss-Prot database. Feature extraction uses the Subsequence Profile Map (SPMap), a subsequence-based method that explicitly models the differences between positive and negative examples and focuses on the subsequences conserved among protein sequences of the same class. SPMap clusters similar fixed-length subsequences in the training dataset and computes a probabilistic profile matrix for each cluster; the feature vector of a protein then consists of the probabilities of its fixed-length subsequences with respect to these profile matrices. Positive and negative training datasets are prepared for each class at each level of the tree, and Support Vector Machines (SVMs) are used for classification. System performance is evaluated with five-fold cross-validation. For the six major EC classes, the overall sensitivity, specificity and AUC are 93.08%, 98.95% and 0.993, respectively, and the results at the second and third levels are also promising.
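The SPMap feature mapping summarized in the abstract can be illustrated with a small, hypothetical Python sketch; it is not the thesis implementation. The helper names (`subsequences`, `greedy_cluster`, `profile_matrix`, `spmap_features`), the subsequence length, the identity-based greedy clustering with a `min_matches` threshold, the additive pseudocount, and taking the maximum probability over a protein's subsequences as each feature value are all assumptions made for illustration; the original method uses its own similarity measure and clustering procedure.

```python
from collections import Counter
from typing import Dict, List

# Standard 20 amino acids (assumption: non-standard residues get probability 0).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
Profile = List[Dict[str, float]]


def subsequences(seq: str, l: int) -> List[str]:
    """All overlapping fixed-length subsequences (sliding windows) of length l."""
    return [seq[i:i + l] for i in range(len(seq) - l + 1)]


def greedy_cluster(subseqs: List[str], min_matches: int) -> List[List[str]]:
    """Hypothetical simplification of the clustering step: a subsequence joins
    the first cluster whose seed (first member) shares at least `min_matches`
    identical positions with it; otherwise it seeds a new cluster."""
    clusters: List[List[str]] = []
    for s in subseqs:
        for cluster in clusters:
            if sum(a == b for a, b in zip(cluster[0], s)) >= min_matches:
                cluster.append(s)
                break
        else:
            clusters.append([s])
    return clusters


def profile_matrix(cluster: List[str], pseudocount: float = 1.0) -> Profile:
    """Position-specific amino-acid probabilities for one cluster,
    smoothed with an additive pseudocount (assumed value)."""
    profile: Profile = []
    for pos in range(len(cluster[0])):
        counts = Counter(s[pos] for s in cluster)
        total = len(cluster) + pseudocount * len(AMINO_ACIDS)
        profile.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return profile


def subsequence_probability(sub: str, profile: Profile) -> float:
    """Probability of a subsequence under a profile, treating positions independently."""
    p = 1.0
    for pos, aa in enumerate(sub):
        p *= profile[pos].get(aa, 0.0)
    return p


def spmap_features(seq: str, profiles: List[Profile], l: int) -> List[float]:
    """One feature per profile: the highest probability that any length-l
    subsequence of `seq` reaches under that profile (assumed aggregation)."""
    windows = subsequences(seq, l)
    return [max((subsequence_probability(w, prof) for w in windows), default=0.0)
            for prof in profiles]


# Toy usage with made-up sequences (not data from the thesis).
SUBSEQ_LEN = 5
positives = ["MKVLAAGIVG", "MKVLSAGIVA"]
train_windows = [w for s in positives for w in subsequences(s, SUBSEQ_LEN)]
profiles = [profile_matrix(c) for c in greedy_cluster(train_windows, min_matches=4)]
features = spmap_features("MKVLAAGIVG", profiles, SUBSEQ_LEN)
unrelated = spmap_features("WWWWWWWWWW", profiles, SUBSEQ_LEN)
print(len(profiles), len(features))
```

In this toy run, a sequence drawn from the positive set scores higher on every profile than an unrelated poly-W sequence, which is the separation the feature map is meant to expose. In a setup like the one described, each EC class would get its own profiles built from that class's training data, and the resulting vectors would be passed to an SVM; both steps are omitted here.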

Subject Keywords

Computer engineering, Computer software.

URI

http://etd.lib.metu.edu.tr/upload/2/12610969/index.pdf
https://hdl.handle.net/11511/19023

Collections

Graduate School of Natural and Applied Sciences, Thesis

Suggestions

Prediction of enzyme classes in a hierarchical approach by using SPMap Yaman, A.; Atalay, Mehmet Volkan; Atalay, Rengül (2010-04-01) Enzymes are proteins that play important roles in biochemical reactions as catalysts. They are classified based on the reaction they catalyzed, in a hierarchical scheme by International Enzyme Commission (EC). This hierarchical scheme is expressed in four-level tree structure and a unique number is assigned to each enzyme class. There are six major classes at the top level according to the reaction they carried out and sub-classes at the lower levels are further specific reactions of these classes. The aim ...
Modeling of various biological networks via LCMARS AYYILDIZ DEMİRCİ, EZGİ; Purutçuoğlu Gazi, Vilda (Elsevier BV, 2018-09-01) In system biology, the interactions between components such as genes, proteins, can be represented by a network. To understand the molecular mechanism of complex biological systems, construction of their networks plays a crucial role. However, estimation of these biological networks is a challenging problem because of their high dimensional and sparse structures. Several statistical methods are proposed to overcome this issue. The Conic Multivariate Adaptive Regression Splines (CMARS) is one of the recent n...
A clustering method for the problem of protein subcellular localization Bezek, Perit; Atalay, Mehmet Volkan; Department of Computer Engineering (2006) In this study, the focus is on predicting the subcellular localization of a protein, since subcellular localization is helpful in understanding a protein’s functions. Function of a protein may be estimated from its sequence. Motifs or conserved subsequences are strong indicators of function. In a given sample set of protein sequences known to perform the same function, a certain subsequence or group of subsequences should be common; that is, occurrence (frequency) of common subsequences should be high. Our ...
Prediction of protein-protein interactions from sequence using evolutionary relations of proteins and species Güney, Tacettin Doğacan; Can, Tolga; Department of Computer Engineering (2009) Prediction of protein-protein interactions is an important part in understanding the biological processes in a living cell. There are completely sequenced organisms that do not yet have experimentally verified protein-protein interaction networks. For such organisms, we can not generally use a supervised method, where a portion of the protein-protein interaction network is used as training set. Furthermore, for newly-sequenced organisms, many other data sources, such as gene expression data and gene ontolog...
Coevolution based prediction of protein-protein interactions with reduced training data Pamuk, Bahar; Can, Tolga; Department of Computer Engineering (2009) Protein-protein interactions are important for the prediction of protein functions since two interacting proteins usually have similar functions in a cell. Available protein interaction networks are incomplete; but, they can be used to predict new interactions in a supervised learning framework. However, in the case that the known protein network includes large number of protein pairs, the training time of the machine learning algorithm becomes quite long. In this thesis work, our aim is to predict protein-...

Citation Formats

A. G. Yaman, “Prediction of enzyme classes in a hierarchical approach by using spmap,” M.S. - Master of Science, Middle East Technical University, 2009.