Prediction of enzymatic properties of protein sequences based on the enzyme commission nomenclature

Dalkıran, Alperen
The volume of expert manual annotation of biomolecules remains steady because of its high cost, even though the number of sequenced genomes continues to grow exponentially. Computational methods have therefore been proposed to predict the attributes of gene products. The prediction of Enzyme Commission (EC) numbers is a challenging problem in this area. Enzymes have crucial roles in metabolic pathways; therefore, they are widely employed in biotechnological and biomedical applications. EC numbers are numerical representations of enzymatic functions based on the chemical reactions that enzymes catalyze. Due to the cost and labor intensiveness of in vitro experiments, EC classification annotation of catalytically active proteins is limited. Therefore, computational tools have been proposed to classify these proteins and annotate them with the EC nomenclature. However, the performance of existing tools indicates that EC number prediction still requires improvement. Here, we present an EC number prediction tool, ECPred, to obtain predictions for large-scale protein sets. In ECPred, we employed hierarchical data preparation and evaluation steps by utilizing the functional relations among the four levels of the EC annotation system. The main features that distinguish our approach from existing studies are the use of a combination of independent classifiers, and novel data preparation and evaluation methods. In total, 858 EC classifiers were trained, consisting of 6 main class, 55 subfamily, 163 sub-subfamily and 634 substrate EC class classifiers. An average F-score of 0.99 was obtained for all EC classes using the validation datasets. Enzyme or non-enzyme classification is incorporated into ECPred along with a hierarchical prediction approach. To the best of our knowledge, this is the first study that predicts the enzymatic function of proteins starting from Level 0 (enzyme/non-enzyme) and going up to Level 4 (substrate class).
Finally, ECPred was compared with similar tools on independent test sets and obtained better results than the existing tools; however, the results show that there is still room for improvement.
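The hierarchical prediction idea summarized above, moving from Level 0 (enzyme/non-enzyme) down to Level 4 with one independent classifier per EC class, can be illustrated with a minimal sketch. This is not the ECPred implementation: the function `predict_ec`, the `Scorer` type, the toy scorers, the 0.5 threshold, and the rule of stopping at the deepest level whose best classifier passes the threshold are all assumptions made only for illustration.

```python
from typing import Callable, Dict

# Hypothetical stand-in for a trained classifier: maps a protein sequence
# to a confidence score in [0, 1]. Not part of ECPred.
Scorer = Callable[[str], float]


def predict_ec(sequence: str,
               enzyme_scorer: Scorer,
               class_scorers: Dict[str, Scorer],
               threshold: float = 0.5) -> str:
    """Walk the EC hierarchy top-down and return the deepest confident label.

    class_scorers maps EC prefixes such as "3", "3.4", "3.4.21" to scorers.
    The threshold and the stopping rule are illustrative assumptions.
    """
    # Level 0: decide enzyme vs. non-enzyme first.
    if enzyme_scorer(sequence) < threshold:
        return "non-enzyme"

    prediction = ""
    for level in range(1, 5):
        # Candidates are the classes at this level under the current prediction.
        candidates = [
            ec for ec in class_scorers
            if ec.count(".") == level - 1
            and (level == 1 or ec.rsplit(".", 1)[0] == prediction)
        ]
        if not candidates:
            break
        best = max(candidates, key=lambda ec: class_scorers[ec](sequence))
        if class_scorers[best](sequence) < threshold:
            break  # not confident enough to go deeper
        prediction = best

    # Pad unresolved levels with "-", as in partial EC numbers like "3.4.-.-".
    parts = prediction.split(".") if prediction else []
    return ".".join(parts + ["-"] * (4 - len(parts)))


# Toy scorers with fixed outputs, purely for demonstration.
TOY_SCORERS: Dict[str, Scorer] = {
    "1": lambda s: 0.2,
    "3": lambda s: 0.9,
    "3.1": lambda s: 0.3,
    "3.4": lambda s: 0.8,
    "3.4.21": lambda s: 0.4,
}

if __name__ == "__main__":
    print(predict_ec("MKTAYIAK", lambda s: 0.95, TOY_SCORERS))
    print(predict_ec("MKTAYIAK", lambda s: 0.10, TOY_SCORERS))
```

With these toy scorers the first call yields the partial label `3.4.-.-`, because the only Level 3 candidate falls below the threshold, and the second call yields `non-enzyme`. Returning a partial EC number instead of forcing a full four-level label is one possible design choice for a top-down scheme; the thesis itself should be consulted for how ECPred actually combines its classifiers.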
Citation Formats
A. Dalkıran, “Prediction of enzymatic properties of protein sequences based on the enzyme commission nomenclature,” M.S. - Master of Science, Middle East Technical University, 2017.