Hide/Show Apps

Prediction of protein subcellular localization using global protein sequence feature

Bozkurt, Burçin
The problem of identifying genes in eukaryotic genomic sequences by computational methods has attracted considerable research attention in recent years. Many early approaches to the problem focused on prediction of individual functional elements and compositional properties of coding and non coding deoxyribonucleic acid (DNA) in entire eukaryotic gene structures. More recently, a number of approaches has been developed which integrate multiple types of information including structure, function and genetic properties of proteins. Knowledge of the structure of a protein is essential for describing and understanding its function. In addition, subcellular localization of a protein can be used to provide some amount of characterization of a protein. In this study, a method for the prediction of protein subcellular localization based on primary sequence data is described. Primary sequence data for a protein is based on amino acid sequence. The frequency value for each amino acid is computed in one given position. Assigned frequencies are used in a new encoding scheme that conserves biological information based on point accepted mutations (PAM) substitution matrix. This method can be used to predict the nuclear, the cytosolic sequences, the mitochondrial targeting peptides (mTP) and the signal peptides (SP). For clustering purposes, other than well known traditional techniques,