A novel model-based method for feature extraction from protein sequences for classification

Sarac, Omer Sinan
Atalay, Mehmet Volkan
Atalay, Rengül
Representation of amino-acid sequences is the key point in classifying proteins into functional or structural classes. The representation should capture the biologically meaningful information hidden in the primary sequence of the protein. Conserved or similar subsequences are strong indicators of functional and structural similarity. In this study, we present a feature mapping that takes into account models of the subsequences of protein sequences. An expectation-maximization (EM) algorithm combined with a hidden Markov model (HMM) mixture model is used to cluster the subsequences of a given set of proteins and to learn a model for each cluster.
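The abstract does not spell out how a protein is mapped to a feature vector once the subsequence models are learned. The following is a minimal sketch under stated assumptions, not the authors' method. It assumes each cluster model is an HMM with discrete amino-acid emissions whose parameters have already been learned (the EM training of the mixture is not shown). It also assumes that the feature for each model is the best per-residue log-likelihood over all sliding windows of the protein. The names `DiscreteHMM`, `feature_vector`, and the `window` parameter are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def safe_log(p):
    """log(p), mapping zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


class DiscreteHMM:
    """Hypothetical HMM with discrete (amino-acid) emissions.

    start[i]    : P(first state = i)
    trans[i][j] : P(state i -> state j)
    emit[i][a]  : P(symbol a | state i), given as a dict (missing symbol = 0)
    """

    def __init__(self, start, trans, emit):
        self.start = start
        self.trans = trans
        self.emit = emit
        self.n_states = len(start)

    def log_likelihood(self, seq):
        """Forward algorithm in log space: returns log P(seq | model)."""
        alpha = [
            safe_log(self.start[i]) + safe_log(self.emit[i].get(seq[0], 0.0))
            for i in range(self.n_states)
        ]
        for symbol in seq[1:]:
            alpha = [
                logsumexp([alpha[i] + safe_log(self.trans[i][j])
                           for i in range(self.n_states)])
                + safe_log(self.emit[j].get(symbol, 0.0))
                for j in range(self.n_states)
            ]
        return logsumexp(alpha)


def feature_vector(protein, models, window):
    """Assumed mapping: for each model, the best per-residue log-likelihood
    over all subsequences of length `window`. Returns -inf for a model if the
    protein is shorter than `window` or no window can be emitted by it."""
    features = []
    for model in models:
        best = -math.inf
        for pos in range(len(protein) - window + 1):
            sub = protein[pos:pos + window]
            best = max(best, model.log_likelihood(sub) / window)
        features.append(best)
    return features


if __name__ == "__main__":
    # Two toy single-state models: a uniform background and a model that
    # only emits the hydrophobic residues L, I, V.
    uniform = DiscreteHMM([1.0], [[1.0]], [{a: 1 / 20 for a in AMINO_ACIDS}])
    hydrophobic = DiscreteHMM([1.0], [[1.0]], [{"L": 1 / 3, "I": 1 / 3, "V": 1 / 3}])
    print(feature_vector("MKLIVLAG", [uniform, hydrophobic], window=3))
```

In this toy run, the first feature equals log(1/20) for every window. The second feature is log(1/3), contributed by the windows "LIV" and "IVL". A protein then becomes a fixed-length vector with one entry per learned model, which any standard classifier can consume. Normalizing by `window` is a design choice here, made so that scores remain comparable if window lengths differ.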


An attempt to classify Turkish district data : K-Means and Self-Organizing Map (SOM) algorithms
Aksoy, Ece; Işık, Oğuz; Department of Geodetic and Geographical Information Technologies (2004)
There is no universally applicable clustering technique for discovering the variety of structures displayed in data sets. Moreover, no single algorithm or approach is adequate for every clustering problem. Many methods are available, and because the criteria they use differ, different classifications may be obtained for the same data. As ever larger amounts of data are collected and stored in databases, there is an increasing need for efficient and effective analysis methods. Grouping or classific...
A new hybrid multi-relational data mining technique
Toprak, Seda Dağlar; Toroslu, İ. Hakkı; Department of Computer Engineering (2005)
Multi-relational learning has become popular due to the limitations of propositional problem definition in structured domains and the tendency of storing data in relational databases. As patterns involve multiple relations, the search space of possible hypotheses becomes intractably complex. Many relational knowledge discovery systems have been developed employing various search strategies, search heuristics and pattern language limitations in order to cope with the complexity of hypothesis space. In this w...
A Framework for Machine Vision based on Neuro-Mimetic Front End Processing and Clustering
Akbaş, Emre; Eckstein, Miguel; Madhow, Upamanyu (2014-10-03)
Convolutional deep neural nets have emerged as a highly effective approach for machine vision, but there are a number of open issues regarding training (e.g., a large number of model parameters to be learned, and a number of manually tuned algorithm parameters) and interpretation (e.g., geometric interpretations of neurons at various levels of the hierarchy). In this paper, our goal is to explore alternative convolutional architectures which are easier to interpret and simpler to implement. In particular, w...
A novel SVM-ID3 Hybrid Feature Selection Method to Build a Disease Model for Melanoma using Integrated Genotyping and Phenotype Data from dbGaP
Aydın Son, Yeşim (2014-09-03)
The relations between single nucleotide polymorphisms (SNPs) and complex diseases are likely to be non-linear and require the analysis of high-dimensional data. Previous studies in the field mostly focus on genotyping, and the effects of various phenotypes are not considered. To fill this gap, a hybrid feature selection model combining a support vector machine and a decision tree has been designed. The method is tested on melanoma. We were able to select phenotypic features such as moles and dysplastic nevi, and SNPs...
A discriminative method for remote homology detection based on n-peptide compositions with reduced amino acid alphabets
Oğul, Hasan; Mumcuoğlu, Ünal Erkan (2007-01-01)
In this study, n-peptide compositions are used for protein vectorization within a discriminative remote homology detection framework based on support vector machines (SVMs). The size of the amino acid alphabet is gradually reduced for increasing values of n so that the method fits within the memory resources of conventional workstations. A hash structure is implemented for accelerated search of n-peptides. The method is tested for its ability to classify proteins into families on a subset of SCOP famil...
Citation Formats
O. S. Sarac, M. V. Atalay, and R. Atalay, “A novel model-based method for feature extraction from protein sequences for classification,” 2006, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/43935.