TRAINER: A General-Purpose Trainable Short Biosequence Classifier

Date

2013-10-01

Author

Oğul, Hasan
Kalkan, Alper T.
Umu, Sinan U.
Akkaya, Mahinur

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Classifying sequences is one of the central problems in computational biosciences. Several tools have been released to map an unknown molecular entity to one of the known classes using solely its sequence data. However, all of the existing tools are problem-specific and restricted to an alphabet constrained by the relevant biological structure. Here, we introduce TRAINER, a new online tool designed to serve as a generic sequence classification platform that lets users provide their own training data with any alphabet defined therein. TRAINER allows users to select among several feature representation schemes and supervised machine learning methods, each with its relevant parameters. Trained models can be saved so that other users can reuse them without retraining. Two case studies demonstrate effective use of the system for DNA and protein sequences: candidate effector prediction and nucleolar localization signal prediction. The biological relevance of the results is discussed.
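
The general idea of classifying sequences over a user-defined alphabet can be illustrated with a minimal sketch. It is not TRAINER's implementation: it assumes k-mer frequency vectors as the feature representation and a plain k-nearest-neighbors vote as the learning method, and the function names (`kmer_features`, `knn_predict`) and the toy training data are hypothetical.

```python
from collections import Counter
from itertools import product
import math


def kmer_features(seq, alphabet, k=2):
    """Return normalized k-mer frequencies of seq over a user-defined alphabet.

    k-mers containing symbols outside the alphabet are ignored.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(len(seq) - k + 1, 1)  # avoid division by zero for short input
    return [counts[m] / total for m in kmers]


def knn_predict(train, query, alphabet, k=2, n_neighbors=1):
    """Classify query by majority vote among its nearest training sequences."""
    q = kmer_features(query, alphabet, k)
    # Euclidean distance between feature vectors, smallest first
    scored = sorted(
        (math.dist(q, kmer_features(s, alphabet, k)), label)
        for s, label in train
    )
    votes = Counter(label for _, label in scored[:n_neighbors])
    return votes.most_common(1)[0][0]


# Toy training data (hypothetical, not taken from the paper)
train = [
    ("AAAATAAA", "AT-rich"),
    ("TTATATTA", "AT-rich"),
    ("GCGCGGCC", "GC-rich"),
    ("CCGGCGCG", "GC-rich"),
]
prediction = knn_predict(train, "GGCCGCGC", alphabet="ACGT", k=2, n_neighbors=3)
print(prediction)
```

Because the alphabet is passed in as an argument, the same two functions would accept a protein alphabet or any other symbol set without modification, which is the kind of generality the abstract describes.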

Subject Keywords

Sequence classification, Web server, K-nearest neighbors, Naive Bayes classifier, Support vector machine

URI

https://hdl.handle.net/11511/55440

Journal

PROTEIN AND PEPTIDE LETTERS

Collections

Graduate School of Natural and Applied Sciences, Article

Citation Formats

H. Oğul, A. T. Kalkan, S. U. Umu, and M. Akkaya, “TRAINER: A General-Purpose Trainable Short Biosequence Classifier,” PROTEIN AND PEPTIDE LETTERS, pp. 1108–1114, 2013, Accessed: 2020. [Online]. Available: https://hdl.handle.net/11511/55440.