TRAINER: A General-Purpose Trainable Short Biosequence Classifier

Kalkan, Alper T.
Umu, Sinan U.
Akkaya, Mahinur
Classifying sequences is one of the central problems in computational biosciences. Several tools have been released to map an unknown molecular entity to one of the known classes using solely its sequence data. However, all of the existing tools are problem-specific and restricted to an alphabet constrained by the relevant biological structure. Here, we introduce TRAINER, a new online tool designed to serve as a generic sequence classification platform that lets users provide their own training data over any alphabet they define. TRAINER allows users to select among several feature representation schemes and supervised machine learning methods, each with its relevant parameters. Trained models can be saved so that other users can apply them later without retraining. Two case studies illustrate effective use of the system on DNA and protein sequences: candidate effector prediction and nucleolar localization signal prediction. The biological relevance of the results is discussed.
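As a rough illustration of the workflow the abstract describes (a user-defined alphabet, a feature representation, and a supervised learner), the following is a minimal sketch only. It assumes normalized k-mer frequencies as the feature scheme and a nearest-centroid rule as the learner; neither is confirmed to be what TRAINER offers, and the function names `kmer_features`, `train_centroids`, and `classify` are hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_features(sequence, alphabet, k=2):
    """Normalized k-mer frequencies of `sequence` over a user-supplied alphabet.

    Assumption: k-mer counting stands in for a "feature representation scheme".
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def train_centroids(examples, alphabet, k=2):
    """Average the feature vectors per class (a nearest-centroid model).

    Assumption: this simple learner is a placeholder for any supervised method.
    """
    sums, sizes = {}, Counter()
    for sequence, label in examples:
        vec = kmer_features(sequence, alphabet, k)
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        sizes[label] += 1
    return {label: [v / sizes[label] for v in acc] for label, acc in sums.items()}


def classify(sequence, centroids, alphabet, k=2):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    vec = kmer_features(sequence, alphabet, k)

    def distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))

    return min(centroids, key=lambda label: distance(centroids[label]))


# Toy training data over a DNA alphabet; labels are made up for illustration.
ALPHABET = "ACGT"
TRAINING = [
    ("AAAAAA", "a_rich"),
    ("AAAATA", "a_rich"),
    ("GCGCGC", "gc_rich"),
    ("GGCCGC", "gc_rich"),
]
MODEL = train_centroids(TRAINING, ALPHABET, k=2)
```

Because the alphabet is just a string argument, the same functions would accept a protein alphabet or any other symbol set, which is the kind of generality the abstract emphasizes. The returned centroid dictionary could be serialized to play the role of a saved model.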


Seven, Ahmet İrfan (2013-05-01)
Mutation of skew-symmetrizable matrices is a fundamental operation that first arose in Fomin-Zelevinsky's theory of cluster algebras; it also appears naturally in many different areas of mathematics. In this paper, we study mutation classes of skew-symmetrizable 3 x 3 matrices and associated graphs. We determine representatives for these classes using a natural minimality condition, generalizing and strengthening results of Beineke-Brüstle-Hille and Felikson-Shapiro-Tumarkin. Furthermore, we obtain a new num...
Loop-based conic multivariate adaptive regression splines is a novel method for advanced construction of complex biological networks
Ayyıldız Demirci, Ezgi; Purutçuoğlu Gazi, Vilda; Weber, Gerhard Wilhelm (2018-11-01)
The Gaussian graphical model (GGM) and its Bayesian alternative, the Gaussian copula graphical model (GCGM), are two widely used approaches to constructing undirected networks of biological systems. They define the interactions between species through the conditional dependencies implied by the multivariate normality assumption. However, when the system's dimension is high, fitting the model becomes computationally demanding, and, in particular, the accuracy of GGM decreases when the observations...
Time series on Riemannian manifolds
Ergezer, Hamza; Leblebicioğlu, Mehmet Kemal; Department of Electrical and Electronics Engineering (2017)
In this thesis, feature covariance matrices are utilized to solve several problems related to time series. In the first part of the thesis, a novel representation of time series based on feature covariance matrices is proposed. With this representation, time series are carried onto a Riemannian manifold. The proposed representation is first applied to trajectories, which are essentially 2D time series. Anomaly detection and activity perception problems in crowded visual scenes are studied by usi...
SVM-based detection of distant protein structural relationships using pairwise probabilistic suffix trees
Ogul, Hasan; Mumcuoğlu, Ünal Erkan (2006-08-01)
A new method based on probabilistic suffix trees (PSTs) is defined for pairwise comparison of distantly related protein sequences. The new definition is adopted in a discriminative framework for protein classification that uses pairwise sequence similarity scores for feature encoding. The framework uses support vector machines (SVMs) to separate structurally similar and dissimilar examples. The new discriminative system, which we call SVM-PST, has been tested on the SCOP family classification task, and compared ...
Classification through incremental max-min separability
Bagirov, Adil M.; Ugon, Julien; Webb, Dean; Karasözen, Bülent (2011-05-01)
Piecewise linear functions can be used to approximate non-linear decision boundaries between pattern classes. Piecewise linear boundaries are known to provide efficient real-time classifiers. However, they require a long training time. Finding piecewise linear boundaries between sets is a difficult optimization problem. Most approaches use heuristics to avoid solving this problem, which may lead to suboptimal piecewise linear boundaries. In this paper, we propose an algorithm for globally training hyperplan...
Citation Formats
H. OĞUL, A. T. Kalkan, S. U. Umu, and M. Akkaya, “TRAINER: A General-Purpose Trainable Short Biosequence Classifier,” PROTEIN AND PEPTIDE LETTERS, pp. 1108–1114, 2013.