Show/Hide Menu
Hide/Show Apps
Logout
Türkçe
Türkçe
Search
Search
Login
Login
OpenMETU
OpenMETU
About
About
Open Science Policy
Open Science Policy
Open Access Guideline
Open Access Guideline
Postgraduate Thesis Guideline
Postgraduate Thesis Guideline
Communities & Collections
Communities & Collections
Help
Help
Frequently Asked Questions
Frequently Asked Questions
Guides
Guides
Thesis submission
Thesis submission
MS without thesis term project submission
MS without thesis term project submission
Publication submission with DOI
Publication submission with DOI
Publication submission
Publication submission
Supporting Information
Supporting Information
General Information
General Information
Copyright, Embargo and License
Copyright, Embargo and License
Contact us
Contact us
TRAINER: A General-Purpose Trainable Short Biosequence Classifer
Date
2013-10-01
Author
OĞUL, HASAN
Kalkan, Alper T.
Umu, Sinan U.
Akkaya, Mahinur
Metadata
Show full item record
This work is licensed under a
Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License
.
Item Usage Stats
131
views
0
downloads
Cite This
Classifying sequences is one of the central problems in computational biosciences. Several tools have been released to map an unknown molecular entity to one of the known classes using solely its sequence data. However, all of the existing tools are problem-specific and restricted to an alphabet constrained by relevant biological structure. Here, we introduce TRAINER, a new online tool designed to serve as a generic sequence classification platform to enable users provide their own training data with any alphabet therein defined. TRAINER allows users to select among several feature representation schemes and supervised machine learning methods with relevant parameters. Trained models can be saved for future use without retraining by other users. Two case studies are reported for effective use of the system for DNA and protein sequences; candidate effector prediction and nucleolar localization signal prediction. Biological relevance of the results is discussed.
Subject Keywords
Sequence classification
,
Web server
,
K-nearest neighbors
,
Naive Bayes classifier
,
Support vector machine
URI
https://hdl.handle.net/11511/55440
Journal
PROTEIN AND PEPTIDE LETTERS
Collections
Graduate School of Natural and Applied Sciences, Article
Suggestions
OpenMETU
Core
SVM-based detection of distant protein structural relationships using pairwise probabilistic suffix trees
Ogul, Hasan; Mumcuoğlu, Ünal Erkan (2006-08-01)
A new method based on probabilistic suffix trees (PSTs) is defined for pairwise comparison of distantly related protein sequences. The new definition is adopted in a discriminative framework for protein classification using pairwise sequence similarity scores in feature encoding. The framework uses support vector machines (SVMs) to separate structurally similar and dissimilar examples. The new discriminative system, which we call as SVM-PST, has been tested for SCOP family classification task, and compared ...
MUTATION CLASSES OF SKEW-SYMMETRIZABLE 3 x 3 MATRICES
Seven, Ahmet İrfan (2013-05-01)
Mutation of skew-symmetrizable matrices is a fundamental operation that first arose in Fomin-Zelevinsky's theory of cluster algebras; it also appears naturally in many different areas of mathematics. In this paper, we study mutation classes of skew-symmetrizable 3 x 3 matrices and associated graphs. We determine representatives for these classes using a natural minimality condition, generalizing and strengthening results of Beineke-BrustleHille and Felikson-Shapiro-Tumarkin. Furthermore, we obtain a new num...
Loop-based conic multivariate adaptive regression splines is a novel method for advanced construction of complex biological networks
Ayyıldız Demirci, Ezgi; Purutçuoğlu Gazi, Vilda; Weber, Gerhard Wilhelm (2018-11-01)
The Gaussian Graphical Model (GGM) and its Bayesian alternative, called, the Gaussian copula graphical model (GCGM) are two widely used approaches to construct the undirected networks of biological systems. They define the interactions between species by using the conditional dependencies of the multivariate normality assumption. However, when the system's dimension is high, the performance of the model becomes computationally demanding, and, particularly, the accuracy of GGM decreases when the observations...
MODELLING OF KERNEL MACHINES BY INFINITE AND SEMI-INFINITE PROGRAMMING
Ozogur-Akyuz, S.; Weber, Gerhard Wilhelm (2009-06-03)
In Machine Learning (ML) algorithms, one of the crucial issues is the representation of the data. As the data become heterogeneous and large-scale, single kernel methods become insufficient to classify nonlinear data. The finite combinations of kernels are limited up to a finite choice. In order to overcome this discrepancy, we propose a novel method of "infinite" kernel combinations for learning problems with the help of infinite and semi-infinite programming regarding all elements in kernel space. Looking...
PARALLEL MULTILEVEL FAST MULTIPOLE ALGORITHM FOR COMPLEX PLASMONIC METAMATERIAL STRUCTURES
Ergül, Özgür Salih (2013-11-09)
A parallel implementation of the multilevel fast multipole algorithm (MLFMA) is developed for fast and accurate solutions of electromagnetics problems involving complex plasmonic metamaterial structures. Composite objects that consist of multiple penetrable regions, such as dielectric, lossy, and plasmonic parts, are formulated rigorously with surface integral equations and solved iteratively via MLFMA. Using the hierarchical strategy for the parallelization, the developed implementation is capable of simul...
Citation Formats
IEEE
ACM
APA
CHICAGO
MLA
BibTeX
H. OĞUL, A. T. Kalkan, S. U. Umu, and M. Akkaya, “TRAINER: A General-Purpose Trainable Short Biosequence Classifer,”
PROTEIN AND PEPTIDE LETTERS
, pp. 1108–1114, 2013, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/55440.