Computational representation of protein sequences for homology detection and classification
Date
2006
Author
Oğul, Hasan
Machine learning techniques have been widely used for classification problems in computational biology. They require the input to be a collection of fixed-length feature vectors. Since proteins vary in length, a means of representing protein sequences by a fixed number of features is needed. This thesis introduces three novel methods for this purpose: n-peptide compositions with reduced alphabets, pairwise similarity scores by maximal unique matches, and pairwise similarity scores by probabilistic suffix trees. The new sequence representations described in the thesis are applied, with some problem-specific modifications, to three challenging problems of computational biology: remote homology detection, subcellular localization prediction, and solvent accessibility prediction. Rigorous experiments are conducted on common benchmarking datasets, and the new methods are compared with the existing ones for each problem. On remote homology detection tests, all three methods achieve accuracies competitive with state-of-the-art methods while being much more efficient. A combination of the new representations is used to devise a hybrid system, called PredLOC, for predicting the subcellular localization of proteins, and it is tested on two distinct eukaryotic datasets. To the best of the author's knowledge, the accuracy achieved by PredLOC is the highest reported on those datasets. The maximal unique match method yields only a slight improvement in solvent accessibility predictions.
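As an illustration of how a variable-length protein can be mapped to a fixed number of features, the following minimal Python sketch computes an n-peptide composition over a reduced alphabet. The six-group reduction, the group symbols, and the names `GROUPS`, `reduce_sequence`, and `npeptide_composition` are assumptions made for this example only; the abstract does not state which reduced alphabets or normalization the thesis uses.

```python
from itertools import product

# Hypothetical reduction of the 20 amino acids into 6 groups by rough
# physicochemical class. This grouping is an illustrative assumption,
# not the alphabet used in the thesis.
GROUPS = {
    "a": "AVLIMC",  # aliphatic / hydrophobic
    "b": "FWYH",    # aromatic
    "c": "STNQ",    # polar, uncharged
    "d": "KR",      # positively charged
    "e": "DE",      # negatively charged
    "f": "GP",      # glycine and proline
}
RESIDUE_TO_GROUP = {aa: g for g, members in GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Rewrite a protein sequence in the reduced alphabet.

    Residues outside the 20 standard amino acids (e.g. 'X') are skipped.
    """
    return "".join(RESIDUE_TO_GROUP[aa] for aa in seq.upper() if aa in RESIDUE_TO_GROUP)


def npeptide_composition(seq, n=2):
    """Return the relative frequency of every reduced-alphabet n-peptide.

    The vector always has len(GROUPS) ** n entries, whatever the
    length of the input sequence.
    """
    reduced = reduce_sequence(seq)
    keys = ["".join(p) for p in product(sorted(GROUPS), repeat=n)]
    counts = dict.fromkeys(keys, 0)
    total = max(len(reduced) - n + 1, 0)
    # Slide a window of width n over the reduced sequence.
    for i in range(total):
        counts[reduced[i:i + n]] += 1
    return [counts[k] / total if total else 0.0 for k in keys]


short_vec = npeptide_composition("MKVDE", n=2)
long_vec = npeptide_composition("MKVDELLGAFWYHSTNQKRDEGP" * 4, n=2)
print(len(short_vec), len(long_vec))
```

Both calls return vectors of the same length (6² = 36 for dipeptides), so sequences of different lengths can be passed to a classifier that expects fixed-length input. With a 20-letter alphabet the same dipeptide vector would have 400 entries, which is why a reduced alphabet keeps the dimensionality manageable for larger n.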
Subject Keywords
Computer science.
URI
http://etd.lib.metu.edu.tr/upload/12606997/index.pdf
https://hdl.handle.net/11511/15857
Collections
Graduate School of Informatics, Thesis
Citation Formats
IEEE
H. Oğul, “Computational representation of protein sequences for homology detection and classification,” Ph.D. - Doctoral Program, Middle East Technical University, 2006.