Machine learning methods for promoter region prediction

2011
Arslan, Hilal
Promoter classification is the task of separating promoter sequences from non-promoter sequences. Determining promoter regions, where transcription initiation takes place, is important for several reasons, such as improving genome annotation and defining transcription start sites. In this study, three promoter prediction methods, ProK-means, ProSVM, and 3S1C, are proposed. The ProSVM and ProK-means algorithms use structural features of DNA sequences to distinguish promoters from non-promoters. The results are compared with those of ProSOM, an existing promoter prediction method, and ProSVM is shown to achieve a higher recall rate than ProSOM. The third proposed method is 3S1C. It differs from existing methods in that it uses signal, similarity, structure, and context features of DNA sequences in an integrated and hierarchical manner. Unlike current promoter classification methods, the proposed system also includes a similarity feature, which compares promoter regions between human and other species. We show that the similarity feature improves accuracy. To classify core promoter regions, signal, similarity, structure, and context features are first extracted, and each feature group is then classified separately using Support Vector Machines. Finally, the output predictions are combined using a multilayer perceptron. The results of the 3S1C algorithm are very promising.
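The abstract compares methods by recall rate, so a brief illustration of that metric may help. The sketch below is not taken from the thesis: the function names and the toy label vectors are hypothetical, and the numbers are made up purely to show how recall and precision can diverge between two predictors. Label 1 stands for "promoter" and 0 for "non-promoter".

```python
def recall(y_true, y_pred):
    """Fraction of true promoters (label 1) that were predicted as promoters."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(y_true, y_pred):
    """Fraction of predicted promoters that are true promoters."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical toy labels (NOT results from the thesis).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
pred_a = [1, 1, 1, 0, 1, 1, 0, 0]  # finds more promoters, more false alarms
pred_b = [1, 1, 0, 0, 0, 0, 0, 0]  # conservative: fewer hits, no false alarms

print("A: recall", recall(y_true, pred_a), "precision", precision(y_true, pred_a))
print("B: recall", recall(y_true, pred_b), "precision", precision(y_true, pred_b))
```

In this toy case predictor A has the higher recall and predictor B the higher precision, which is why a recall comparison is usually read alongside other metrics.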

Suggestions

Predicting the effect of hydrophobicity surface on binding affinity of PCP-like compounds using machine learning methods
Yoldaş, Mine; Alpaslan, Ferda Nur; Büyükbingöl, Erdem; Department of Computer Engineering (2011)
This study aims to predict the binding affinity of the PCP-like compounds by means of molecular hydrophobicity. Molecular hydrophobicity is an important property which affects the binding affinity of molecules. The values of molecular hydrophobicity of molecules are obtained on a three-dimensional coordinate system. Our aim is to reduce the number of points on the hydrophobicity surface of the molecules. This is modeled by using self organizing maps (SOM) and k-means clustering. The feature sets obtained fro...
Modeling of Multi Open Phase Fault Condition of Multi-phase Permanent Magnet Synchronous Motors
Fei, Marco; Zanasi, Roberto (2011-09-10)
This paper deals with the modeling of multi-phase permanent magnet synchronous motors under multi open phase fault condition. The presented model is suitable for generic number of phases, generic shape of the rotor flux and generic number of open circuit faults. The motor model in fault condition can be used for faults occurring on both adjacent and not adjacent phases. The model can be very useful both for simulation and implementation of fault-tolerant control strategies.
3D analysis of the binding sites for predicting binding affinities in drug design
Ataç, Ali Osman; Alpaslan, Ferda Nur; Büyükbingöl, Erdem; Department of Computer Engineering (2014)
Understanding the interaction between drug molecules and proteins is one of the main challenges in drug design. Several tools have been developed recently to decrease the complexity of the process. Artificial intelligence and machine learning methods have promising results in predicting the affinities. Recently, accurate estimations have been performed by extracting the electrostatic potentials from images of the drug-protein binding sites which were generated by autodocking simulator. In this study, a new ...
Non-destructive testing of textured foods by machine vision
Beriat, Pelin; Çetin, Yasemin; Department of Information Systems (2009)
In this thesis, two different approaches are used to extract the relevant features for classifying the aflatoxin contaminated and uncontaminated scaled chili pepper samples: Statistical approach and Local Discriminant Bases (LDB) approach. In the statistical approach, First Order Statistical (FOS) features and Gray Level Cooccurrence Matrix (GLCM) features are extracted. In the LDB approach, the original LDB algorithm is modified to perform 2D searches to extract the most discriminative features from the hy...
Highly efficient polymer blends from a polyfluorene derivative and PVK for LEDs
NOWACKI, Bruno; IAMAZAKI, Eduardo; Çırpan, Ali; KARASZ, Frank; ATVARS, Teresa D.Z.; AKCELRUD, Leni (Elsevier BV, 2009-11-27)
The photophysical and electroluminescent properties of blends of a polyfluorene derivative of the PPV type, poly[(9,9-dihexyl-9H-fluorene-2,7-diyl)-1,2-ethenediyl-1,4-phenylene-1,2-ethenediyl] (labeled as LaPPS16) and poly(vinylcarbazole) - PVK are presented and discussed in terms of the operating light emission mechanisms. Static and dynamic fluorescence measurements and morphology data showed a powerful exciton migration from the host (PVK) to the guest (LaPPS16) resulting in emission coming from solely L...
Citation Formats
H. Arslan, “Machine learning methods for promoter region prediction,” M.S. - Master of Science, Middle East Technical University, 2011.