The effect of representative training dataset selection on the classification performance of the promoter sequences

2011-05-05
YAMAN, Ayse Gul
Can, Tolga
Promoter prediction is an important task in genome annotation. The aim of this study is to build a classification method for promoter prediction. Base-stacking energy values of dinucleotides are used for feature extraction, and Support Vector Machines (SVMs) are used for classification. Human genome promoter sequences are used as the positive training data, and three types of datasets, including intergenic and transcribed sequences, are prepared as the negative data. The best results are achieved by selecting an equal number of random sequences from the intergenic and transcribed sequences when preparing the negative datasets.
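
The two steps named in the abstract, dinucleotide-based feature extraction and balanced negative sampling, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the fixed random seed, and above all the energy table are assumptions made for illustration. The table holds placeholder numbers, not measured base-stacking energies, and would need to be replaced with published values before any real use.

```python
import random
from itertools import product

# PLACEHOLDER values only: one arbitrary number per dinucleotide.
# These are NOT measured base-stacking energies.
PLACEHOLDER_ENERGY = {
    a + b: -1.0 * (i + 1)
    for i, (a, b) in enumerate(product("ACGT", repeat=2))
}


def dinucleotide_profile(seq, energy=PLACEHOLDER_ENERGY):
    """Map each overlapping dinucleotide of seq to its table value."""
    seq = seq.upper()
    return [energy[seq[i:i + 2]] for i in range(len(seq) - 1)]


def balanced_negatives(intergenic, transcribed, n_total, seed=0):
    """Draw n_total negatives, half from each pool, without replacement."""
    if n_total % 2:
        raise ValueError("n_total must be even for an equal split")
    half = n_total // 2
    if half > len(intergenic) or half > len(transcribed):
        raise ValueError("not enough sequences in one of the pools")
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    negatives = rng.sample(intergenic, half) + rng.sample(transcribed, half)
    rng.shuffle(negatives)  # avoid ordering the negatives by source
    return negatives
```

A sequence of length n yields a profile of n - 1 values, so fixed-length input sequences give fixed-length feature vectors that could then be passed, together with promoter-derived vectors as the positive class, to any SVM implementation.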

Citation Formats
A. G. YAMAN and T. Can, “The effect of representative training dataset selection on the classification performance of the promoter sequences,” 2011, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/41800.