Prediction of polyadenylation sites by probe level analysis of microarray data

2013
İlgüner, Yiğit
In general, identification of polyadenylation sites in 3' untranslated regions of genes is carried out by DNA sequencing. However, there is no direct high-throughput screen to detect the polyadenylation sites that are activated under particular circumstances or in certain tissues. Since microarray manufacturers usually overlook alternative polyadenylation events when their microarrays are produced, certain design decisions of these microarrays can be used for detecting polyadenylation sites. In this thesis, we introduce a method and a corresponding tool that investigates the hybridization levels of individual probes in a probe set of a transcript to identify differential expression between two subsets of probes located upstream and downstream of a known polyadenylation site, respectively. For the identification of putative polyadenylation sites, we also introduce a new method that is not based on sequence information. This technique analyzes the differential expression of every possible proximal/distal grouping in a probe set and detects statistically significant variations between the groups. Such a variation is an indicator of a putative polyadenylation site between the last nucleotide of the probe sequence of the proximal subset and the first nucleotide of the probe sequence of the distal subset. We apply our method to several microarray samples that are manufactured under different conditions and discuss its performance on these datasets. Our results show that we are able to detect polyadenylation sites that are not in common polyadenylation databases but are verified by biological experiments.
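
The proximal/distal scan described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's tool: the abstract does not name the statistical test, so Welch's t statistic is assumed here, and the function names (`welch_t`, `scan_splits`, `best_split`), the `min_group` parameter, and the probe intensities are all hypothetical. Probes are assumed to be ordered from 5' to 3' along the transcript, and the step that turns a statistic into a significance call (p-values, multiple-testing correction) is left out.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples (each needs at least 2 values)."""
    diff = mean(a) - mean(b)
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # Both groups are constant: no difference, or an unbounded one.
        return 0.0 if diff == 0 else float("inf") * (1 if diff > 0 else -1)
    return diff / se


def scan_splits(intensities, min_group=2):
    """Score every proximal/distal grouping of one probe set.

    A split index k means probes [0, k) form the proximal subset and
    probes [k, n) form the distal subset. Returns (k, t) pairs.
    """
    n = len(intensities)
    return [
        (k, welch_t(intensities[:k], intensities[k:]))
        for k in range(min_group, n - min_group + 1)
    ]


def best_split(intensities, min_group=2):
    """Return the (k, t) pair with the largest absolute t statistic."""
    return max(scan_splits(intensities, min_group), key=lambda r: abs(r[1]))


# Made-up log-intensities for an 11-probe set: signal drops after probe 4.
probes = [9.1, 8.9, 9.3, 9.0, 5.2, 5.0, 5.4, 5.1, 4.9, 5.3, 5.0]
k, t = best_split(probes)
print(f"candidate site between probe {k} and probe {k + 1} (t = {t:.1f})")
```

In this sketch a strongly positive statistic at split `k` suggests that the distal probes hybridize less than the proximal ones, which is the pattern expected when a polyadenylation site between probes `k` and `k + 1` truncates the transcript.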

Suggestions

Prediction of protein subcellular localization based on primary sequence data
Ozarar, M; Atalay, Mehmet Volkan; Atalay, Rengül (2003-01-01)
This paper describes a system called prediction of protein subcellular localization (P2SL) that predicts the subcellular localization of proteins in eukaryotic organisms based on the amino acid content of primary sequences using amino acid order. Our approach for prediction is to find the most frequent motifs for each protein (class) based on clustering and then to use these most frequent motifs as features for classification. This approach allows a classification independent of the length of the sequence. ...
Prediction of protein subcellular localization based on primary sequence data
Özarar, Mert; Atalay, Mehmet Volkan; Department of Computer Engineering (2003)
Subcellular localization is crucial for determining the functions of proteins. A system called prediction of protein subcellular localization (P2SL) that predicts the subcellular localization of proteins in eukaryotic organisms based on the amino acid content of primary sequences using amino acid order is designed. The approach for prediction is to find the most frequent motifs for each protein in a given class based on clustering via self-organizing maps and then to use these most frequent motifs as features...
Prediction of protein subcellular localization using global protein sequence feature
Bozkurt, Burçin; Atalay, Mehmet Volkan; Department of Computer Engineering (2003)
The problem of identifying genes in eukaryotic genomic sequences by computational methods has attracted considerable research attention in recent years. Many early approaches to the problem focused on prediction of individual functional elements and compositional properties of coding and non-coding deoxyribonucleic acid (DNA) in entire eukaryotic gene structures. More recently, a number of approaches have been developed which integrate multiple types of information including structure, function and genetic p...
An integrative approach to structured snp prioritization and representative snp selection for genome-wide association studies
Üstünkar, Gürkan; Aydın Son, Yeşim; Weber, Gerhard Wilhelm; Department of Information Systems (2011)
Single Nucleotide Polymorphisms (SNPs) are the most frequent genomic variations and the main basis for genetic differences among individuals and many diseases. As genotyping millions of SNPs at once is now possible with the microarrays and advanced sequencing technologies, SNPs are becoming more popular as genomic biomarkers. Like other high-throughput research techniques, genome wide association studies (GWAS) of SNPs usually hit a bottleneck after statistical analysis of significantly associated SNPs, as ...
Using Adaptive Neuro-Fuzzy Inference System for Classification of Microarray Gene Expression Cancer Profiles
Haznedar, Bülent; Arslan, Mustafa Turan; Kalınlı, Adem (2018-05-01)
Microarray is a technology that enables the simultaneous analysis of thousands of genes in DNA structure, building on advances in biochemistry. With this technology, it has become possible to diagnose and treat hereditary diseases by analyzing thousands of gene expression levels. This study proposes an artificial intelligence method, the adaptive neuro-fuzzy inference system (ANFIS), to classify cancer gene expression profiles. The findings obtained with the proposed ANFIS approach are compared with the results...
Citation Formats
Y. İlgüner, “Prediction of polyadenylation sites by probe level analysis of microarray data,” M.S. - Master of Science, Middle East Technical University, 2013.