Prediction of polyadenylation sites by probe level analysis of microarray data

İlgüner, Yiğit
Polyadenylation sites in the 3' untranslated regions of genes are generally identified by DNA sequencing. However, there is no direct high-throughput screen for detecting the polyadenylation sites that are activated under particular circumstances or in certain tissues. Because microarray manufacturers usually overlook alternative polyadenylation events when designing their arrays, certain design decisions of these microarrays can be exploited to detect polyadenylation sites. In this thesis, we introduce a method and a corresponding tool that examine the hybridization levels of the individual probes in a transcript's probe set to identify differential expression between the two subsets of probes lying upstream and downstream of a known polyadenylation site. To identify putative polyadenylation sites, we also introduce a new method that is not based on sequence information. This technique analyzes the differential expression of every possible proximal/distal grouping in a probe set and detects statistically significant variations between the groups. Such a variation indicates a putative polyadenylation site between the last nucleotide of the probe sequence of the proximal subset and the first nucleotide of the probe sequence of the distal subset. We apply our method to several microarray samples that are manufactured under different conditions and discuss its performance on these datasets. Our results show that we are able to detect polyadenylation sites that are absent from common polyadenylation databases but have been verified by biological experiments.
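The proximal/distal scan described above can be illustrated with a minimal sketch. It is not the thesis tool: the function names (`welch_t`, `scan_splits`, `best_split`), the use of per-probe log ratios between two conditions, the choice of Welch's t-statistic, the `min_size` parameter, and the example values are all assumptions made for illustration, since the abstract does not specify the statistical test or the data layout.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-statistic for two groups (assumed test; the abstract names none)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(a) - mean(b)
    if se == 0.0:
        # Both groups are constant: identical means give 0, otherwise +/- infinity.
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / se

def scan_splits(log_ratios, min_size=2):
    """Evaluate every proximal/distal grouping of a probe set.

    log_ratios: per-probe log expression ratios between two conditions,
    ordered from the 5' end to the 3' end of the transcript (hypothetical input).
    Returns a list of (k, t), where the first k probes form the proximal group.
    """
    results = []
    for k in range(min_size, len(log_ratios) - min_size + 1):
        proximal, distal = log_ratios[:k], log_ratios[k:]
        results.append((k, welch_t(proximal, distal)))
    return results

def best_split(log_ratios, min_size=2):
    """Return the (k, t) pair with the largest absolute t-statistic."""
    return max(scan_splits(log_ratios, min_size), key=lambda r: abs(r[1]))

# Made-up example: the last five probes drop relative to the first four.
example_log_ratios = [0.1, -0.05, 0.0, 0.08, -1.1, -0.9, -1.05, -1.0, -0.95]
k, t = best_split(example_log_ratios)
print(f"candidate site between probe {k} and probe {k + 1} (t = {t:.2f})")
```

In this sketch, the split with the strongest statistic marks the candidate region between the last proximal probe and the first distal probe. A real analysis would also need a significance threshold and a correction for testing many splits and many probe sets, neither of which is modeled here.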
