Determination of the effect of polyadenylation SLR values on microarray data classification

2014
Aslan, Ümit
Microarray data classification is generally used to predict the outcomes of unknown samples with the help of models built from preprocessed and categorized microarray data containing gene expression values. The way microarray experiments are prepared, the design of Affymetrix chips and the availability of previous microarray experiments make it possible to extract a new kind of data: the differential expression of proximal and distal probes (Short to Long Ratio, or SLR, values), which is used to predict alternative polyadenylation (APA) events. In this thesis, we aim to integrate gene expression data with these SLR values and then determine how microarray data classification is affected by this integration. Because of the filtering applied while predicting APA events, SLR values are not available for every probe set on a microarray sample. These missing values are not discarded, either when integrating the data or when applying the classification techniques. Three classification techniques, Support Vector Machines (SVM), Decision Tree (J48) and Random Forest, are applied to primary breast tumor microarray data before and after the integration of gene expression values with SLR values, and the resulting accuracies for classifying metastasis are determined. The results show that APA events have an incontrovertible impact on gene expression classification, mostly in the direction of improved accuracy.
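The integration step can be illustrated with a minimal sketch. Several assumptions are not taken from the thesis. The first is that SLR is computed per probe set as log2 of the mean proximal-probe intensity over the mean distal-probe intensity. The second is that expression and SLR values are keyed by probe-set ID. The third is that the merged table is written in Weka's ARFF format, where `?` marks a missing value. J48 is the name of Weka's C4.5 decision tree, which is why a Weka-style workflow is assumed here. The helper names (`slr`, `integrate`, `to_arff`), the probe-set IDs and all intensity values are hypothetical. The thesis's actual SLR formula and missing-value handling may differ.

```python
import math
from typing import Optional


def slr(proximal: list[float], distal: list[float]) -> float:
    # Assumed SLR definition (hypothetical): log2 of the mean proximal-probe
    # intensity divided by the mean distal-probe intensity for one probe set.
    # Higher values mean a relatively stronger proximal signal.
    return math.log2((sum(proximal) / len(proximal)) / (sum(distal) / len(distal)))


def integrate(expression: dict[str, float],
              slr_values: dict[str, float],
              probe_sets: list[str]) -> list[Optional[float]]:
    # Expression features first, then one SLR feature per probe set.
    # Probe sets whose SLR was filtered out become None (missing) instead of
    # being dropped, so every sample keeps the same attribute layout.
    features: list[Optional[float]] = [expression[ps] for ps in probe_sets]
    features += [slr_values.get(ps) for ps in probe_sets]
    return features


def to_arff(relation: str, probe_sets: list[str],
            rows: list[list[Optional[float]]], labels: list[str],
            classes: list[str]) -> str:
    # Build an ARFF document: numeric expression and SLR attributes,
    # followed by a nominal class attribute.
    lines = [f"@RELATION {relation}", ""]
    lines += [f"@ATTRIBUTE expr_{ps} NUMERIC" for ps in probe_sets]
    lines += [f"@ATTRIBUTE slr_{ps} NUMERIC" for ps in probe_sets]
    lines.append("@ATTRIBUTE class {" + ",".join(classes) + "}")
    lines += ["", "@DATA"]
    for row, label in zip(rows, labels):
        # "?" is the ARFF marker for a missing value.
        cells = ["?" if v is None else f"{v:.4f}" for v in row]
        lines.append(",".join(cells + [label]))
    return "\n".join(lines)


if __name__ == "__main__":
    probe_sets = ["psA_at", "psB_at"]  # hypothetical probe-set IDs
    expression = {"psA_at": 7.25, "psB_at": 9.5}
    # psB_at has no SLR value, as if APA filtering had removed it.
    slr_values = {"psA_at": slr([800.0, 900.0], [200.0, 225.0])}
    row = integrate(expression, slr_values, probe_sets)
    print(to_arff("breast_tumor_slr", probe_sets, [row],
                  ["metastasis"], ["metastasis", "no_metastasis"]))
```

The sample's data line keeps four attributes, with `?` in the slot for the filtered SLR value. Keeping the attribute and marking it missing, rather than deleting it, lets each classifier apply its own missing-value strategy. For example, J48, following C4.5, distributes instances with missing values fractionally across branches. This sketch does not settle which strategy the thesis actually used.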

Suggestions

A Comparative Study of Statistical and Artificial Intelligence based Classification Algorithms on Central Nervous System Cancer Microarray Gene Expression Data
Arslan, Mustafa Turan; Kalınlı, Adem (2016-09-03)
A variety of methods are used to classify cancer gene expression profiles based on microarray data. Statistical methods such as Support Vector Machines (SVM), Decision Trees (DT) and Bayes are especially widely used to classify microarray cancer data. However, statistical methods can often be inadequate for problems involving particularly large-scale data such as DNA microarray data. Therefore, artificial intelligence-based methods have been used to classify microarray...
Investigation and comparison of the preprocessing algorithms for microarray analysis for robust gene expression calculation and performance analysis of technical replicates
İlk, Hakkı Gökhan; İlk Dağ, Özlem; Konu Karakayalı, Özlen; Özdağ, Hilal (2006-04-19)
Preprocessing of microarray data involves the necessary steps of background correction, normalization and summarization of the raw intensity data obtained from cDNA or oligo-arrays before statistical analysis. Several algorithms, namely RMA, dChip and MAS5, exist for the preprocessing of Affymetrix microarray data. Previous studies have identified RMA as one of the most accurate algorithms, while MAS5 was characterized by lower accuracy and sensitivity levels. In this study, performance of different preprocess...
Short Time Series Microarray Data Analysis and Biological Annotation
Sökmen, Zerrin; Atalay, Mehmet Volkan; Atalay, Rengül (2008-01-01)
The list of significant genes produced by microarray data analysis should be explained in terms of biological function. The aim of this study is to extract biologically related gene clusters from short time series microarray gene data by applying unsupervised methods and to automatically perform biological annotation of those clusters. In the first step of the study, short time series microarray expression data are clustered according to similar expression profiles. After that, several biological d...
A computational approach to nonparametric regression: bootstrapping CMARS method
Yazici, Ceyda; Yerlikaya-Ozkurt, Fatma; Batmaz, İnci (2015-10-01)
Bootstrapping is a computer-intensive statistical method that treats the data set as a population and draws samples from it with replacement. This resampling method has wide application areas, especially in mathematically intractable problems. In this study, it is used to obtain the empirical distributions of the parameters in order to determine whether they are statistically significant in a special case of nonparametric regression, conic multivariate adaptive regression splines (CMARS), a statistical machin...
Derivation of Transcriptional Regulatory Relationships by Partial Least Squares Regression
Tan, Mehmet; Polat, Faruk; Alhajj, Reda (2009-11-04)
As the number of genes in a transcriptional regulatory network is large and the number of samples in biological data types is usually small, multiple data types need to be integrated to reverse engineer these networks. In this paper, we propose a method to integrate microarray gene expression, ChIP-chip and transcription factor binding motif data sets in a partial least squares regression model to derive transcription factor (TF)-gene interactions. Both single and synergistic effects of TFs ...
Ü. Aslan, “Determination of the effect of polyadenylation SLR values on microarray data classification,” M.S. - Master of Science, Middle East Technical University, 2014.