Investigation of the impacts of linkage disequilibrium on SNP selection studies

2015
Kantar Özçırpan, Ekin
Many Genome Wide Association Studies (GWAS) try to reveal the relation between SNPs and complex diseases. GWAS produce large amounts of data covering relations among SNPs, phenotypes, and diseases, and many algorithms have been used to extract the desired information from these data. In this study, an algorithm with a key step based on linkage disequilibrium (LD) was constructed to eliminate redundant information from the high-dimensional data and so find disease-related SNPs in GWAS more effectively. The algorithm was tested on a prostate cancer data set downloaded from dbGaP. The web tool SNAP (SNP Annotation and Proxy Search) was used to obtain the SNPs in the region of LD, which was determined by a threshold on r²; this threshold was set to 0.5. After obtaining an LD-based modified version of the original data set, we used Fisher's Combination Method to obtain a combined p-value for each SNP in it. Then, using the SNPnexus database, we searched for disease-related SNPs in both the original and the modified data sets, and evaluated the two results relative to each other. After eliminating the redundant data, we applied the SNPnexus analysis again; the results showed that we could reach the desired genes using approximately half of the SNPs. In addition, the random forest algorithm was run on the data set of SNPs with individual p-values and on the modified data set of SNPs with combined p-values, and the outputs were compared. A further purpose of this study was to reach the most important regulatory SNPs (rSNPs) in GWAS.
Based on the data set modified using LD, we focused on the non-coding SNPs, which are located in non-coding regions throughout the whole genome. In conclusion, the number of important regulatory SNPs found from the modified data set is much higher than the number found using the original data set. The expected conclusion of this thesis is that studies on the prioritization of disease-related SNPs are affected by linkage disequilibrium (LD).
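The combination step described in the abstract can be illustrated with a minimal sketch. It assumes that each SNP's p-value is combined with the p-values of its LD proxies at r² ≥ 0.5; the abstract does not state the exact grouping rule, so that rule, the function names, and the SNP identifiers below are hypothetical. Fisher's method itself computes X = -2 Σ ln p_i and compares it with a chi-square distribution with 2k degrees of freedom. It assumes independent tests, which SNPs in LD do not strictly satisfy, so the combined values here are illustrative only.

```python
import math

R2_THRESHOLD = 0.5  # LD threshold on r^2 stated in the abstract


def fisher_combined_p(p_values):
    """Combine k p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null hypothesis. For even degrees
    of freedom the survival function has a closed form, so no external
    library is needed.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # X / 2
    # P(chi2_{2k} > X) = exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


def combine_with_ld_proxies(p_by_snp, r2_by_pair, threshold=R2_THRESHOLD):
    """Hypothetical grouping rule: combine each SNP's p-value with the
    p-values of every SNP whose r^2 with it meets the threshold."""
    combined = {}
    for snp, p in p_by_snp.items():
        group = [p]
        for (a, b), r2 in r2_by_pair.items():
            if r2 >= threshold and snp in (a, b):
                other = b if snp == a else a
                if other in p_by_snp:
                    group.append(p_by_snp[other])
        combined[snp] = fisher_combined_p(group)
    return combined


# Made-up example values, not data from the thesis.
example_p = {"snpA": 0.05, "snpB": 0.05, "snpC": 0.2}
example_r2 = {("snpA", "snpB"): 0.8, ("snpA", "snpC"): 0.3}
example_combined = combine_with_ld_proxies(example_p, example_r2)
```

In this made-up example, snpA and snpB are in LD above the threshold, so their p-values are combined, while snpC has no proxy and keeps its individual p-value.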

Suggestions

A multi-layered graphical model of the relation among SNPS, GENES, and pathways based on subgraph search
Ersoy, Gökhan; Aydın Son, Yeşim; Can, Tolga; Department of Bioinformatics (2015)
The analysis of Single Nucleotide Polymorphisms (SNPs) through Genome Wide Association Studies (GWAS) presents great potential for describing disease loci and gaining insight into the underlying etiology of diseases. The recently described combined p-value approach allows identification of associations at gene and pathway level. Integrated programs like METU-SNP produce simple lists of either SNP id/gene id/pathway title and their p-values and significance status or SNP id/disease id/pathway information. In...
Systems-level analysis of genome wide association study results for a pilot juvenile idiopathic arthritis family study
Aydın Son, Yeşim; Demirkaya, Erkan; BİLGİNER, YELDA; KASAPÇOPUR, Özgür; Unsal, Erbil; ALİKAŞİFOĞLU, MEHMET; ÖZEN, SEZA (2015-01-01)
Genome wide association studies (GWAS) determine susceptibility profiles for complex diseases. In this study, GWAS was performed in 26 patients with oligo and rheumatoid factor negative polyarticular juvenile idiopathic arthritis (JIA) and their healthy parents by Affymetrix 250K SNP arrays. Biological function and pathway enrichment analysis was done. This is the first GWAS reported for JIA families from the eastern Mediterranean population. Enrichment of Fc gamma R-mediated phagocytosis pathway and respons...
Dna repair genes, xrcc3 and rad51, polymorphisms and risk of childhood acute lymphoblastic leukemia
Tanrıkut, Cihan; Arınç, Emel; Çoruh, Nursen; Department of Biochemistry (2010)
In this study, the role of two DNA repair genes, X-ray repair cross complementing group 3 (XRCC3) Thr241Met and Rad51 G135C polymorphisms were investigated in the risk of development of childhood ALL in Turkish population among 193 healthy controls and 184 ALL patients, by using PCR-RFLP technique. For XRCC3 Thr241Met polymorphism, the frequencies of both heterozygous and homozygous mutant genotypes were found to be higher in the controls compared to ALL patients (OR: 0.59, p = 0.02; OR: 0.48, p = 0.02, res...
A Hybrid feature selection model for genome wide association studies
Yücebaş, Sait Can; Baykal, Nazife; Aydın Son, Yeşim; Department of Health Informatics (2013)
Through Genome Wide Association Studies (GWAS) many SNP-complex disease relations have been investigated so far. GWAS presents high amount – high dimensional data and relations between SNPs, phenotypes and diseases are most likely to be nonlinear. In order to handle high volume-high dimensional data and to be able to find the nonlinear relations, data mining approaches are needed. A hybrid feature selection model of support vector machine and decision tree has been designed. This model also combines the gen...
A powerful approach for effective finding of significantly differentially expressed genes
Abul, Osman; Alhajj, Reda; Polat, Faruk (Institute of Electrical and Electronics Engineers (IEEE), 2006-07-01)
The problem of identifying significantly differentially expressed genes for replicated microarray experiments is accepted as significant and has been tackled by several researchers. Patterns from Gene Expression (PaGE) and q-values are two of the well-known approaches developed to handle this problem. This paper proposes a powerful approach to handle this problem. We first propose a method for estimating the prior probabilities used in the first version of the PaGE algorithm. This way, the problem definitio...
Citation Formats
E. Kantar Özçırpan, “Investigation of the impacts of linkage disequilibrium on SNP selection studies,” M.S. - Master of Science, Middle East Technical University, 2015.