Investigation of the impacts of linkage disequilibrium on SNP selection studies
Date
2015
Author
Kantar Özçırpan, Ekin
Many Genome Wide Association Studies (GWAS) have tried to reveal the relation between SNPs and complex diseases. GWAS also produce a large amount of data describing the relations among SNPs, phenotypes, diseases and other variables, and many algorithms have been used to extract the desired information from these data. In this study, we therefore constructed an algorithm to eliminate redundant information from the high-dimensional data so that disease-related SNPs in GWAS can be found more effectively. One of the key steps of this algorithm is based on linkage disequilibrium (LD). The algorithm was tested on a prostate cancer data set downloaded from dbGaP. The web tool SNAP (SNP Annotation and Proxy Search) was used to obtain the SNPs in each LD region, defined by a threshold of 0.5 on r². After obtaining a modified version of the original data set based on LD, we used Fisher's combination method to compute a combined p value for each SNP in this data set. We then used the SNPnexus database to identify disease-related SNPs in both the original and the modified data sets, and compared the performance on the two data sets. After the redundant data were eliminated, the SNPnexus analysis showed that the desired genes could be reached using approximately half of the SNPs. In addition, a random forest algorithm was applied both to the data set containing SNPs with individual p values and to the modified data set containing SNPs with combined p values, and the outputs of the two runs were compared. A further aim of this study was to identify the most important regulatory SNPs (rSNPs) in GWAS.
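The LD step can be illustrated with a small sketch. This is not the thesis's implementation. The thesis obtained LD proxies from the SNAP web tool, whereas this sketch assumes that pairwise r² values are already available in a dictionary. The function names `r_squared` and `ld_prune` and the SNP labels are hypothetical.

`r_squared` applies the standard definition r² = D² / (p_A(1 − p_A) p_B(1 − p_B)), where D = p_AB − p_A p_B. `ld_prune` assumes a greedy strategy: it keeps the most significant remaining SNP and drops any SNP whose r² with an already kept SNP is at or above the 0.5 threshold.

```python
from typing import Dict, FrozenSet, List


def r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """r^2 between two biallelic loci from the haplotype frequency p_ab
    and the allele frequencies p_a and p_b."""
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def ld_prune(p_values: Dict[str, float],
             r2: Dict[FrozenSet[str], float],
             threshold: float = 0.5) -> List[str]:
    """Hypothetical greedy LD pruning: visit SNPs from smallest to largest
    p value and keep a SNP only if its r^2 with every SNP kept so far is
    below the threshold. Pairs missing from r2 are treated as r^2 = 0."""
    kept: List[str] = []
    for snp in sorted(p_values, key=p_values.get):
        if all(r2.get(frozenset((snp, k)), 0.0) < threshold for k in kept):
            kept.append(snp)
    return kept


if __name__ == "__main__":
    # Illustrative values only, not data from the thesis.
    p = {"snp1": 1e-6, "snp2": 1e-4, "snp3": 0.02}
    pairs = {frozenset(("snp1", "snp2")): 0.8,
             frozenset(("snp2", "snp3")): 0.3}
    print(ld_prune(p, pairs))  # snp2 is a proxy of snp1: ['snp1', 'snp3']
```

Sorting by p value means that when two SNPs are in strong LD, the one with the stronger association signal represents the region. This is one common choice among several possible ones.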
Using the data set modified with LD, we focused on non-coding SNPs, which are located in non-coding regions across the whole genome. The number of important regulatory SNPs found in the modified data set is much higher than the number found in the original data set. In conclusion, the results of this thesis suggest that studies on the prioritization of disease-related SNPs are affected by linkage disequilibrium (LD).
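The combined p values in this study come from Fisher's combination method. The abstract does not say which p values were combined for each SNP, so the following is only a generic sketch of Fisher's method itself. For k p values, X² = −2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom when the p values are independent and the null hypothesis holds. Because the degrees of freedom are even, the tail probability has a closed form, so no statistics library is needed. The function name `fisher_combined_p` is hypothetical.

```python
import math
from typing import Sequence


def fisher_combined_p(p_values: Sequence[float]) -> float:
    """Fisher's combined p value for k independent p values in (0, 1]."""
    k = len(p_values)
    x = -2.0 * sum(math.log(p) for p in p_values)  # chi-square statistic, df = 2k
    # Exact survival function of a chi-square with even df = 2k:
    # P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    half = x / 2.0
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half / j  # builds (x/2)^j / j! incrementally
        total += term
    return math.exp(-half) * total


if __name__ == "__main__":
    print(fisher_combined_p([0.05]))        # a single p value is returned unchanged
    print(fisher_combined_p([0.05, 0.05]))  # roughly 0.0175
```

With a single p value, the statistic reduces to that p value. Two p values of 0.05 combine to about 0.0175, which shows how agreeing weak signals reinforce each other.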
Subject Keywords
Genomes; Prostate; Genetic polymorphisms; Single nucleotide polymorphisms
URI
http://etd.lib.metu.edu.tr/upload/12618418/index.pdf
https://hdl.handle.net/11511/24385
Collections
Graduate School of Natural and Applied Sciences, Thesis
Suggestions
A multi-layered graphical model of the relation among SNPS, GENES, and pathways based on subgraph search
Ersoy, Gökhan; Aydın Son, Yeşim; Can, Tolga; Department of Bioinformatics (2015)
The analysis of Single Nucleotide Polymorphisms (SNPs) through Genome Wide Association Studies (GWAS) presents great potential for describing disease loci and gaining insight into the underlying etiology of diseases. Recently described combined p-value approach allows identification of associations at gene and pathway level. The integrated programs like METU-SNP produce simple lists of either SNP id/gene id/pathway title and their p-values and significance status or SNP id/disease id/pathway information. In...
Systems-level analysis of genome wide association study results for a pilot juvenile idiopathic arthritis family study
Aydın Son, Yeşim; Demirkaya, Erkan; BİLGİNER, YELDA; KASAPÇOPUR, Özgür; Unsal, Erbil; ALİKAŞİFOĞLU, MEHMET; ÖZEN, SEZA (2015-01-01)
Genome wide association studies (GWAS) determine susceptibility profiles for complex diseases. In this study, GWAS was performed in 26 patients with oligo and rheumatoid factor negative polyarticular juvenile idiopathic arthritis (JIA) and their healthy parents by Affymetrix 250K SNP arrays. Biological function and pathway enrichment analysis was done. This is the first GWAS reported for JIA families from the eastern Mediterranean population. Enrichment of Fc gamma R-mediated phagocytosis pathway and respons...
A Hybrid feature selection model for genome wide association studies
Yücebaş, Sait Can; Baykal, Nazife; Aydın Son, Yeşim; Department of Health Informatics (2013)
Through Genome Wide Association Studies (GWAS) many SNP-complex disease relations have been investigated so far. GWAS presents high amount – high dimensional data and relations between SNPs, phenotypes and diseases are most likely to be nonlinear. In order to handle high volume-high dimensional data and to be able to find the nonlinear relations, data mining approaches are needed. A hybrid feature selection model of support vector machine and decision tree has been designed. This model also combines the gen...
A powerful approach for effective finding of significantly differentially expressed genes
Abul, Osman; Alhajj, Reda; Polat, Faruk (Institute of Electrical and Electronics Engineers (IEEE), 2006-07-01)
The problem of identifying significantly differentially expressed genes for replicated microarray experiments is accepted as significant and has been tackled by several researchers. Patterns from Gene Expression (PaGE) and q-values are two of the well-known approaches developed to handle this problem. This paper proposes a powerful approach to handle this problem. We first propose a method for estimating the prior probabilities used in the first version of the PaGE algorithm. This way, the problem definitio...
Association of -174G/C interleukin-6 gene polymorphism with the risk of chronic lymphocytic, chronic myelogenous and acute myelogenous leukemias in Turkish patients
Mutlu, Pelin; Elci, Pinar; Yildirim, Murat; Cetin, A. Turker; Avcu, Ferit (2014-07-01)
Purpose: The purpose of this study was to evaluate the relationship between -174G/C interleukin-6 (IL-6) gene promoter polymorphism and susceptibility to chronic lymphocytic (CLL), chronic myelogenous (CML) and acute myelogenous leukemia (AML) in Turkish patients.
Citation
E. Kantar Özçırpan, “Investigation of the impacts of linkage disequilibrium on SNP selection studies,” M.S. - Master of Science, Middle East Technical University, 2015.