An integrative approach to structured SNP prioritization and representative SNP selection for genome-wide association studies

Date

2011

Author

Üstünkar, Gürkan

Single nucleotide polymorphisms (SNPs) are the most frequent genomic variations and the main basis for genetic differences among individuals and for many diseases. Because microarrays and advanced sequencing technologies now make it possible to genotype millions of SNPs at once, SNPs are becoming increasingly popular as genomic biomarkers. Like other high-throughput research techniques, genome-wide association studies (GWAS) of SNPs usually hit a bottleneck after the statistical analysis of significantly associated SNPs, as there is no standardized approach to prioritize SNPs or to select representative SNPs that show association with the conditions under study. In this study, a Java-based integrated system has been constructed that uses major public databases to prioritize SNPs according to their biological relevance and statistical significance. The Analytic Hierarchy Process has been used for objective prioritization of SNPs, and an emerging methodology for second-wave analysis of genes and pathways related to disease-associated SNPs, based on a combined p-value approach, has been incorporated into the prioritization scheme. Using the subset of SNPs that is most representative of all SNPs associated with the diseases reduces the computational power required for analysis and decreases the cost of subsequent association and biomarker discovery studies. In addition to the proposed prioritization system, we have developed a novel feature selection method based on Simulated Annealing (SA) for representative SNP selection. The validity and accuracy of the developed model have been tested on a real-life case-control data set, and the model produced biologically meaningful results. The integrated desktop application developed in our study will facilitate reliable identification of SNPs that are involved in the etiology of complex diseases, ultimately supporting timely identification of genomic disease biomarkers and the development of personalized medicine approaches and targeted drug discoveries.
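
The abstract names two building blocks, the Analytic Hierarchy Process and a combined p-value approach, without giving their details. The Java sketch below illustrates the general idea of each under stated assumptions; it is not the thesis implementation. AHP weights are approximated with the row geometric-mean method, and Fisher's method is assumed as one common way to combine p-values (the abstract does not say which combination rule was used). The class name, method names, the three criteria, and all numeric values are hypothetical.

```java
import java.util.Arrays;

/** Illustrative sketch only; names and numbers are hypothetical, not taken from the thesis. */
class PrioritizationSketch {

    /**
     * Approximates AHP priority weights from a reciprocal pairwise-comparison
     * matrix using the geometric mean of each row, normalized to sum to 1.
     */
    static double[] ahpWeights(double[][] pairwise) {
        int n = pairwise.length;
        double[] weights = new double[n];
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            double logProduct = 0.0;
            for (int j = 0; j < n; j++) {
                logProduct += Math.log(pairwise[i][j]);
            }
            weights[i] = Math.exp(logProduct / n); // geometric mean of row i
            total += weights[i];
        }
        for (int i = 0; i < n; i++) {
            weights[i] /= total;
        }
        return weights;
    }

    /**
     * Combines independent p-values with Fisher's method (an assumption here).
     * The statistic X = -2 * sum(ln p_i) follows a chi-square distribution with
     * 2k degrees of freedom, whose upper tail has the closed form
     * exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
     * Each p-value must lie in (0, 1].
     */
    static double fisherCombined(double[] pValues) {
        int k = pValues.length;
        double half = 0.0; // holds X/2 = -sum(ln p_i)
        for (double p : pValues) {
            half -= Math.log(p);
        }
        double term = 1.0; // (X/2)^i / i!, starting at i = 0
        double sum = 1.0;
        for (int i = 1; i < k; i++) {
            term *= half / i;
            sum += term;
        }
        return Math.exp(-half) * sum;
    }

    public static void main(String[] args) {
        // Hypothetical criteria: statistical significance, pathway evidence, functional annotation.
        double[][] pairwise = {
            {1.0,       3.0,       5.0},
            {1.0 / 3.0, 1.0,       2.0},
            {1.0 / 5.0, 1.0 / 2.0, 1.0}
        };
        System.out.println("AHP weights: " + Arrays.toString(ahpWeights(pairwise)));

        // Hypothetical p-values of SNPs mapped to the same gene.
        double[] snpPValues = {0.01, 0.04, 0.20};
        System.out.println("Combined gene-level p-value: " + fisherCombined(snpPValues));
    }
}
```

In a scheme like the one described, the AHP weights could scale each criterion's score for a SNP, while the combined p-value could summarize evidence at the gene or pathway level. Fisher's method assumes independent tests, which SNPs in linkage disequilibrium generally violate, so a real pipeline would need a correction for that.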

Subject Keywords

Computer software, Bioinformatics, Genetics

URI

http://etd.lib.metu.edu.tr/upload/12612871/index.pdf
https://hdl.handle.net/11511/20500

Collections

Graduate School of Informatics, Thesis

Suggestions

Identification and analysis of genomic regions with large between-population differentiation in humans Myles, S.; Tang, K.; Somel, Mehmet; Green, R. E.; Kelso, J.; Stoneking, M. (Wiley, 2008-01-01) The primary aim of genetic association and linkage studies is to identify genetic variants that contribute to phenotypic variation within human populations. Since the overwhelming majority of human genetic variation is found within populations, these methods are expected to be effective and can likely be extrapolated from one human population to another. However, they may lack power in detecting the genetic variants that contribute to phenotypes that differ greatly between human populations. Phenotypes that...
Functional characterization of microRNA-125b expression in MCF7 breast cancer cell line Tuna, Serkan; Erson Bensan, Ayşe Elif; Department of Biology (2010) microRNA-dependent gene expression regulation has roles in diverse processes such as differentiation, proliferation and apoptosis. Therefore, deregulated miRNA expression has functional importance for various diseases, including cancer. miR-125b is among the commonly downregulated miRNAs in breast cancer cells. Therefore we aimed to characterize the effects of miR-125b expression in MCF7 breast cancer cell line (BCCL) to better understand its roles in tumorigenesis. Here, we investigated mir-125 family mem...
A clustering method for the problem of protein subcellular localization Bezek, Perit; Atalay, Mehmet Volkan; Department of Computer Engineering (2006) In this study, the focus is on predicting the subcellular localization of a protein, since subcellular localization is helpful in understanding a protein’s functions. Function of a protein may be estimated from its sequence. Motifs or conserved subsequences are strong indicators of function. In a given sample set of protein sequences known to perform the same function, a certain subsequence or group of subsequences should be common; that is, occurrence (frequency) of common subsequences should be high. Our ...
A classification system for the problem of protein subcellular localization Alay, Gökçen; Atalay, Mehmet Volkan; Department of Computer Engineering (2007) The focus of this study is on predicting the subcellular localization of a protein. Subcellular localization information is important for protein function annotation which is a fundamental problem in computational biology. For this problem, a classification system is built that has two main parts: a predictor that is based on a feature mapping technique to extract biologically meaningful information from protein sequences and a client/server architecture for searching and predicting subcellular localization...
Development of a genetic material transfer approach for gene therapy Ayaz, Şerife; Hasırcı, Vasıf Nejat; Department of Biotechnology (2005) This thesis is focused on the development of a gene delivery system, especially for the purpose of DNA vaccination. DNA expression vectors have the potential to be useful therapeutics for a wide variety of applications. A carrier system was designed to realize the delivery of genes to cells and the promotion of controlled adequate expression in the target cells. The low gene delivery efficiency observed with systems composed of polyplexes is mainly due to low stability of polycation e.g polyethylenimine-DNA...

Citation Formats

G. Üstünkar, “An integrative approach to structured SNP prioritization and representative SNP selection for genome-wide association studies,” Ph.D. - Doctoral Program, Middle East Technical University, 2011.