A Hybrid feature selection model for genome wide association studies
Date: 2013
Author: Yücebaş, Sait Can
Genome Wide Association Studies (GWAS) have been used to investigate many relations between SNPs and complex diseases. GWAS produce high-volume, high-dimensional data, and the relations between SNPs, phenotypes and diseases are most likely nonlinear. Data mining approaches are needed to handle such data and to find the nonlinear relations. A hybrid feature selection model combining a support vector machine and a decision tree has been designed. The model also combines genotype and phenotype information to increase diagnostic performance. It was tested on prostate cancer and melanoma data downloaded from NCBI's dbGaP database. On the prostate cancer data, the hybrid system achieved 71.67% accuracy on a data set consisting only of genotypes and 84.23% accuracy on a data set consisting only of phenotypes; when genotypes and phenotypes were integrated, accuracy increased to 93.81%. On the melanoma data, the hybrid system achieved 57.12% accuracy with genotypes only and 75.48% accuracy with phenotypes only; when genotypes and phenotypes were integrated, accuracy increased to 86.35%. For the prostate cancer case, the hybrid system reached a sensitivity of 90.92% and an AUC of 0.91, which outperforms the Prostate Specific Antigen (PSA) test. For the melanoma case, the selected phenotypic and genotypic features had also been examined in previous studies, which shows the system's ability to select the most predictive features; the hybrid system therefore has the potential to be used for identifying melanoma risk groups.
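As an illustration only, the sketch below ranks features by information gain, the splitting criterion used by ID3-style decision trees. It is not the thesis's implementation: the abstract does not specify how the decision tree and the support vector machine are combined, so only a decision-tree-style ranking step is shown, and the SVM stage is omitted. The feature names (`snp_a`, `snp_b`, `family_history`), the 0/1/2 genotype coding and the toy data are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on one categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the labels remaining inside each feature value.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(rows, labels, names):
    """Return (name, gain) pairs sorted from most to least informative."""
    scores = {
        name: information_gain([row[i] for row in rows], labels)
        for i, name in enumerate(names)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: two SNPs coded as 0/1/2 (assumed minor-allele
# counts) and one binary phenotype, combined in a single feature table.
names = ["snp_a", "snp_b", "family_history"]
rows = [
    (0, 0, 0),
    (0, 1, 0),
    (0, 0, 1),
    (2, 1, 1),
    (2, 0, 1),
    (2, 1, 1),
]
labels = [0, 0, 0, 1, 1, 1]  # 1 = case, 0 = control

ranked = rank_features(rows, labels, names)
for name, gain in ranked:
    print(f"{name}: {gain:.3f}")
```

Placing genotype and phenotype columns in one table mirrors, in a very simplified way, the integration of the two data types described in the abstract; a real GWAS feature set would be far larger and would need filtering before any such ranking is practical.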
Subject Keywords
Genomes, Prostate, Melanoma, Decision trees, Medical informatics
URI
http://etd.lib.metu.edu.tr/upload/12616393/index.pdf
https://hdl.handle.net/11511/22859
Collections
Graduate School of Informatics, Thesis
Suggestions
A Prostate Cancer Model Build by a Novel SVM-ID3 Hybrid Feature Selection Method Using Both Genotyping and Phenotype Data from dbGaP
Yucebas, Sait Can; Aydın Son, Yeşim (2014-03-20)
Through Genome Wide Association Studies (GWAS) many Single Nucleotide Polymorphism (SNP)-complex disease relations can be investigated. The output of GWAS can be high in amount and high dimensional, also relations between SNPs, phenotypes and diseases are most likely to be nonlinear. In order to handle high volume-high dimensional data and to be able to find the nonlinear relations we have utilized data mining approaches and a hybrid feature selection model of support vector machine and decision tree has be...
Investigation of the impacts of linkage disequilibrium on SNP selection studies
Kantar Özçırpan, Ekin; Weber, Gerhard Wilhelm; İyigün, Cem; Department of Biomedical Engineering (2015)
In many Genome Wide Association Studies (GWAS), researchers have tried to reveal the relation between SNPs and complex diseases. Moreover, GWAS are known to involve a large amount of data that includes relations between SNPs, phenotypes and diseases. Many algorithms have been used to extract the desired information from this huge data. Therefore, in this study, an algorithm, one of whose important steps is based on linkage disequilibrium (LD), was constructed to eliminate the redundant inform...
A Multi-layer model for privacy preserving policy making for disclosure of public health data
Alizadeh Mizani, Mehrdad; Baykal, Nazife; Department of Medical Informatics (2013)
Health organizations in Turkey collect an ever-increasing amount of individual data, which is a valuable source of information for public health research. However, due to privacy risks, they publish data in aggregated rather than individual form. The lack of standardized policies regarding secondary uses of health data makes the available technical methods ineffective. As a result, access to and utilization of person-specific datasets by public health researchers becomes extremely cumbersome. The bias introduced...
A powerful approach for effective finding of significantly differentially expressed genes
Abul, Osman; Alhajj, Reda; Polat, Faruk (Institute of Electrical and Electronics Engineers (IEEE), 2006-07-01)
The problem of identifying significantly differentially expressed genes for replicated microarray experiments is accepted as significant and has been tackled by several researchers. Patterns from Gene Expression (PaGE) and q-values are two of the well-known approaches developed to handle this problem. This paper proposes a powerful approach to handle this problem. We first propose a method for estimating the prior probabilities used in the first version of the PaGE algorithm. This way, the problem definitio...
A multi-layered graphical model of the relation among SNPS, GENES, and pathways based on subgraph search
Ersoy, Gökhan; Aydın Son, Yeşim; Can, Tolga; Department of Bioinformatics (2015)
The analysis of Single Nucleotide Polymorphisms (SNPs) through Genome Wide Association Studies (GWAS) presents great potential for describing disease loci and gaining insight into the underlying etiology of diseases. Recently described combined p-value approach allows identification of associations at gene and pathway level. The integrated programs like METU-SNP produce simple lists of either SNP id/gene id/pathway title and their p-values and significance status or SNP id/disease id/pathway information. In...
Citation Formats
S. C. Yücebaş, “A Hybrid feature selection model for genome wide association studies,” Ph.D. - Doctoral Program, Middle East Technical University, 2013.