Selection of representative SNP sets for genome-wide association studies: a metaheuristic approach

Ustunkar, Gurkan
Weber, Gerhard W.
Friedrich, Christoph M.
Aydın Son, Yeşim
After the completion of the Human Genome Project in 2003, it became possible to associate genetic variations in the human genome with common and complex diseases. The current challenge is to use genomic data efficiently and to develop tools that improve our understanding of the etiology of complex diseases. Many of the algorithms needed for this task were originally developed in management science and operations research (OR). One application is to select, from the whole set of single nucleotide polymorphism (SNP) biomarkers, a subset that is informative and small enough for subsequent association studies. In this paper, we present an OR application for representative SNP selection that implements our novel simulated annealing (SA) based feature-selection algorithm. We hope that our work will facilitate reliable identification of SNPs involved in the etiology of complex diseases and ultimately support timely identification of genomic disease biomarkers and the development of personalized-medicine approaches and targeted drug discovery.
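The abstract does not describe the algorithm's details, so the following is only a generic illustration of simulated annealing applied to subset selection, not the authors' method. The function name `anneal_subset`, the swap-based neighbor move, the geometric cooling schedule, and all parameter defaults are assumptions made for this sketch; `score` stands in for whatever measure of subset quality a study would use.

```python
import math
import random


def anneal_subset(n_features, k, score, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Search for a size-k subset of range(n_features) that maximizes score(subset).

    Generic simulated annealing sketch (hypothetical, not the paper's algorithm).
    Requires 0 < k < n_features so that a swap move is always possible.
    """
    rng = random.Random(seed)
    all_features = set(range(n_features))
    current = set(rng.sample(range(n_features), k))
    current_score = score(current)
    best, best_score = set(current), current_score
    temp = t0
    for _ in range(steps):
        # Neighbor move: swap one selected feature for one unselected feature.
        out_f = rng.choice(sorted(current))
        in_f = rng.choice(sorted(all_features - current))
        candidate = (current - {out_f}) | {in_f}
        cand_score = score(candidate)
        delta = cand_score - current_score
        # Always accept improvements; accept a worse subset with
        # probability exp(delta / temp), which shrinks as temp cools.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = candidate, cand_score
            if current_score > best_score:
                best, best_score = set(current), current_score
        temp *= cooling  # geometric cooling schedule (an assumption)
    return best, best_score
```

For example, with a toy score such as `lambda s: sum(weights[i] for i in s)` over a hypothetical list `weights`, the call `anneal_subset(len(weights), 3, ...)` returns a three-element subset and its score. In a real SNP-selection setting the score would instead reflect how well the chosen SNPs represent the full set.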


Citation Formats
G. Ustunkar, S. Akyüz, G. W. Weber, C. M. Friedrich, and Y. Aydın Son, “Selection of representative SNP sets for genome-wide association studies: a metaheuristic approach,” Optimization Letters, pp. 1207–1218, 2012.