INTEGRATION OF MACHINE LEARNING AND ENTROPY METHODS FOR POST-GENOME-WIDE ASSOCIATION STUDIES ANALYSIS

2022-08-31
Yaldız, Burcu
Non-linear relationships between genotypes play an essential role in understanding the genetic interactions underlying complex disease traits. Genome-Wide Association Studies (GWAS) have revealed statistical associations between SNPs and many complex diseases. Because GWAS results could not fully explain the genetic background of these disorders, Genome-Wide Interaction Studies have gained importance. In recent years, various statistical approaches, such as entropy-based methods, have been suggested for revealing these non-additive interactions between variants. This study integrates an entropy-based 3-way interaction information (3WII) method with a machine learning workflow (PLINK-Random Forest-Random Forest, PLINK-RF-RF) to capture the hidden patterns resulting from non-linear relationships between genotypes in Late-Onset Alzheimer’s Disease (LOAD) and to discover early and differential diagnosis markers. We optimized an entropy-based approach that detects third-order interactions in PLINK-RF-RF models built from three different LOAD datasets. For all three datasets, 3WII analysis of the PLINK-RF-RF prioritized SNPs selected a reduced SNP set, which makes the approach promising for model minimization. Triplets of SNPs whose 3WII differs significantly between case and control groups are proposed as candidate biomarkers for genotyping-based LOAD diagnosis. Among the SNPs prioritized by 3WII, four out of 19 from GenADA, one out of 27 from ADNI, and four out of 106 from NCRAD map to genes directly associated with Alzheimer’s Disease. For the first time, we have integrated the RF-RF model with the entropy-based model to determine three-way epistatic interactions for LOAD, and we discovered the common biological pathways for the ADNI, GenADA, and NCRAD datasets.
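As a rough illustration of the entropy-based quantity the abstract refers to, the sketch below computes a three-way interaction information for one SNP triplet and compares it between cases and controls. It is not the thesis implementation. The sign convention (McGill's interaction information, where positive values indicate synergy), the plug-in entropy estimator, the 0/1/2 genotype coding, and the function names `entropy`, `interaction_information`, and `case_control_difference` are all assumptions made for this example.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Plug-in Shannon entropy (in bits) of the joint distribution of the columns."""
    joint = list(zip(*columns))
    n = len(joint)
    counts = Counter(joint)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def interaction_information(a, b, c):
    """Three-way interaction information I(A;B|C) - I(A;B), in bits.

    Assumed convention: positive values indicate synergy among the three
    variables, negative values indicate redundancy.
    """
    return (
        entropy(a, b) + entropy(a, c) + entropy(b, c)
        - entropy(a) - entropy(b) - entropy(c)
        - entropy(a, b, c)
    )


def case_control_difference(case_triplet, control_triplet):
    """Difference in interaction information between cases and controls.

    Each argument is a tuple of three equal-length genotype sequences
    (hypothetical coding: 0, 1, 2 copies of the minor allele).
    """
    return (
        interaction_information(*case_triplet)
        - interaction_information(*control_triplet)
    )


# Toy check: c is the XOR of a and b, a purely synergistic relationship.
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
c = [0, 1, 1, 0]
xor_ii = interaction_information(a, b, c)  # 1 bit of synergy

# Three identical variables are fully redundant.
redundant_ii = interaction_information(a, a, a)  # -1 bit
```

In a real analysis the difference returned by `case_control_difference` would still need a significance assessment, for example a permutation test over case and control labels. That step is omitted here because the abstract does not say how significance was evaluated.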

Suggestions

Comparing Clustering Techniques for Real Microarray Data
Purutçuoğlu Gazi, Vilda (2012-08-29)
The clustering of genes detected as significant or differentially expressed provides useful information to biologists about the functions and functional relationships of genes. Various types of clustering methods can be applied to genomic data. These are mainly divided into two groups, namely, hierarchical and partitional methods. In this paper, as the novelty, we perform a detailed clustering analysis for the recently collected boron microarray dataset to investigate biologically more interes...
Comprehensive Analyses of Gaussian Graphical Model under Different Biological Networks
Dokuzoglu, D.; Purutçuoğlu Gazi, Vilda (2017-09-01)
Naturally, genes interact with each other by forming a complicated network, and the relationship between groups of genes can be shown by different functions as gene networks. Recently, there has been growing interest in uncovering these complex structures from gene expression data by modeling them mathematically. The Gaussian graphical model is one of the most popular parametric approaches for modelling the underlying types of biochemical systems. In this study, we evaluate the performance of this probabili...
Inference of Gene Regulatory Networks Via Multiple Data Sources and a Recommendation Method
Ozsoy, Makbule Gulcin; Polat, Faruk; Alhajj, Reda (2015-11-12)
Gene regulatory networks (GRNs) are composed of biological components, including genes, proteins and metabolites, and their interactions. In general, computational methods are used to infer the connections among these components. However, computational methods should take into account the general features of the GRNs, which are sparseness, scale-free topology, modularity and structure of the inferred networks. In this work, observing the common aspects between recommendation systems and GRNs, we decided to ...
Scalable approach for effective control of gene regulatory networks
Tan, Mehmet; Alhajj, Reda; Polat, Faruk (Elsevier BV, 2010-01-01)
Objective: Interactions between genes are realized as gene regulatory networks (GRNs). The control of such networks is essential for investigating issues like different diseases. Control is the process of studying the states and behavior of a given system under different conditions. The system considered in this study is a gene regulatory network (GRN), and one of the most important aspects in the control of GRNs is scalability. Consequently, the objective of this study is to develop a scalable technique th...
Investigation of Multi-task Deep Neural Networks in Automated Protein Function Prediction
Rifaioğlu, Ahmet Süreyya; Martin, Maria Jesus; Atalay, Rengül; Atalay, Mehmet Volkan; Doğan, Tunca (2017-07-20)
Functional annotation of proteins is a crucial research field for understanding the molecular mechanisms of living beings and for biomedical purposes (e.g. identification of disease-causing functional changes in genes and discovery of novel drugs). Several Gene Ontology (GO) based protein function prediction methods have been proposed in the last decade to annotate proteins. However, considering the prediction performances of the proposed methods, it can be stated that there is still room for significant imp...
Citation Formats
B. Yaldız, “INTEGRATION OF MACHINE LEARNING AND ENTROPY METHODS FOR POST-GENOME-WIDE ASSOCIATION STUDIES ANALYSIS,” Ph.D. - Doctoral Program, Middle East Technical University, 2022.