Automated biological data acquisition and integration using machine learning techniques

Çarkacıoğlu, Levent
Since the initial genome sequencing projects, recent advances in technology, molecular biology, and large-scale transcriptome analysis have resulted in data accumulation at a large scale. These data are provided on different platforms and come from different laboratories; therefore, there is a need for compilation and comprehensive analysis. In this thesis, we addressed the automation of biological data acquisition and integration from these non-uniform data using machine learning techniques. We focused on two data mining studies. In the first study, we characterized the expression patterns of housekeeping genes and described methodologies to compare measures of housekeeping genes with those of non-housekeeping genes. In the second study, we proposed a novel framework, bi-k-bi clustering, for finding association rules of gene pairs, which can easily operate on large-scale and multiple heterogeneous data sets. The results of both studies were consistent with, and related to, the available literature. Furthermore, our results provided some novel insights that await experimental validation by biologists.


Computational approaches leveraging integrated connections of multi-omic data toward clinical applications
Demirel, Habibe Cansu; Tunçbağ, Nurcan (2021-10-01)
In line with the advances in high-throughput technologies, multiple omic datasets have accumulated to study biological systems and diseases coherently. No single omics data type is capable of fully representing cellular activity. The complexity of the biological processes arises from the interactions between omic entities such as genes, proteins, and metabolites. Therefore, multi-omic data integration is crucial but challenging. The impact of the molecular alterations in multi-omic data is not local in the ...
Automated learning rate search using batch-level cross-validation
KABAKÇI, Duygu; Akbaş, Emre (2021-04-01)
Deep learning researchers and practitioners have accumulated a significant amount of experience in training a wide variety of architectures on various datasets. However, given a network architecture and a dataset, obtaining the best model (i.e., the model giving the smallest test set error) while keeping the training time complexity low is still a challenging task. Hyper-parameters of deep neural networks, especially the learning rate and its (decay) schedule, highly affect the network's final performance. Th...
Ozogur-Akyuz, S.; Weber, Gerhard Wilhelm (2009-06-03)
In Machine Learning (ML) algorithms, one of the crucial issues is the representation of the data. As the data become heterogeneous and large-scale, single kernel methods become insufficient to classify nonlinear data. The finite combinations of kernels are limited up to a finite choice. In order to overcome this discrepancy, we propose a novel method of "infinite" kernel combinations for learning problems with the help of infinite and semi-infinite programming regarding all elements in kernel space. Looking...
Effective gene expression data generation framework based on multi-model approach
Sirin, Utku; Erdogdu, Utku; Polat, Faruk; TAN, MEHMET; Alhajj, Reda (Elsevier BV, 2016-06-01)
Objective: Overcome the shortage of samples in gene expression data sets, which have thousands of genes but only a small number of samples, a limitation that challenges the computational methods using them.
Using Adaptive Neuro-Fuzzy Inference System for Classification of Microarray Gene Expression Cancer Profiles
Haznedar, Bülent; Arslan, Mustafa Turan; Kalınlı, Adem (2018-05-01)
Microarray is a technology that enables the simultaneous analysis of thousands of genes in DNA structure, building on advances in biochemistry. With this technology, it has become possible to diagnose and treat hereditary diseases by analyzing thousands of gene expression levels. This study proposes an artificial intelligence method, the adaptive neuro-fuzzy inference system (ANFIS), to classify cancer gene expression profiles. The findings obtained with the proposed ANFIS approach are compared with the results...
Citation Formats
L. Çarkacıoğlu, “Automated biological data acquisition and integration using machine learning techniques,” Ph.D. - Doctoral Program, Middle East Technical University, 2009.