Gene function inference from expression using probabilistic topic models

Tercan, Bahar
The main aim of this study is to develop a probabilistic biclustering approach that helps address the question: "Can we determine the biological context of a sample (tissue, condition, etc.) using expression data, and associate these contexts with annotation databases such as Gene Ontology, KEGG and HUGE to discover annotations (such as cell division, metabolic process or illness) for them?" We applied the Hierarchical Dirichlet Process (HDP), a nonparametric probabilistic topic model originally developed in text mining to extract an unknown number of latent topics from documents, to gene expression data analysis. In this analogy, the mRNA transcript corresponds to the word, the biological context to the topic, and the sample to the document. This study builds on previous work that has, to varying extents, applied topic models to the problem of differential expression, and improves on the current state of the art with a comprehensive and integrative method for enhancing HDP with prior information. The main proposed improvements are the preprocessing of gene expression data for topic models and the introduction of informed priors to the HDP model. The experimental results showed that the prior-improved HDP successfully reveals the hidden biclusters in gene expression data, with higher robustness to changes in sparsity levels (number of samples) and prior strengths (η).
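The word/topic/document analogy can be made concrete with a minimal sketch of how an expression matrix might be turned into bag-of-transcript "documents" before a topic model is fitted. The function name `expression_to_documents`, the `scale` parameter, the rounding rule and the toy gene names are all assumptions made for illustration; the abstract does not specify the preprocessing actually used in the thesis.

```python
from collections import Counter


def expression_to_documents(expression, transcripts, scale=1.0):
    """Turn a samples x transcripts matrix into bag-of-transcript documents.

    Assumption (not from the thesis): each non-negative expression value is
    scaled and rounded to an integer pseudo-count, so that a sample becomes a
    "document" whose "words" are transcript identifiers.
    """
    documents = []
    for row in expression:
        if len(row) != len(transcripts):
            raise ValueError("row length must match number of transcripts")
        counts = Counter()
        for name, value in zip(transcripts, row):
            # Negative values are clipped to zero before rounding.
            count = int(round(max(value, 0.0) * scale))
            if count > 0:
                counts[name] = count
        documents.append(counts)
    return documents


# Hypothetical toy data: two samples (documents), three transcripts (words).
transcripts = ["GENE_A", "GENE_B", "GENE_C"]
expression = [
    [2.0, 0.0, 5.2],
    [0.4, 3.6, 0.0],
]
docs = expression_to_documents(expression, transcripts)
```

Each resulting `Counter` plays the role of one document's word counts; a collection of such counts is the kind of input a topic model such as HDP would consume, with the inferred topics then read as candidate biological contexts.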


Mathematical Modeling and Approximation of Gene Expression Patterns
Yılmaz, Fatih; Öktem, Hüseyin Avni (2004-09-03)
This study concerns modeling, approximation and inference of gene regulatory dynamics on the basis of gene expression patterns. The dynamical behavior of gene expressions is represented by a system of ordinary differential equations. We introduce a gene-interaction matrix with some nonlinear entries, in particular, quadratic polynomials of the expression levels to keep the system solvable. The model parameters are determined by using optimization. Then, we provide the time-discrete approximation of our time...
Induction and control of large-scale gene regulatory networks
Tan, Mehmet; Tan, Mehmet; Department of Computer Engineering (2009)
Gene regulatory networks model the interactions within the cell and thus it is essential to understand their structure and to develop some control mechanisms that could effectively deal with them. This dissertation tackles these two aspects. To handle the first problem, a new constraint-based modeling algorithm is proposed that can both increase the quality of the output and decrease the computational requirements for learning the structure of gene regulatory networks by integrating multiple biological data...
Limitations of three-phase Buckley-Leverett theory
Akın, Serhat (Informa UK Limited, 2005-07-01)
The broad objective of this study is to determine the limitations of the three-phase relative permeability estimation technique by using the experimental approach. In order to achieve this goal, three-phase relative permeability experiments were conducted on a Berea sandstone core plug by using brine, hexane and nitrogen gas. An unsteady-state analytical technique and a numerical technique, in which a black oil simulator was coupled with a global optimization algorithm in a least squares manner, were used to comp...
Short Time Series Microarray Data Analysis and Biological Annotation
Sökmen, Zerrin; Atalay, Mehmet Volkan; Atalay, Rengül (2008-01-01)
The significant gene list that results from microarray data analysis should be explained in terms of biological functions. The aim of this study is to extract biologically related gene clusters from short time series microarray gene data by applying unsupervised methods and to automatically perform biological annotation of those clusters. In the first step of the study, short time series microarray expression data is clustered according to similar expression profiles. After that, several biological d...
Optimal multiple hypothesis testing with an application in side lobe blanker design and invariance applications in detection and synchronization
Coşkun, Osman; Candan, Çağatay; Department of Electrical and Electronics Engineering (2017)
This thesis aims to study two problems, namely optimal hypothesis testing in the sense of Neyman-Pearson in the presence of multiple hypotheses and optimal hypothesis testing in the presence of non-random unknown parameters (nuisance parameters). Both problems occur frequently in different applications and their optimal solutions involve some fine details. In the first part of the thesis, the multiple hypothesis testing problem is examined and the results are applied to the problem of radar sidelobe blanker...
Citation Formats
B. Tercan, “Gene function inference from expression using probabilistic topic models,” Ph.D. - Doctoral Program, Middle East Technical University, 2016.