Gene function inference from expression using probabilistic topic models

2016
Tercan, Bahar
The main aim of this study is to develop a probabilistic biclustering approach that can help answer the question: "Can we determine the biological context of a sample (tissue, condition, etc.) from expression data, and associate these contexts with annotation databases such as Gene Ontology, KEGG and HUGE to discover annotations (e.g. cell division, metabolic process, illness) for them?" We applied a nonparametric probabilistic topic model, the Hierarchical Dirichlet Process (HDP), which was originally developed for text mining to extract an unknown number of latent topics from documents, to gene expression data analysis. In this analogy, the mRNA transcript corresponds to the word, the biological context to the topic, and the sample to the document. This study builds on previous work that has, to varying extents, applied topic models to the problem of differential expression, and advances the current state of the art with a comprehensive, integrative method for enhancing HDP with prior information. The two main areas of improvement are the preprocessing of gene expression data for topic models and the introduction of informed priors into the HDP model. Experimental results showed that the prior-improved HDP successfully reveals hidden biclusters in gene expression data, with higher robustness to changes in sparsity level (number of samples) and prior strength (η).
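The word/document/topic analogy above needs two concrete ingredients: a way to turn continuous expression values into word counts per sample, and a gene-by-topic Dirichlet prior that can carry annotation knowledge. The sketch below illustrates both under stated assumptions; it is not the thesis's actual preprocessing or prior. The discretization rule (log1p, scale, round), the functions `expression_to_documents`, `informed_eta` and `posterior_mean`, and the parameters `scale`, `base` and `strength` are all hypothetical. The last function is the standard Dirichlet-multinomial posterior mean that topic models, HDP included, use for a topic's word distribution given word assignment counts. It shows why a stronger prior (η) matters most when data are sparse.

```python
import math
from typing import Dict, List, Sequence, Set


def expression_to_documents(
    matrix: Sequence[Sequence[float]],
    genes: Sequence[str],
    scale: float = 10.0,
) -> List[Dict[str, int]]:
    """Hypothetical preprocessing: each sample (row) becomes a bag of words.

    A gene is a word; its count in the sample's document is the expression
    value, clipped at 0, log1p-transformed, scaled and rounded to an integer.
    Genes with a zero count are left out of the document.
    """
    docs = []
    for row in matrix:
        doc: Dict[str, int] = {}
        for gene, value in zip(genes, row):
            count = round(scale * math.log1p(max(value, 0.0)))
            if count > 0:
                doc[gene] = count
        docs.append(doc)
    return docs


def informed_eta(
    genes: Sequence[str],
    topic_gene_sets: Sequence[Set[str]],
    base: float = 0.01,
    strength: float = 1.0,
) -> List[List[float]]:
    """Hypothetical informed prior: one Dirichlet parameter vector per topic.

    Every gene gets a small symmetric pseudo-count `base`; genes in a topic's
    annotation gene set (e.g. a GO term) get an extra `strength` pseudo-count.
    """
    return [
        [base + (strength if g in gene_set else 0.0) for g in genes]
        for gene_set in topic_gene_sets
    ]


def posterior_mean(counts: Sequence[int], eta: Sequence[float]) -> List[float]:
    """Posterior mean of a topic's gene distribution under a Dirichlet(eta) prior,
    given the number of words of each gene assigned to that topic."""
    total = sum(counts) + sum(eta)
    return [(c + a) / total for c, a in zip(counts, eta)]


# Illustrative genes and made-up expression values.
genes = ["CDK1", "CCNB1", "ALB"]
docs = expression_to_documents([[math.e - 1, 0.0, 3.0]], genes)
print(docs)  # → [{'CDK1': 10, 'ALB': 14}]

eta = informed_eta(genes, [{"CDK1", "CCNB1"}])  # a "cell division" topic
print(posterior_mean([0, 0, 0], eta[0]))    # no data: the prior decides
print(posterior_mean([0, 0, 100], eta[0]))  # abundant data: counts dominate
```

With no assigned words the annotated genes dominate the topic. With many observations the counts override the prior, which is the kind of trade-off the prior strength η controls.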

Suggestions

Mathematical Modeling and Approximation of Gene Expression Patterns
Yılmaz, Fatih; Öktem, Hüseyin Avni (2004-09-03)
This study concerns modeling, approximation and inference of gene regulatory dynamics on the basis of gene expression patterns. The dynamical behavior of gene expressions is represented by a system of ordinary differential equations. We introduce a gene-interaction matrix with some nonlinear entries, in particular, quadratic polynomials of the expression levels to keep the system solvable. The model parameters are determined by using optimization. Then, we provide the time-discrete approximation of our time...
Limitations of three-phase Buckley-Leverett theory
Akın, Serhat (Informa UK Limited, 2005-07-01)
The broad objective of this study is to determine the limitations of the three-phase relative permeability estimation technique by using the experimental approach. To achieve this goal, three-phase relative permeability experiments were conducted on a Berea sandstone core plug using brine, hexane and nitrogen gas. An unsteady-state analytical technique and a numerical technique, in which a black oil simulator was coupled with a global optimization algorithm in a least-squares manner, were used to comp...
Short Time Series Microarray Data Analysis and Biological Annotation
Sökmen, Zerrin; Atalay, Mehmet Volkan; Atalay, Rengül (2008-01-01)
The significant gene list that results from microarray data analysis should be explained in terms of biological function. The aim of this study is to extract biologically related gene clusters from short time series microarray gene expression data by applying unsupervised methods, and to automatically perform biological annotation of those clusters. In the first step of the study, short time series microarray expression data is clustered according to similar expression profiles. After that, several biological d...
Identification of Novel Reference Genes Based on MeSH Categories
Ersahin, Tulin; Çarkacıoğlu, Levent; Can, Tolga; Konu, Ozlen; Atalay, Mehmet Volkan; Atalay, Rengül (2014-03-28)
Transcriptome experiments are performed to assess protein abundance through mRNA expression analysis. Expression levels of genes vary depending on the experimental conditions and the cell response. Transcriptome data must be diverse and yet comparable in reference to stably expressed genes, even if they are generated from different experiments on the same biological context from various laboratories. In this study, expression patterns of 9090 microarray samples grouped into 381 NCBI-GEO datasets were invest...
Induction and control of large-scale gene regulatory networks
Tan, Mehmet; Department of Computer Engineering (2009)
Gene regulatory networks model the interactions within the cell and thus it is essential to understand their structure and to develop some control mechanisms that could effectively deal with them. This dissertation tackles these two aspects. To handle the first problem, a new constraint-based modeling algorithm is proposed that can both increase the quality of the output and decrease the computational requirements for learning the structure of gene regulatory networks by integrating multiple biological data...
Citation Formats
B. Tercan, “Gene function inference from expression using probabilistic topic models,” Ph.D. - Doctoral Program, Middle East Technical University, 2016.