Gene function inference from expression using probabilistic topic models

2016
Tercan, Bahar
The main aim of this study is to develop a probabilistic biclustering approach that helps address the question: "Can we determine the biological context of a sample (tissue, condition, etc.) from expression data, and associate these contexts with annotation databases such as Gene Ontology, KEGG and HUGE to discover annotations (e.g., cell division, metabolic process, illness) for them?" We applied a nonparametric probabilistic topic model, the Hierarchical Dirichlet Process (HDP), to gene expression data analysis. HDP was originally developed for text mining, where it extracts an unknown number of latent topics from documents. In this study, the mRNA transcript is analogous to the word, the biological context to the topic, and the sample to the document. This study builds on previous studies that have, to varying extents, applied topic models to the problem of differential expression. It improves on the current state of the art by providing a comprehensive, integrative method for enhancing HDP with prior information. The two main proposed improvements are the preprocessing of gene expression data for topic models and the introduction of informed priors into the HDP model. The experimental results showed that the prior-improved HDP successfully reveals hidden biclusters in gene expression data, with higher robustness to changes in sparsity level (number of samples) and prior strength (η).
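Topic models consume integer word counts, while expression data are continuous, so each sample has to be turned into a "document" of gene "words". The abstract does not specify the thesis's preprocessing. The following is only a minimal sketch that assumes a simple shift-scale-round discretization. The function name `expression_to_documents` and the `scale` parameter are hypothetical.

```python
from typing import Dict, List


def expression_to_documents(
    expression: List[List[float]],  # rows = samples, columns = genes
    genes: List[str],
    scale: float = 10.0,  # hypothetical scaling factor, not from the thesis
) -> List[Dict[str, int]]:
    """Turn a samples x genes expression matrix into bag-of-words documents.

    Assumed discretization: shift all values so the global minimum is 0,
    multiply by `scale`, and round to the nearest integer pseudo-count.
    """
    global_min = min(min(row) for row in expression)
    documents = []
    for row in expression:
        doc: Dict[str, int] = {}
        for gene, value in zip(genes, row):
            count = round((value - global_min) * scale)
            if count > 0:  # a zero count means the gene "word" is absent
                doc[gene] = count
        documents.append(doc)
    return documents


docs = expression_to_documents(
    [[0.0, 1.5, 0.2], [2.0, 0.0, 0.7]], ["TP53", "MYC", "CDK1"]
)
print(docs)  # [{'MYC': 15, 'CDK1': 2}, {'TP53': 20, 'CDK1': 7}]
```

The choice of `scale` controls how many tokens each sample contributes and therefore how strongly the data weigh against the prior.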
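In LDA-style models, η is the Dirichlet prior over each topic's word (here, gene) distribution. One way to make this prior "informed" is to give extra pseudo-counts to genes that share an annotation, such as a GO term. The thesis's exact construction is not given in the abstract, so this is only a sketch under that assumption. The helper `informed_eta`, its `base`/`boost` parameters, and the extra uninformed topics (`n_free`) are hypothetical. `boost` plays the role of a prior strength.

```python
from typing import List, Set


def informed_eta(
    vocabulary: List[str],
    seed_sets: List[Set[str]],  # one annotated gene set per seeded topic
    base: float = 0.01,  # hypothetical symmetric background pseudo-count
    boost: float = 1.0,  # hypothetical extra pseudo-count (prior strength)
    n_free: int = 0,  # extra topics that keep a flat, uninformed prior
) -> List[List[float]]:
    """Build a topics x genes Dirichlet prior matrix (assumed scheme)."""
    rows = []
    for seeds in seed_sets:
        # Genes in the seed set get base + boost; all others get base.
        rows.append([base + boost if g in seeds else base for g in vocabulary])
    for _ in range(n_free):
        rows.append([base] * len(vocabulary))
    return rows


eta = informed_eta(
    ["TP53", "MYC", "CDK1", "CCNB1"], [{"CDK1", "CCNB1"}],
    base=0.5, boost=2.0, n_free=1,
)
print(eta)  # [[0.5, 0.5, 2.5, 2.5], [0.5, 0.5, 0.5, 0.5]]
```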
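A fitted topic model gives a topic-gene matrix φ and a sample-topic matrix θ. Under the word/topic/document analogy, a bicluster for topic k could be read out as the genes that weigh most in φ_k together with the samples in which topic k has a large share. This is a hypothetical readout rule, not necessarily the thesis's criterion. The name `topic_bicluster` and the thresholds `top_genes` and `min_share` are assumptions.

```python
from typing import List, Tuple


def topic_bicluster(
    phi: List[List[float]],  # topics x genes, each row a distribution
    theta: List[List[float]],  # samples x topics, each row a distribution
    genes: List[str],
    samples: List[str],
    topic: int,
    top_genes: int = 2,  # hypothetical: keep this many highest-weight genes
    min_share: float = 0.5,  # hypothetical: minimum topic proportion per sample
) -> Tuple[List[str], List[str]]:
    """Read out (genes, samples) for one topic as a bicluster."""
    # Rank genes by their weight in this topic (stable sort keeps ties in order).
    ranked = sorted(range(len(genes)), key=lambda g: phi[topic][g], reverse=True)
    gene_members = [genes[g] for g in ranked[:top_genes]]
    # Keep samples in which this topic accounts for at least `min_share`.
    sample_members = [s for s, row in zip(samples, theta) if row[topic] >= min_share]
    return gene_members, sample_members


phi = [[0.1, 0.1, 0.4, 0.4], [0.45, 0.45, 0.05, 0.05]]
theta = [[0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
print(topic_bicluster(phi, theta, ["TP53", "MYC", "CDK1", "CCNB1"], ["s1", "s2", "s3"], topic=0))
# (['CDK1', 'CCNB1'], ['s1', 's3'])
```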
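To associate a discovered context with an annotation such as a GO term, one common option is a one-sided hypergeometric (over-representation) test on the overlap between the topic's gene list and the term's gene set. The abstract does not state which association test the thesis uses, so this is a swapped-in, standard technique shown as a sketch. The function name is hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(
    population: int,  # genes in the background (vocabulary)
    annotated: int,  # background genes carrying the annotation
    drawn: int,  # genes selected for the topic
    overlap: int,  # selected genes that carry the annotation
) -> float:
    """One-sided P(X >= overlap) for X ~ Hypergeometric(population, annotated, drawn)."""
    total = comb(population, drawn)
    upper = min(annotated, drawn)
    # Sum the probability mass of every overlap at least as large as observed.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, drawn - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# 3 of 10 background genes are annotated; the topic's 3 genes are all annotated.
p = hypergeom_enrichment_p(population=10, annotated=3, drawn=3, overlap=3)
print(p)  # 1/120, about 0.00833
```

With many topics and many terms, the resulting p-values would normally need a multiple-testing correction.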

Suggestions

Limitations of three-phase Buckley-Leverett theory
Akın, Serhat (Informa UK Limited, 2005-07-01)
The broad objective of this study is to determine the limitations of the three-phase relative permeability estimation technique by using the experimental approach. In order to achieve this goal, three-phase relative permeability experiments were conducted on a Berea sandstone core plug by using brine, hexane and nitrogen gas. An unsteady-state analytical technique and a numerical technique, in which a black oil simulator was coupled with a global optimization algorithm in a least squares manner, were used to comp...
Short Time Series Microarray Data Analysis and Biological Annotation
Sökmen, Zerrin; Atalay, Mehmet Volkan; Atalay, Rengül (2008-01-01)
The significant gene list that results from microarray data analysis should be explained in terms of biological functions. The aim of this study is to extract biologically related gene clusters from short time series microarray gene data by applying unsupervised methods, and to automatically perform biological annotation of those clusters. In the first step of the study, short time series microarray expression data are clustered according to similar expression profiles. After that, several biological d...
Mathematical Modeling and Approximation of Gene Expression Patterns
Yılmaz, Fatih; Öktem, Hüseyin Avni (2004-09-03)
This study concerns modeling, approximation and inference of gene regulatory dynamics on the basis of gene expression patterns. The dynamical behavior of gene expressions is represented by a system of ordinary differential equations. We introduce a gene-interaction matrix with some nonlinear entries, in particular, quadratic polynomials of the expression levels to keep the system solvable. The model parameters are determined by using optimization. Then, we provide the time-discrete approximation of our time...
Optimal multiple hypothesis testing with an application in side lobe blanker design and invariance applications in detection and synchronization
Coşkun, Osman; Candan, Çağatay; Department of Electrical and Electronics Engineering (2017)
This thesis aims to study two problems, namely optimal hypothesis testing in the sense of Neyman-Pearson in the presence of multiple hypotheses and optimal hypothesis testing in the presence of non-random unknown parameters (nuisance parameters). Both problems occur frequently in different applications and their optimal solution involves some fine details. In the first part of the thesis, the multiple hypothesis testing problem is examined and the results are applied on the problem of radar sidelobe blanker...
Gene Level Concurrency in Genetic Algorithms
Şehitoğlu, Onur Tolga; Üçoluk, Göktürk (Springer-Verlag, 2007-01-01)
This study describes an alternative concurrency approach in genetic algorithms. Inspired by implicit parallelism in a physical chromosome, a vertical concurrency is introduced. The proposed gene process model allows genetic algorithms to work on encodings that are independent of the gene position ordering in a chromosome. This feature is used to implement a gene-reordering version of the genetic algorithm. Further possible models of flexible gene position encodings are discussed.
B. Tercan, “Gene function inference from expression using probabilistic topic models,” Ph.D. - Doctoral Program, Middle East Technical University, 2016.