Detection of Binding Sites of Chip-seq Data via Hidden Markov Model and Frequentist Inference of Model Parameters

2018-06-27
Doğan Dar, Elif
Purutçuoğlu Gazi, Vilda
The hidden Markov model (HMM) is one of the major modeling approaches based on a graphical representation in the form of a chain. In this structure, we have a sequence of multinomial “state” nodes, which are hidden, and a sequence of observations that are produced by the states [1]. To construct the link between the states and the observations, different types of inference methods have been suggested in the literature. For instance, the EM algorithm, known for HMMs as the Baum-Welch forward-backward algorithm [2], and the Viterbi algorithm [3] are two well-known approaches in this field. In this study, we apply frequentist methods to model different ChIP-seq datasets. Basically, ChIP-seq experiments are performed to locate the DNA binding sites that are occupied by promoters, enhancers, repressors and insulators. By using such data, it is possible to detect the binding sites of the genome, which show spatial dependency between neighboring sites (or windows). This information can be important, for example, to determine the amino acid physicochemical properties, since co-localized amino acid sequences must share some similarity, and thereby to determine the protein tertiary structure [4]. Hence, in this work, we perform two alternative frequentist approaches to find the underlying binding sites of publicly available datasets. In the first approach, we propose to treat the measurements as the observed states and the core physicochemical features of the data, obtained by feature extraction, as the hidden states of the HMM. Once the associated feature vectors, which denote the peptides, are estimated, we investigate which pairs of amino acid positions are suitable for binding. In the second approach, we define the binding site as a separate parameter in the estimation via the HMM constructed above. Finally, we compare the accuracy of both results by validating the findings against the known literature.
We think that such frequentist inference may open new avenues in the determination of protein structures and help to better understand certain diseases that depend on binding affinity.
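As a rough illustration of the decoding step mentioned in the abstract, the following minimal sketch runs the Viterbi algorithm on a toy two-state HMM. It is not the authors' model: the state names ("background", "bound"), the discretization of window read counts into "low" and "high", and all probabilities are hypothetical values chosen only to show how spatial dependency between neighboring windows influences the decoded path.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most probable hidden-state path for an observation sequence."""
    # best[t][s]: log-probability of the best path that ends in state s at step t
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = []  # back[t][s]: best predecessor of state s at step t + 1
    for o in obs[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][o]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the most probable final state
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def logs(probs):
    """Convert a dict of probabilities to log-probabilities."""
    return {k: math.log(v) for k, v in probs.items()}


# Hypothetical parameters (assumptions for illustration only)
states = ("background", "bound")
log_start = logs({"background": 0.9, "bound": 0.1})
log_trans = {
    "background": logs({"background": 0.9, "bound": 0.1}),
    "bound": logs({"background": 0.2, "bound": 0.8}),
}
log_emit = {
    "background": logs({"low": 0.9, "high": 0.1}),
    "bound": logs({"low": 0.2, "high": 0.8}),
}

# Discretized read counts for seven neighboring genomic windows (made-up data)
windows = ["low", "low", "high", "high", "high", "low", "low"]
path = viterbi(windows, states, log_start, log_trans, log_emit)
print(path)
```

In practice the transition and emission probabilities would be estimated from the data, for example with the Baum-Welch algorithm, rather than fixed by hand as in this sketch.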
International Conference on Applied Mathematics in Engineering (2018)

Citation Formats
E. Doğan Dar and V. Purutçuoğlu Gazi, “Detection of Binding Sites of Chip-seq Data via Hidden Markov Model and Frequentist Inference of Model Parameters,” Balıkesir, Türkiye, 2018, p. 166, Accessed: 00, 2021. [Online]. Available: https://hdl.handle.net/11511/71075.