Detection of Binding Sites of Chip-seq Data via Hidden Markov Model and Frequentist Inference of Model Parameters

Doğan Dar, Elif
Purutçuoğlu Gazi, Vilda
The hidden Markov model (HMM) is one of the major modeling approaches based on a graphical representation in the form of a chain. In this structure, we have a sequence of multinomial “state” nodes, which are hidden, and a sequence of observations produced by the states [1]. Different inference methods have been suggested in the literature to link the states and the observations. For instance, the EM algorithm, also known in this setting as the Baum-Welch forward-backward algorithm [2], and the Viterbi algorithm [3] are two well-known approaches in this field. In this study, we apply frequentist methods to model different ChIP-seq datasets. Basically, ChIP-seq experiments are performed to locate the DNA binding sites which are occupied by promoters, enhancers, repressors and insulators. By using these data, it is possible to detect the binding sites of the genome, which show spatial dependency between neighboring sites (or windows). This information can be important, for example, to determine the amino acid physicochemical properties, since co-localized amino acid sequences must share some similarity, and thereby to determine the protein tertiary structure [4]. Hence, in this work, we apply two alternative frequentist approaches to find the underlying binding sites in publicly available datasets. In the first approach, we propose to treat the measurements as the observed states and the core physicochemical features of the data, obtained by feature extraction, as the hidden states of the HMM. Once the associated feature vectors, which denote the peptides, are estimated, we investigate which pairs of amino acid positions are suitable for binding. In the second approach, we define the binding site as a separate parameter in the estimation via the HMM constructed above. Finally, we compare the accuracy of both results by validating the findings against the known literature.
We think that such frequentist inference may open new avenues in the determination of protein structures and may help to better understand certain diseases which depend on binding affinity.
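To make the idea of decoding hidden states along neighboring windows concrete, the following is a minimal sketch of the Viterbi algorithm on windowed read counts. It is not the authors' model: the two hidden states (background and bound), the Poisson emission distribution, and every numeric value (rates, start and transition probabilities, counts) are illustrative assumptions that do not come from the abstract.

```python
import math


def poisson_logpmf(k, lam):
    """Log-probability of observing count k under a Poisson(lam) distribution."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi(counts, start, trans, rates):
    """Return the most likely hidden-state path for a sequence of window counts.

    counts: read counts per genomic window (observations)
    start:  initial state probabilities
    trans:  trans[r][s] = probability of moving from state r to state s
    rates:  Poisson mean of the read count in each state (assumed emission model)
    """
    n_states = len(rates)
    # score[s] = best log-probability of any path that ends in state s
    score = [math.log(start[s]) + poisson_logpmf(counts[0], rates[s])
             for s in range(n_states)]
    backpointers = []
    for k in counts[1:]:
        prev = score
        score, pointers = [], []
        for s in range(n_states):
            # Best predecessor state for landing in state s at this window
            best = max(range(n_states),
                       key=lambda r: prev[r] + math.log(trans[r][s]))
            pointers.append(best)
            score.append(prev[best] + math.log(trans[best][s])
                         + poisson_logpmf(k, rates[s]))
        backpointers.append(pointers)
    # Trace back from the best final state
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical example: state 0 = background, state 1 = bound (all values assumed)
counts = [1, 0, 2, 1, 9, 12, 10, 1, 0, 2]
start = [0.9, 0.1]
trans = [[0.9, 0.1],
         [0.2, 0.8]]
rates = [1.0, 10.0]

path = viterbi(counts, start, trans, rates)
print(path)
```

In this sketch the transition matrix encodes the spatial dependency between neighboring windows: a window is more likely to share the state of its neighbor than to switch. In practice the parameters would be estimated from the data, for example with the Baum-Welch algorithm, rather than fixed by hand as they are here.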
Citation Formats
E. Doğan Dar and V. Purutçuoğlu Gazi, “Detection of Binding Sites of Chip-seq Data via Hidden Markov Model and Frequentist Inference of Model Parameters,” Balıkesir, Türkiye, 2018, p. 166, Accessed: 00, 2021. [Online]. Available: