Genome-wide sequence analysis of human splice acceptor regions for motif discovery

2020-12-23
Karaduman Bahçe, Gülşah
In eukaryotic cells, alternative splicing of genes is a vital mechanism that drives protein diversity. Splicing signals in the genomic sequence control the regulatory factors that orchestrate alternative splicing. The 3’ and 5’ splice sites and common branchpoint sequences are the primary splicing signals, and changes in these signals can be disease-causing. Nevertheless, an extensive genome-wide analysis of the sequences around these signals is lacking. In this study, we focused on genome-wide motif analysis of the splice acceptor region. We analyzed 400-nucleotide-long sequences (300 nucleotides upstream and 100 nucleotides downstream of the 3’ splice site) to identify motifs with potential functional roles. A total of 207,583 sequences were retrieved from Ensembl BioMart and analyzed with MEME-ChIP, yielding 517 significant splice acceptor region motifs: 457 known motifs and 60 novel motifs. Among the known motifs, 227 mapped to non-human mammalian genomes. The proteins that bind the known motifs are mainly annotated with homeobox, homeodomain, DNA-binding region, and transcription regulation functions. Of the novel motifs, 17 are consistent with RNA-binding protein (RBP) binding motifs, and 10 are computationally identified and supported by experimental evidence from branchpoint studies. Moreover, experimentally supported splice-altering or disease-causing variants in the acceptor region co-locate with the novel motifs and with known motifs of Homo sapiens and other mammals. Here, we present these novel acceptor region motifs, identified for the first time, as splice acceptor motif candidates. Furthermore, we provide a set of 76 known Mus musculus motifs that are novel to the human genome and highly co-locate with splice-altering SNPs. Experimental validation of the biological roles of these novel motifs through further functional studies will increase our understanding of splicing mechanisms.
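The 400-nucleotide window described in the abstract (300 nucleotides upstream and 100 nucleotides downstream of the 3’ splice site) can be illustrated with a minimal Python sketch. This is not the thesis pipeline, which retrieved sequences from Ensembl BioMart. The function names `acceptor_window` and `reverse_complement`, the 0-based coordinate convention, and the choice to anchor the window on the first exonic base are all assumptions made for illustration.

```python
# Illustrative sketch only; names and coordinate conventions are hypothetical.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def acceptor_window(chrom_seq: str, first_exon_base: int, strand: str,
                    upstream: int = 300, downstream: int = 100) -> str:
    """Extract the region around a 3' splice site in transcript orientation.

    chrom_seq       -- forward-strand sequence of the chromosome or contig
    first_exon_base -- 0-based forward-strand index of the first exonic base
                       after the acceptor site (assumed convention)
    strand          -- "+" or "-"
    The result holds `upstream` intronic bases followed by `downstream`
    exonic bases, so the defaults give a 400-nucleotide window.
    """
    if strand == "+":
        start = first_exon_base - upstream
        end = first_exon_base + downstream
    elif strand == "-":
        # On the minus strand the intron lies at higher forward coordinates.
        start = first_exon_base - downstream + 1
        end = first_exon_base + upstream + 1
    else:
        raise ValueError("strand must be '+' or '-'")
    if start < 0 or end > len(chrom_seq):
        raise ValueError("window extends past the end of the sequence")
    window = chrom_seq[start:end]
    return window if strand == "+" else reverse_complement(window)
```

With a toy sequence and a shortened window, `acceptor_window("AAACAGGTT", 5, "+", upstream=3, downstream=2)` returns the three intronic bases ending in the AG dinucleotide followed by two exonic bases. Handling the minus strand by reverse-complementing keeps every extracted window in the same 5’ to 3’ orientation, which a motif finder would need in order to treat all sequences consistently.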

Citation Formats
G. Karaduman Bahçe, “Genome-wide sequence analysis of human splice acceptor regions for motif discovery,” Ph.D. - Doctoral Program, Middle East Technical University, 2020.