Genome-wide sequence analysis of human splice acceptor regions for motif discovery

Karaduman Bahçe, Gülşah
In eukaryotic cells, alternative splicing of genes is a vital mechanism that drives protein diversity. Splicing signals on the genomic sequence control the regulatory factors that orchestrate alternative splicing. The 3’ and 5’ splice sites and the common branchpoint sequences are the primary splicing signals, and changes in these signals can be disease-causing. Nevertheless, an extensive genome-wide analysis of the sequences around these signals is lacking. In this study, we focused on genome-wide motif analysis of the splice acceptor region. We analyzed 400-nucleotide-long sequences (300 nucleotides upstream and 100 nucleotides downstream of the 3’ splice site) to identify motifs with potential functional roles. A total of 207,583 sequences were retrieved from Ensembl BioMart and analyzed with MEME-ChIP, resulting in 517 significant splice acceptor region motifs: 457 known motifs and 60 novel motifs. Among the known motifs, 227 mapped to non-human mammalian genomes. Furthermore, proteins binding to the known motifs are mainly annotated with homeobox, homeodomain, DNA-binding region, and transcription regulation functions. Seventeen of the novel motifs match RNA-binding protein (RBP) binding motifs, and 10 of the novel motifs are computationally identified and supported by experimental evidence from branchpoint studies. Moreover, experimentally supported splice-altering or disease-causing variants in the acceptor region were found to co-locate with the novel motifs and with known motifs of Homo sapiens and other mammals. Here, we present these novel acceptor region motifs, identified for the first time, as splice acceptor motif candidates. Furthermore, we provide a set of 76 known Mus musculus motifs that are novel to the human genome and frequently co-locate with splice-altering SNPs. Experimental validation of the biological roles of these novel motifs through further functional studies will increase our understanding of splicing mechanisms.
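The 400-nucleotide window described in the abstract (300 nucleotides upstream and 100 nucleotides downstream of the 3’ splice site) can be illustrated with a short sketch. This is not the thesis pipeline, which retrieved sequences from Ensembl BioMart; the function names `reverse_complement` and `acceptor_window`, the 0-based coordinate convention, and the in-memory chromosome string are all assumptions made for illustration.

```python
# Illustrative sketch only: names and coordinate conventions are assumptions,
# not taken from the thesis.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def acceptor_window(chrom_seq: str, exon_first_base: int, strand: str,
                    upstream: int = 300, downstream: int = 100) -> str:
    """Extract the acceptor-region window in transcript orientation.

    chrom_seq       -- forward-strand chromosome sequence (assumed in memory)
    exon_first_base -- 0-based forward-strand index of the first exonic base
                       after the 3' splice site, in transcript order
    strand          -- "+" or "-"
    """
    if strand == "+":
        # Upstream (intronic) bases lie at lower forward-strand coordinates.
        start = exon_first_base - upstream
        end = exon_first_base + downstream
    elif strand == "-":
        # On the minus strand, upstream bases lie at higher coordinates.
        start = exon_first_base - downstream + 1
        end = exon_first_base + upstream + 1
    else:
        raise ValueError("strand must be '+' or '-'")

    if start < 0 or end > len(chrom_seq):
        raise ValueError("window extends beyond the sequence")

    window = chrom_seq[start:end]
    return window if strand == "+" else reverse_complement(window)
```

With the default parameters, the returned string is 400 nucleotides long, and the last two intronic bases (typically the AG acceptor dinucleotide) sit at indices 298 and 299 regardless of strand.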


Molecular cloning, characterization, and expression analysis of a gene encoding a Ran binding protein (RanBP) in Cucumis melo L.
Baloglu, Mehmet Cengiz; Zakharov, Florence Negre; Öktem, Hüseyin Avni; Yücel, Ayşe Meral (2011-01-01)
Ran binding proteins (RanBPs) are highly conserved members of the GTP-binding protein family that are involved in protein export from the nucleus to the cytoplasm. In this study, a CmRanBP gene was isolated from melon (Cucumis melo L.) using the RACE (rapid amplification of cDNA ends) method. The 778-base-pair melon RanBP cDNA, encoding a protein of 197 amino acids (22.2 kDa), was characterized (GenBank accession no: EU853459). The predicted amino acid sequence of CmRanBP w...
Inference of Gene Regulatory Networks Via Multiple Data Sources and a Recommendation Method
Ozsoy, Makbule Gulcin; Polat, Faruk; Alhajj, Reda (2015-11-12)
Gene regulatory networks (GRNs) are composed of biological components, including genes, proteins and metabolites, and their interactions. In general, computational methods are used to infer the connections among these components. However, computational methods should take into account the general features of the GRNs, which are sparseness, scale-free topology, modularity and structure of the inferred networks. In this work, observing the common aspects between recommendation systems and GRNs, we decided to ...
Enzyme prediction with word embedding approach
Akın, Erkan; Atalay, M. Volkan; Department of Computer Engineering (2019)
Information such as molecular function, biological process, and cellular localization can be inferred from the protein sequence. However, protein sequences vary in length. Therefore, the sequence itself cannot be used directly as a feature vector for pattern recognition and machine learning algorithms since these algorithms require fixed length feature vectors. We describe an approach based on the use of the Word2vec model, more specifically continuous skip-gram model to generate the vector representation o...
MicroRNAs show a wide diversity of expression profiles in the developing and mature central nervous system
Erson Bensan, Ayşe Elif (2008-10-01)
Since the discovery of microRNAs (miRNAs) in Caenorhabditis elegans, mounting evidence illustrates the important regulatory roles for miRNAs in various developmental, differentiation, cell proliferation, and apoptosis pathways of diverse organisms. We are just beginning to elucidate novel aspects of RNA mediated gene regulation and to understand how heavily various molecular pathways rely on miRNAs for their normal function. miRNAs are small non-protein-coding transcripts that regulate gene expression post-...
Intergenic and Repeat Transcription in Human, Chimpanzee and Macaque Brains Measured by RNA-Seq
Xu, Augix Guohua; He, Liu; Li, Zhongshan; Xu, Ying; Li, Mingfeng; Fu, Xing; Yan, Zheng; Yuan, Yuan; Menzel, Corinna; Li, Na; Somel, Mehmet; Hu, Hao; Chen, Wei; Paabo, Svante; Khaitovich, Philipp (Public Library of Science (PLoS), 2010-07-01)
Transcription is the first step connecting genetic information with an organism's phenotype. While expression of annotated genes in the human brain has been characterized extensively, our knowledge about the scope and the conservation of transcripts located outside of the known genes' boundaries is limited. Here, we use high-throughput transcriptome sequencing (RNA-Seq) to characterize the total non-ribosomal transcriptome of human, chimpanzee, and rhesus macaque brain. In all species, only 20-28% of non-ri...
Citation Formats
G. Karaduman Bahçe, “Genome-wide sequence analysis of human splice acceptor regions for motif discovery,” Ph.D. - Doctoral Program, Middle East Technical University, 2020.