Genome-wide sequence analysis of human splice acceptor regions for motif discovery

Karaduman Bahçe, Gülşah
In eukaryotic cells, alternative splicing of genes is a vital mechanism that drives protein diversity. Splicing signals in the genomic sequence control the regulatory factors that orchestrate alternative splicing. The 3’ and 5’ splice sites and common branchpoint sequences are the primary splicing signals, and changes in these signals can be disease-causing. Nevertheless, an extensive genome-wide analysis of the sequences around these signals is lacking. In this study, we focused on genome-wide motif analysis of the splice acceptor region. We analyzed 400-nucleotide-long sequences (300 nucleotides upstream and 100 nucleotides downstream of the 3’ splice site) to identify motifs with potential functional roles. A total of 207,583 sequences were retrieved from Ensembl BioMart and analyzed with MEME-ChIP, resulting in 517 significant splice acceptor region motifs: 457 known motifs and 60 novel motifs. Among the known motifs, 227 mapped to non-human mammalian genomes. Furthermore, proteins binding to the known motifs are mainly annotated for homeoboxes, homeodomains, DNA-binding regions, and transcription regulation functions. Seventeen of the novel motifs are consistent with RNA-binding protein (RBP) binding motifs, and 10 of the novel motifs are computationally identified and supported by experimental evidence from branchpoint studies. Moreover, experimentally supported splice-altering or disease-causing variants in the acceptor region were found to co-locate with the novel motifs and with known motifs of Homo sapiens and other mammals. Here, we present these novel acceptor region motifs, identified for the first time, as splice acceptor motif candidates. Furthermore, we provide a set of 76 known Mus musculus motifs that are novel to the human genome and frequently co-locate with splice-altering SNPs. Experimental validation of the biological roles of the novel motifs through further functional studies will increase our understanding of splicing mechanisms.
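As an illustration of the 400-nucleotide window described above, the following minimal sketch computes window coordinates around a 3’ splice site from an exon's genomic coordinates. It is not the pipeline used in the thesis: the function name `acceptor_window`, the 1-based inclusive coordinate convention, and the placement of the window boundary at the first exonic base are all assumptions made for illustration.

```python
def acceptor_window(exon_start, exon_end, strand, upstream=300, downstream=100):
    """Return (start, end) of the window around a 3' splice site.

    Assumptions (hypothetical, not taken from the thesis):
    - coordinates are 1-based and inclusive, on the forward genomic strand;
    - 'upstream' bases are intronic and 'downstream' bases are exonic,
      so the window holds upstream + downstream nucleotides in total.
    """
    if strand == "+":
        # The acceptor site precedes the first exonic base (exon_start).
        return exon_start - upstream, exon_start + downstream - 1
    if strand == "-":
        # On the minus strand the exon begins at exon_end, and the
        # upstream intron lies at higher genomic coordinates.
        return exon_end - downstream + 1, exon_end + upstream
    raise ValueError("strand must be '+' or '-'")


if __name__ == "__main__":
    start, end = acceptor_window(1000, 1200, "+")
    print(start, end, end - start + 1)
```

For a minus-strand exon, the extracted sequence would still need to be reverse-complemented before motif analysis so that it reads in the direction of transcription.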
Citation Formats
G. Karaduman Bahçe, “Genome-wide sequence analysis of human splice acceptor regions for motif discovery,” Ph.D. - Doctoral Program, Middle East Technical University, 2020.