Deciphering sequence variations and splicing sensitivity: predictive analysis of PSI in SRRM4 response groups

2025-09-01
Böler, Ümit Sude
Alternative splicing is essential for expanding transcriptomic complexity, and microexons represent some of the most functionally critical and tightly regulated splicing events in neural development. Because microexon inclusion is highly sensitive to Serine/Arginine Repetitive Matrix 4 (SRRM4) levels and is often disrupted in neurodevelopmental disorders, including autism, defining the cis-regulatory features that control inclusion is clinically relevant. This thesis examines how engineered sequence variants in the upstream intron, the microexon, and the downstream intron affect Percent Spliced In (PSI) values under differential SRRM4 expression. Using a synthetic microexon library from a Massively Parallel Splicing Assay (MaPSy), a custom convolutional neural network (CNN) was developed to predict splicing outcomes under four SRRM4 response conditions. The model was trained on one-hot encoded sequences and auxiliary metadata and achieved strong predictive performance across conditions. To improve biological interpretability, DeepLIFT-based feature attribution and TF-MoDISco-lite were used to extract the regulatory sequence motifs that contribute to PSI predictions. The discovered motifs were then annotated with Tomtom against the CISBP-RNA database, providing insight into potential co-regulatory elements associated with SRRM4-mediated splicing modulation. This study offers a computational framework for deciphering the cis-regulatory logic of microexon inclusion and illustrates how integrative modeling can advance our understanding of splicing regulation in neurodevelopmental contexts.
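The abstract does not give implementation details. The minimal sketch below assumes nothing beyond what the abstract states and illustrates three of the building blocks it mentions. First, it one-hot encodes a sequence. Second, it scans that encoding with a single first-layer convolutional filter, which is the operation a CNN applies before bias and nonlinearity. Third, it computes PSI as the fraction of inclusion reads. Several choices are hypothetical and made only for illustration: the channel order (A, C, G, T), treating U as T, encoding other characters as all-zero rows, the arbitrary `TGC` filter in the usage example, and the function names `one_hot`, `conv1d_scan` and `psi`. None of this is the thesis's own code, and it omits the auxiliary metadata, training, and DeepLIFT attribution entirely.

```python
from typing import List, Sequence

# Hypothetical channel order chosen for illustration: A, C, G, T.
BASES = "ACGT"


def one_hot(seq: str) -> List[List[float]]:
    """Encode a sequence as an L x 4 matrix (one row per position).

    Assumption: RNA 'U' is mapped to 'T'; any other symbol (e.g. 'N')
    becomes an all-zero row.
    """
    rows = []
    for ch in seq.upper().replace("U", "T"):
        rows.append([1.0 if ch == b else 0.0 for b in BASES])
    return rows


def conv1d_scan(x: Sequence[Sequence[float]],
                kernel: Sequence[Sequence[float]]) -> List[float]:
    """Slide one k x 4 filter along x ('valid' padding, stride 1).

    This is the cross-correlation a CNN's first convolutional layer
    computes for a single filter, before bias and activation.
    """
    k = len(kernel)
    scores = []
    for start in range(len(x) - k + 1):
        s = 0.0
        for i in range(k):
            for c in range(len(BASES)):
                s += x[start + i][c] * kernel[i][c]
        scores.append(s)
    return scores


def psi(inclusion_reads: int, exclusion_reads: int) -> float:
    """Percent Spliced In as the inclusion fraction on a 0-1 scale."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        raise ValueError("PSI is undefined with zero informative reads")
    return inclusion_reads / total


if __name__ == "__main__":
    x = one_hot("AATGCA")
    # Illustrative filter that rewards an exact 'TGC' match (arbitrary choice).
    print(conv1d_scan(x, one_hot("TGC")))  # [0.0, 0.0, 3.0, 0.0]
    print(psi(30, 10))                     # 0.75
```

A trained network learns many such filters from data rather than using a hand-set one. Attribution methods such as DeepLIFT then trace which input positions drive the prediction, and TF-MoDISco-lite clusters those positions into motifs.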
Citation
Ü. S. Böler, “Deciphering sequence variations and splicing sensitivity: predictive analysis of PSI in SRRM4 response groups,” M.S. - Master of Science, Middle East Technical University, 2025.