Prediction of Non-Coding Driver Mutations Using Ensemble Learning
Date
2024-6
Author
Basharat, Sana
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Driver coding mutations are extensively studied and frequently detected because their deleterious amino acid changes affect protein function. Driver non-coding mutations, however, require further analysis and experimental validation to be detected. Here, we employ the XGBoost (eXtreme Gradient Boosting) algorithm to predict driver non-coding mutations based on novel long-range interaction features and engineered transcription factor binding site features, augmented with features from existing annotation and variant effect prediction tools. We use a novel array-based method to capture the frequency and distribution of long-range interacting regions of interest. We use transcription factor (TF) models trained with the stochastic gradient descent (SGD) algorithm to predict loss and gain of function at TF binding sites. The resulting dataset is passed through a forward stepwise selection and feature engineering pipeline and is then used to train our gradient boosting model to distinguish driver from passenger non-coding mutations. We also pass our dataset through a known driver discovery model from the existing literature, which combines 50 gradient-boosted tree models. To evaluate the predictive capability of our models, we take non-coding driver mutations reported in other state-of-the-art studies, annotate them in the same way, and predict their driver-ness. Furthermore, we use Explainable AI methodologies to perform an in-depth analysis of the generated predictions. Our results show above-average performance on the unseen validation data and suggest that, with our annotations and gradient-boosted trees trained on the resulting data, non-coding mutations can be classified as drivers or passengers with relatively high accuracy.
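The forward stepwise selection step mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis pipeline: a nearest-centroid scorer stands in for the gradient boosting model so that the example needs only the Python standard library, and the synthetic data, the function names `centroid_accuracy` and `forward_stepwise`, and the `min_gain` stopping threshold are all hypothetical.

```python
import random
from statistics import mean


def centroid_accuracy(X_train, y_train, X_val, y_val, features):
    """Hold-out accuracy of a nearest-centroid classifier restricted to `features`.

    Assumption: this simple scorer is a stand-in for the gradient boosting
    model described in the abstract.
    """
    centroids = {}
    for label in set(y_train):
        rows = [x for x, y in zip(X_train, y_train) if y == label]
        centroids[label] = [mean(r[f] for r in rows) for f in features]
    correct = 0
    for x, y in zip(X_val, y_val):
        point = [x[f] for f in features]
        predicted = min(
            centroids,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        correct += predicted == y
    return correct / len(y_val)


def forward_stepwise(X_train, y_train, X_val, y_val, n_features, min_gain=1e-9):
    """Greedily add the feature that most improves the hold-out score.

    Stops when no remaining feature improves the score by at least
    `min_gain` (a hypothetical stopping rule).
    """
    selected, best_score = [], 0.0
    remaining = list(range(n_features))
    while remaining:
        scored = [
            (centroid_accuracy(X_train, y_train, X_val, y_val, selected + [f]), f)
            for f in remaining
        ]
        score, feature = max(scored)
        if score - best_score < min_gain:
            break
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


# Synthetic data (hypothetical): label 1 = driver, label 0 = passenger.
# Feature 0 carries signal; features 1 and 2 are pure noise.
rng = random.Random(0)
labels = [i % 2 for i in range(200)]
rows = [[3.0 * y + rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)] for y in labels]
X_train, y_train = rows[:100], labels[:100]
X_val, y_val = rows[100:], labels[100:]

selected, best_score = forward_stepwise(X_train, y_train, X_val, y_val, n_features=3)
print(selected, best_score)
```

In this toy setting the informative feature is picked first, and noise features are added only if they happen to raise the hold-out score. A real pipeline would score candidates with the actual model and with cross-validation rather than a single hold-out split.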
Subject Keywords
Driver Mutations, Ensemble Learning, Non-coding Mutations, Explainable AI, Long-range Interactions
URI
https://hdl.handle.net/11511/110124
Collections
Graduate School of Informatics, Thesis
Citation Formats
S. Basharat, “Prediction of Non-Coding Driver Mutations Using Ensemble Learning,” M.S. - Master of Science, Middle East Technical University, 2024.