Prediction of Non-coding Driver Mutations Using Ensemble Learning
Date
2023-01-01
Author
Basharat, Sana
Huseynov, Ramal
Kilinc, Huseyin Hilmi
Otlu Sarıtaş, Burçak
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Coding driver mutations are extensively studied and are frequently detected through the deleterious amino acid changes by which they affect protein function. Non-coding mutations, however, need further analysis and experimental validation before they can be identified as drivers. Here, we employ the XGBoost (eXtreme Gradient Boosting) algorithm to predict driver non-coding mutations based on novel long-range interaction features and engineered transcription factor (TF) binding site features, augmented with features from existing annotation and effect prediction tools. For the novel long-range interaction features, we capture the frequency and spread of interacting regions that overlap the non-coding mutation of interest. To this end, we use self-balancing trees to find overlaps within chromatin loop files and store the interacting regions as separate tree structures. For the engineered TF binding features, we train TF models with the stochastic gradient descent (SGD) algorithm to predict loss and gain of function at TF binding sites, giving more weight to non-coding mutations that affect TF binding affinities. We also include features from existing annotation and effect prediction tools, some of which rely on deep learning methods; these features relate to splicing effect, number of associated protein products, variant consequences, biotypes, and others. We train our gradient boosting model on the resulting aggregated dataset of known driver and non-driver non-coding mutations to predict driver versus passenger non-coding mutations. We then take non-coding driver mutations reported in other state-of-the-art studies, annotate them in the same way, and pass them through our model for comparison. Furthermore, we elaborate on the results using explainable AI methodologies.
Our results show above-average performance on unseen test data and suggest that, with our annotations and a gradient boosting tree model trained on the resulting data, driver and passenger non-coding mutations can be distinguished with relatively high accuracy.
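The long-range interaction features mentioned in the abstract (frequency and spread of interacting regions overlapping a mutation) can be illustrated with a small sketch. This is not the authors' implementation. It uses a statically balanced, max-end-augmented interval tree in place of a self-balancing tree. The loop tuple layout, the names `build`, `overlaps`, and `loop_features`, and the definition of "spread" (the genomic span covered by the partner regions) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical entry layout: (anchor_start, anchor_end, (partner_start, partner_end)),
# with half-open coordinates [start, end) on a single chromosome.
Entry = Tuple[int, int, Tuple[int, int]]


@dataclass
class Node:
    entry: Entry
    max_end: int  # largest anchor end found anywhere in this subtree
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(entries: List[Entry]) -> Optional[Node]:
    """Build a balanced tree from entries already sorted by anchor start."""
    if not entries:
        return None
    mid = len(entries) // 2
    left = build(entries[:mid])
    right = build(entries[mid + 1:])
    max_end = entries[mid][1]
    for child in (left, right):
        if child is not None:
            max_end = max(max_end, child.max_end)
    return Node(entries[mid], max_end, left, right)


def overlaps(node: Optional[Node], pos: int, out: List[Entry]) -> None:
    """Collect every entry whose anchor contains position pos."""
    if node is None or node.max_end <= pos:
        return  # nothing in this subtree extends past pos
    overlaps(node.left, pos, out)
    start, end, _ = node.entry
    if start <= pos < end:
        out.append(node.entry)
    if start <= pos:  # starts in the right subtree are >= this start
        overlaps(node.right, pos, out)


def loop_features(loops: List[Tuple[int, int, int, int]], pos: int) -> Tuple[int, int]:
    """Return (frequency, spread) for a mutation at pos.

    frequency: number of loop anchors overlapping pos.
    spread: span covered by the partner regions of those anchors (assumed definition).
    """
    entries: List[Entry] = []
    for a_start, a_end, b_start, b_end in loops:
        # Index both anchors so a mutation in either end of a loop is found.
        entries.append((a_start, a_end, (b_start, b_end)))
        entries.append((b_start, b_end, (a_start, a_end)))
    entries.sort(key=lambda e: e[0])
    hits: List[Entry] = []
    overlaps(build(entries), pos, hits)
    if not hits:
        return 0, 0
    partners = [partner for _, _, partner in hits]
    spread = max(e for _, e in partners) - min(s for s, _ in partners)
    return len(hits), spread


# Toy chromatin loops: (anchor1_start, anchor1_end, anchor2_start, anchor2_end)
toy_loops = [(100, 200, 1000, 1100), (150, 250, 5000, 5100), (300, 400, 900, 950)]
print(loop_features(toy_loops, 160))
```

In a real pipeline the tree would be built once per chromosome and queried for every mutation, rather than rebuilt inside `loop_features` as done here for brevity.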
Subject Keywords
Boosting, Driver Mutations, Ensemble Learning, Explainable AI, Long-range Interactions, Non-coding Mutations
URI
https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85184920385&origin=inward
https://hdl.handle.net/11511/109274
DOI
https://doi.org/10.1109/bibm58861.2023.10386056
Conference Name
2023 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2023
Collections
Graduate School of Informatics, Conference / Seminar
Citation Formats
IEEE
S. Basharat, R. Huseynov, H. H. Kilinc, and B. Otlu Sarıtaş, “Prediction of Non-coding Driver Mutations Using Ensemble Learning,” presented at the 2023 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2023, İstanbul, Türkiye, 2023. [Online]. Available: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85184920385&origin=inward