Comparative analysis of long-read structural variant calling tools: benchmarking tools and different combinations for detecting somatic structural variants in whole-genome sequencing data

2025-1
Aydın, Safa Kerem
Cancer genomes exhibit a diverse set of mutations, including structural variations (SVs), which are large-scale genomic rearrangements. SVs can alter genes and regulatory regions and contribute significantly to cancer initiation and progression. However, accurately identifying somatic SVs in cancer genomes remains challenging. Long-read sequencing has significant potential to improve SV identification, and several approaches are being developed to address this problem. In this thesis, eight commonly used SV detection tools were applied to paired tumor and normal samples from two sources: the NCI-H2009 lung cancer cell line and the COLO829 melanoma cell line, which served as a reference with a verified somatic SV truth set. Candidate somatic SVs were identified by performing independent variant calling on the tumor and normal samples, merging the variant calls, and then applying a subtraction method. Additionally, various combinations of these tools were evaluated to improve the accuracy of somatic SV detection. The analysis emphasizes the accurate identification of true somatic variants validated by the truth set and provides a detailed evaluation of each tool's performance across a wide range of variant types and counts. By comparing eight SV calling tools and their combinations, this work identifies both the advantages and the limitations of existing tools and lays a foundation for developing more reliable SV detection pipelines. The findings demonstrate that integrating multiple tools and testing diverse combinations could significantly improve the validation of true somatic SVs.
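The subtraction step and the tool-combination step described in the abstract can be illustrated with a minimal sketch. It is not the pipeline used in the thesis: the `SV` record, the function names `same_event`, `subtract_normal`, and `supported_by`, the 500 bp matching window, and the two-tool support threshold are all hypothetical choices made for illustration only. The abstract does not state the matching criteria or the merging tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """Simplified SV call (hypothetical record, not a real VCF parser)."""
    chrom: str
    pos: int
    svtype: str  # e.g. "DEL", "INS", "DUP", "INV", "BND"


def same_event(a, b, window=500):
    """Treat two calls as the same event if they share chromosome and type
    and their positions lie within `window` bp (assumed threshold)."""
    return (a.chrom == b.chrom
            and a.svtype == b.svtype
            and abs(a.pos - b.pos) <= window)


def subtract_normal(tumor, normal, window=500):
    """Keep tumor calls with no matching call in the paired normal sample."""
    return [t for t in tumor
            if not any(same_event(t, n, window) for n in normal)]


def supported_by(calls_per_tool, min_tools=2, window=500):
    """Keep one representative per event reported by at least `min_tools` tools."""
    kept = []
    for calls in calls_per_tool.values():
        for c in calls:
            if any(same_event(c, k, window) for k in kept):
                continue  # this event is already represented
            support = sum(
                any(same_event(c, o, window) for o in other)
                for other in calls_per_tool.values()
            )
            if support >= min_tools:
                kept.append(c)
    return kept


# Toy data: the chr1 deletion also appears in the normal sample (germline).
tumor = [SV("chr1", 1000, "DEL"), SV("chr2", 5000, "INS")]
normal = [SV("chr1", 1100, "DEL")]
somatic = subtract_normal(tumor, normal)

# Toy data: only the chr2 insertion is reported by both hypothetical tools.
per_tool = {
    "toolA": [SV("chr2", 5000, "INS"), SV("chr3", 100, "DEL")],
    "toolB": [SV("chr2", 5050, "INS")],
}
consensus = supported_by(per_tool, min_tools=2)
```

In this toy example, `somatic` retains only the chr2 insertion, and `consensus` retains the single event that both hypothetical tools report. Real SV comparison would typically also account for variant length, breakpoint pairs, and strand, which this sketch omits.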
Citation Formats
S. K. Aydın, “Comparative analysis of long-read structural variant calling tools: benchmarking tools and different combinations for detecting somatic structural variants in whole-genome sequencing data,” M.S. - Master of Science, Middle East Technical University, 2025.