Evaluation of Effects of Different Base Calling Models on Single Nucleotide Variant Calling Using Lowcoverage Long Read Sequencing

Karakurt, Hamza Umut
Pekcan, Hasan Ali
Kahraman, Ayşe
Çınar, Esra
Akgün, Bilçağ
Long-read sequencing technologies such as Oxford Nanopore Technologies (ONT) have enabled researchers to sequence long reads quickly and cost-effectively. ONT sequencing uses nanopores integrated into semiconductor surfaces and identifies nucleotides from the change in the electrical signal across the surface as each nucleotide passes through a nanopore. The default output of ONT sequencers is in FAST5 format. The first, and one of the most important, steps of ONT data analysis is the conversion of FAST5 files to FASTQ files using “basecaller” tools. Generally, basecaller tools use pre-trained deep learning models to translate the electrical signals into reads. Guppy, the most commonly used basecaller, offers two main model types: fast and high accuracy. Although the computation time differs significantly between these two models, their effect on the variant calling process has not been fully understood. The aim of this study is to evaluate the effect of the different models on single nucleotide variant (SNV) calling performance. Here, we used 8 low-coverage long-read sequencing datasets of NA12878 (gold standard data) to compare the variant calling results obtained with the two Guppy models. Each dataset was basecalled with both the Guppy fast and high-accuracy models, and the output FASTQ files were aligned to the human reference genome (hg19) with minimap2. Variants were called with Clair3, and the final VCF files were compared in the R programming environment. The Genome in a Bottle (GIAB) NA12878 high-confidence variant file was used as the reference for true positive variants. The results indicated that both the pass/fail ratios of the basecalled datasets and the computation times were significantly higher with the high-accuracy models. In addition, the number of called variants differed remarkably between the models, whereas the difference in the true positive variant ratio was much smaller. Our primary observation is that, in our case, using the fast models does not decrease the true positive rate but does decrease the number of called variants.
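
The abstract does not define exactly how the "true positive variant ratio" was computed. A minimal sketch, assuming it is the fraction of called SNVs whose chromosome, position, REF and ALT also appear in the GIAB high-confidence set, is given below in Python (the study itself used R). The helper names `read_snvs` and `compare`, the PASS-only filter and the `chr` prefix normalization are illustrative assumptions, not the authors' method.

```python
"""Sketch: compare called SNVs with a truth set (hypothetical helper names)."""
from typing import Dict, Iterable, Set, Tuple

# A variant is keyed by (chromosome, 1-based position, REF base, ALT base).
Variant = Tuple[str, int, str, str]
BASES = {"A", "C", "G", "T"}


def read_snvs(lines: Iterable[str], pass_only: bool = True) -> Set[Variant]:
    """Collect single nucleotide variants from the data lines of a VCF."""
    snvs: Set[Variant] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip meta-information, header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])
        ref, alts, filt = fields[3].upper(), fields[4].upper(), fields[6]
        if pass_only and filt not in ("PASS", "."):
            continue  # drop records rejected by the caller's filters
        # Assumed normalization so UCSC-style "chr1" matches "1".
        if chrom.startswith("chr"):
            chrom = chrom[3:]
        for alt in alts.split(","):
            # Keep one-base substitutions only; indels, "*" and "." are skipped.
            if ref in BASES and alt in BASES and alt != ref:
                snvs.add((chrom, pos, ref, alt))
    return snvs


def compare(called: Set[Variant], truth: Set[Variant]) -> Dict[str, float]:
    """Count called SNVs present in the truth set and their ratio."""
    true_pos = called & truth
    return {
        "called": len(called),
        "true_positives": len(true_pos),
        "false_positives": len(called - truth),
        "tp_ratio": len(true_pos) / len(called) if called else 0.0,
    }


if __name__ == "__main__":
    # Toy records, not data from the study.
    truth = read_snvs(["1\t100\t.\tA\tG\t50\tPASS\t.",
                       "1\t200\t.\tC\tT\t50\tPASS\t."])
    fast = read_snvs([
        "chr1\t100\t.\tA\tG\t20\tPASS\t.",
        "chr1\t150\t.\tG\tA\t20\tPASS\t.",
        "chr1\t300\t.\tT\tTA\t20\tPASS\t.",   # insertion, not an SNV
        "chr1\t200\t.\tC\tT\t5\tLowQual\t.",  # removed by the FILTER check
    ])
    print(compare(fast, truth))
    # {'called': 2, 'true_positives': 1, 'false_positives': 1, 'tp_ratio': 0.5}
```

Running the same comparison on the fast-model and high-accuracy-model VCFs of one sample would give the two counts and ratios being contrasted. A full benchmark would normally also restrict the comparison to the GIAB high-confidence regions (BED file), check genotypes, and use a dedicated tool such as hap.py or RTG Tools vcfeval, which reconcile differing variant representations.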


Citation Formats
H. U. Karakurt, H. A. Pekcan, A. Kahraman, E. Çınar, and B. Akgün, “Evaluation of Effects of Different Base Calling Models on Single Nucleotide Variant Calling Using Lowcoverage Long Read Sequencing,” Erdemli, Mersin, TÜRKİYE, 2022, p. 3021. [Online]. Available: https://hibit2022.ims.metu.edu.tr/.