Predicting first-degree relationships from ancient samples using deep neural networks

2023-08-25
Güler, Merve Nur
Estimating genetic relatedness between individuals from ancient genomic data is of utmost importance; nonetheless, almost all current tools only assign pairs to first-, second-, or third-degree relationship categories and cannot separate the two first-degree categories, parent-offspring and siblings. The ability to distinguish between these two categories is vital for investigating long-gone cultural practices. This study aims to differentiate between parent-offspring and sibling pairs in low-coverage ancient genomes using a Convolutional Neural Network (CNN) model. Founders were first simulated with the population genetic simulator msprime, and the pedigree simulator PedSim was then used to create sibling and parent-offspring pairs under realistic demographic scenarios. Ancient DNA simulation was applied with the Gargammel software to obtain next-generation sequencing (NGS) reads resembling ancient genome reads. Next, the coefficient of relatedness (r), i.e., the probability that two alleles at a given locus are identical by descent, was estimated from the mismatch rate across genomic windows containing 200 SNPs each. Two-dimensional binning was applied to the r values, and a CNN model was trained on the resulting fixed-length vectors, one per pair. The model was tested on parent-offspring, sibling, and unrelated pairs under scenarios with different numbers of shared SNPs, and achieved macro-average F1 scores of 1, 0.98, 0.89, 0.86, and 0.62 for pairs sharing 50,000, 20,000, 10,000, 5,000, and 1,000 SNPs, respectively. This study demonstrates the potential of deep artificial neural network models to differentiate precisely between first-degree relationships in low-coverage ancient genomes and provides a foundation for future research in this field.
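The windowed estimation step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes pseudo-haploid allele calls (one sampled allele per individual per SNP, coded 0/1) and converts the mismatch rate to r with the common normalization r = 2(1 - m / m_unrelated), where m_unrelated is the mismatch rate expected for an unrelated pair from the same population. The function names `windowed_mismatch` and `relatedness_from_mismatch`, and the choice of normalization, are assumptions made for illustration only.

```python
from typing import List, Sequence


def windowed_mismatch(a: Sequence[int], b: Sequence[int], window: int = 200) -> List[float]:
    """Mismatch rate in consecutive, non-overlapping windows of `window` shared SNPs.

    `a` and `b` are assumed to be pseudo-haploid allele calls (0 or 1) for two
    individuals at the same ordered set of shared SNPs. A trailing partial
    window is dropped.
    """
    if len(a) != len(b):
        raise ValueError("both individuals must be typed at the same SNPs")
    rates = []
    for start in range(0, len(a) - window + 1, window):
        pairs = zip(a[start:start + window], b[start:start + window])
        mismatches = sum(x != y for x, y in pairs)
        rates.append(mismatches / window)
    return rates


def relatedness_from_mismatch(rate: float, unrelated_rate: float) -> float:
    """Convert a mismatch rate to r, assuming r = 2 * (1 - rate / unrelated_rate).

    Under this assumed normalization an unrelated pair gives r = 0, a
    first-degree pair gives r = 0.5, and identical genomes give r = 1.
    The result is clipped to [0, 1] because windowed estimates are noisy.
    """
    if unrelated_rate <= 0:
        raise ValueError("unrelated_rate must be positive")
    r = 2.0 * (1.0 - rate / unrelated_rate)
    return min(1.0, max(0.0, r))


if __name__ == "__main__":
    # Toy data: 400 SNPs, i.e. two windows of 200 SNPs each.
    ind_a = [0] * 400
    ind_b = [0] * 200 + [1] * 50 + [0] * 150
    rates = windowed_mismatch(ind_a, ind_b, window=200)
    # 0.25 is a hypothetical unrelated baseline chosen for this toy example.
    print([relatedness_from_mismatch(m, unrelated_rate=0.25) for m in rates])
```

The per-window view is what makes the two first-degree categories separable in principle: a parent and offspring share exactly one allele identical by descent along the whole genome, so their windowed r stays near 0.5, whereas siblings share zero, one, or two alleles identical by descent in different regions, so their windowed r varies between windows.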
Citation Formats
M. N. Güler, “Predicting first-degree relationships from ancient samples using deep neural networks,” M.S. - Master of Science, Middle East Technical University, 2023.