Species Classification from Short Genomic Reads using Feedforward Neural Networks

Özzeybek, Emre
As the cost of Next Generation Sequencing technologies declines, fast and efficient classification of genomic data has become increasingly important. Because most Second Generation Sequencing techniques are limited in output read length, it is important to be able to classify short DNA reads. In this research, we trained a basic Artificial Neural Network model with three hidden layers on short reads (50-500 bp) taken from the reference genomes of two species. We selected Escherichia coli and Saccharomyces cerevisiae for their short and well-studied reference genomes. Their taxonomic distance makes them ideal candidates for ascertaining the viability of using the whole genome for species classification. We then classified these short reads by species and achieved moderate success, with classification accuracy ranging from 80% to 91% depending on the hyperparameters and read length. We also document the issues encountered and consider future directions.
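The data preparation described in the abstract can be illustrated with a minimal Python sketch: draw reads of 50-500 bp from each genome, label them by species, and turn each read into a fixed-length numeric vector suitable for a feedforward network. The abstract does not state how reads were sampled or encoded, so the uniform random sampling, the zero-padded one-hot encoding, and the names `sample_read`, `one_hot`, and `make_dataset` are all assumptions made for illustration, not the thesis's actual pipeline.

```python
import random

BASES = "ACGT"

def sample_read(genome, min_len=50, max_len=500, rng=random):
    """Draw one read of random length from a random genome position.

    Assumes the genome string is at least min_len bases long.
    """
    length = rng.randint(min_len, min(max_len, len(genome)))
    start = rng.randint(0, len(genome) - length)
    return genome[start:start + length]

def one_hot(read, max_len=500):
    """Flatten a read into a 4 * max_len vector, zero-padded on the right.

    One-hot encoding is an assumption; the abstract does not name the encoding.
    """
    vec = [0.0] * (4 * max_len)
    for i, base in enumerate(read[:max_len]):
        j = BASES.find(base)
        if j >= 0:  # ambiguous bases such as N stay all-zero
            vec[4 * i + j] = 1.0
    return vec

def make_dataset(genomes, reads_per_genome, seed=0):
    """Build shuffled (vector, label) pairs; the label is the genome's index."""
    rng = random.Random(seed)
    data = []
    for label, genome in enumerate(genomes):
        for _ in range(reads_per_genome):
            data.append((one_hot(sample_read(genome, rng=rng)), label))
    rng.shuffle(data)
    return data
```

With two genome strings loaded (for example, E. coli as label 0 and S. cerevisiae as label 1), `make_dataset([ecoli, yeast], 10000)` would return labelled 2000-dimensional vectors that could feed the input layer of a network with three hidden layers.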
Citation Formats
E. Özzeybek, “Species Classification from Short Genomic Reads using Feedforward Neural Networks,” M.S. - Master of Science, Middle East Technical University, 2023.