Efficient coding schemes that prevent tandem repeats and achieve balance for reliable DNA data storage

2025-8-14
Demirci, Özge Simay
DNA data storage promises improvements of orders of magnitude in both density and durability over existing data storage technologies. The write-read procedure in DNA data storage has three stages: DNA synthesis, which writes the data onto newly created strands; storage of these strands in a container; and DNA sequencing, which reads the data back. Each of these stages is subject to various sources of error, so additional data processing is needed to improve reliability. In this work, we focus on efficient constrained coding schemes as one form of this additional processing. The literature has shown that tandem repeats and GC-content imbalance cause instability in DNA storage systems, which leads to write-read errors. We introduce simple, low-redundancy constrained coding schemes that prevent error-prone tandem repeats from being written and achieve GC-content balance. The proposed schemes have the potential to notably reduce the error rate in DNA data storage systems.
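To make the two constraints concrete, the sketch below checks whether a strand contains a tandem repeat (a substring x immediately followed by another copy of x) and whether its GC content is balanced. It is an illustrative checker only, not the thesis's coding scheme, which encodes data so that such strands are never produced. The abstract does not say which repeat lengths count as "error-prone". The period bounds `min_period`/`max_period` and the GC `tolerance` are therefore hypothetical parameters, as are all function names.

```python
def gc_content(strand: str) -> float:
    """Fraction of nucleotides in the strand that are G or C."""
    if not strand:
        return 0.0
    return sum(base in "GC" for base in strand) / len(strand)


def is_gc_balanced(strand: str, tolerance: float = 0.0) -> bool:
    """True if GC content lies within `tolerance` of 0.5.

    `tolerance` is an assumed parameter; tolerance=0.0 demands exact balance.
    """
    return abs(gc_content(strand) - 0.5) <= tolerance


def find_tandem_repeats(strand: str, min_period: int = 1, max_period: int = 3):
    """Return (start, period) for every position where a substring x of
    length `period` is immediately followed by another copy of x.

    The period range is a hypothetical stand-in for "error-prone" repeats.
    """
    hits = []
    n = len(strand)
    for period in range(min_period, max_period + 1):
        for start in range(n - 2 * period + 1):
            first = strand[start:start + period]
            second = strand[start + period:start + 2 * period]
            if first == second:
                hits.append((start, period))
    return hits


def satisfies_constraints(strand: str, max_period: int = 3, tolerance: float = 0.0) -> bool:
    """Strand is acceptable if it has no short tandem repeats and is GC-balanced."""
    return not find_tandem_repeats(strand, 1, max_period) and is_gc_balanced(strand, tolerance)


if __name__ == "__main__":
    print(satisfies_constraints("ACGTAC"))  # no repeat of period <= 3, GC = 3/6
    print(find_tandem_repeats("ACGACG"))    # "ACG" repeated at position 0
```

With these assumed settings, a period-1 repeat is simply two identical adjacent bases such as "AA". A constrained code would choose its codewords so that the check passes for every encoded strand, including across codeword boundaries.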
Citation
Ö. S. Demirci, “Efficient coding schemes that prevent tandem repeats and achieve balance for reliable DNA data storage,” M.S. - Master of Science, Middle East Technical University, 2025.