JOA: Joint Overlap Analysis of multiple genomic interval sets

2019-03-08
Otlu, Burcak
Can, Tolga
Background: Next-generation sequencing (NGS) technologies have produced large volumes of genomic data. One common operation on heterogeneous genomic data is genomic interval intersection. Most existing tools impose restrictions, such as disallowing nested intervals or requiring intervals to be sorted, when finding overlaps in two or more interval sets.
Results: We proposed segment tree (ST) and indexed segment tree forest (ISTF) based solutions for intersecting multiple genomic interval sets in parallel. We developed these methods as a tool, Joint Overlap Analysis (JOA), which takes n interval sets and finds overlapping intervals with no constraints on the given intervals. The proposed indexed segment tree forest is a novel composite data structure that leverages indexing and the natural binning of a segment tree. We also presented construction and search algorithms for this data structure. We compared JOA ST and JOA ISTF with each other and with other interval intersection tools, both to verify JOA's correctness and to show that it attains comparable execution times.
Conclusions: We implemented JOA in Java using the fork/join framework, which speeds up parallel processing by taking advantage of all available processor cores. We compared JOA ST with JOA ISTF and showed that the segment tree and indexed segment tree forest methods are comparable in execution time and memory usage. We also compared the execution time of JOA with that of other tools and demonstrated that JOA has comparable execution time and can further reduce its running time by using more processors per node. JOA can be run through its GUI or as a command line tool. JOA is available with source code at https://github.com/burcakotlu/JOA/. A user manual is provided at https://joa.readthedocs.org.
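To make the segment tree idea from the abstract concrete, the following is a minimal sketch of how one interval set could be indexed and then queried for overlaps. It is not taken from the JOA source code: the class name `SegmentTreeSketch`, its methods, the use of closed integer coordinates, and the choice to report hits as input indices are all assumptions made for illustration. It covers a single interval set and a single-threaded query only; the indexed segment tree forest and the fork/join parallelism are not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Illustrative only: names and layout are hypothetical, not JOA's API.
class SegmentTreeSketch {
    private final int[] bounds; // sorted distinct values of lo and hi + 1
    private final int slots;    // number of elementary slots between boundaries
    // Tree node number (heap numbering, root = 1) -> ids of intervals stored there.
    private final Map<Integer, List<Integer>> stored = new HashMap<>();

    // intervals[i] = {lo, hi}: closed integer coordinates with lo <= hi.
    // Intervals may be unsorted, nested, or duplicated.
    SegmentTreeSketch(int[][] intervals) {
        TreeSet<Integer> points = new TreeSet<>();
        for (int[] iv : intervals) {
            points.add(iv[0]);
            points.add(iv[1] + 1);
        }
        bounds = points.stream().mapToInt(Integer::intValue).toArray();
        slots = Math.max(1, bounds.length - 1);
        for (int id = 0; id < intervals.length; id++) {
            // Both endpoints are boundaries, so binarySearch finds them exactly.
            int from = Arrays.binarySearch(bounds, intervals[id][0]);
            int to = Arrays.binarySearch(bounds, intervals[id][1] + 1) - 1;
            insert(1, 0, slots - 1, from, to, id);
        }
    }

    // Store the interval id at the highest nodes whose slot range it fully covers.
    private void insert(int node, int l, int r, int from, int to, int id) {
        if (from > r || to < l) return;
        if (from <= l && r <= to) {
            stored.computeIfAbsent(node, k -> new ArrayList<>()).add(id);
            return;
        }
        int mid = (l + r) / 2;
        insert(2 * node, l, mid, from, to, id);
        insert(2 * node + 1, mid + 1, r, from, to, id);
    }

    // Index of the last boundary <= x, or -1 if x lies before every boundary.
    private int slotOf(int x) {
        int pos = Arrays.binarySearch(bounds, x);
        return pos >= 0 ? pos : -pos - 2;
    }

    // Ids of all stored intervals that share at least one position with [qlo, qhi].
    SortedSet<Integer> query(int qlo, int qhi) {
        SortedSet<Integer> hits = new TreeSet<>();
        if (bounds.length < 2 || qhi < bounds[0] || qlo >= bounds[bounds.length - 1]) {
            return hits; // query lies entirely outside the indexed range
        }
        int from = Math.max(0, slotOf(qlo));
        int to = Math.min(slots - 1, slotOf(qhi));
        collect(1, 0, slots - 1, from, to, hits);
        return hits;
    }

    // Visit every node whose slot range meets [from, to]; anything stored there overlaps.
    private void collect(int node, int l, int r, int from, int to, SortedSet<Integer> hits) {
        if (from > r || to < l) return;
        hits.addAll(stored.getOrDefault(node, new ArrayList<>()));
        if (l == r) return;
        int mid = (l + r) / 2;
        collect(2 * node, l, mid, from, to, hits);
        collect(2 * node + 1, mid + 1, r, from, to, hits);
    }

    public static void main(String[] args) {
        int[][] set = { {1, 10}, {3, 5}, {8, 20}, {30, 40} };
        SegmentTreeSketch tree = new SegmentTreeSketch(set);
        System.out.println(tree.query(9, 12)); // prints [0, 2]
    }
}
```

In this sketch the input needs no sorting and may contain nested intervals, as with `{3, 5}` inside `{1, 10}`. Collecting hits in a set removes the repeats that arise because one interval can be stored at several tree nodes. Intersecting n interval sets, as JOA does, would need further logic on top of this single-set query.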
BMC BIOINFORMATICS

Citation Formats
B. Otlu and T. Can, “JOA: Joint Overlap Analysis of multiple genomic interval sets,” BMC BIOINFORMATICS, 2019. [Online]. Available: https://hdl.handle.net/11511/37230.