SCITUNA: A Network Alignment Approach for Integrating Multiple Single-Cell RNA-Seq Datasets

2022-10
Dogan, Onur
Erten, Burak Onur
Erten, Cesim
Houdjedj, Aissa
Kazan, Hilal
Krichen, Mohamed
Marouf, Yacine
The throughput and cost of single-cell RNA sequencing (scRNA-seq) are continuously improving, and so is the demand for larger-scale scRNA-seq data, which may require integrating multiple datasets from different sequencing experiments. Integrating different scRNA-seq datasets can be challenging because of the batch effect, a phenomenon that can occur when experiments are run in different laboratories, at different times, or with different instruments and technologies. Batch effect correction is necessary to prevent misleading results in downstream analysis of the integrated data. The main challenge in scRNA-seq integration is to merge the datasets while keeping the cell populations separate and preserving the local structure of each dataset. We introduce SCITUNA, a Single-Cell RNA-seq datasets Integration Tool Using Network Alignment with batch effect correction. Our method finds matching cells between the batches and then iteratively refines the integration of each cell based on its nearest neighboring cells. We show that our method outperforms other integration methods such as Seurat, Batman, and scAlign on simulated, semi-real, and real data according to several metrics. SCITUNA also performs reliably when integrating datasets with semi-overlapping population compositions. Lastly, we carried out a comparative differential expression analysis on the integrated datasets to demonstrate the batch effect correction and the robustness of the integration method.
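
The two steps named in the abstract (matching cells across batches, then iteratively refining each cell from its nearest neighbors) can be illustrated with a toy sketch. This is not the SCITUNA implementation, and the abstract does not specify the algorithm. Here a greedy one-to-one matching by Euclidean distance stands in for the network alignment step, and the names `greedy_match` and `integrate` and the parameters `k`, `n_iter`, and `step` are all hypothetical choices made for illustration.

```python
import math

def greedy_match(ref, query):
    """One-to-one cell matching by ascending Euclidean distance.

    Assumption: a simple stand-in for the network alignment step.
    Returns a dict mapping query index -> reference index.
    """
    pairs = sorted(
        (math.dist(r, q), i, j)
        for i, r in enumerate(ref)
        for j, q in enumerate(query)
    )
    used_ref, match = set(), {}
    for _, i, j in pairs:
        if i not in used_ref and j not in match:
            match[j] = i
            used_ref.add(i)
    return match

def integrate(ref, query, k=2, n_iter=5, step=0.5):
    """Iteratively move query cells toward the reference batch.

    In each iteration, every query cell is shifted by a fraction `step`
    of the mean correction vector of its k nearest matched query cells.
    """
    query = [list(q) for q in query]
    for _ in range(n_iter):
        match = greedy_match(ref, query)
        # Correction vector for each matched (anchor) query cell.
        shifts = {
            j: [r - q for r, q in zip(ref[i], query[j])]
            for j, i in match.items()
        }
        updated = []
        for q in query:
            nearest = sorted(shifts, key=lambda j: math.dist(q, query[j]))[:k]
            mean_shift = [
                sum(shifts[j][d] for j in nearest) / len(nearest)
                for d in range(len(q))
            ]
            updated.append([x + step * s for x, s in zip(q, mean_shift)])
        query = updated
    return query

# Toy data: two populations; the query batch is offset by (3, 3).
ref = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
query = [(3.0, 3.0), (3.0, 4.0), (13.0, 13.0), (13.0, 14.0)]
corrected = integrate(ref, query)
```

Averaging the correction over neighboring cells, rather than applying each cell's own matched shift, is one way to keep the local structure of the query batch intact even when individual matches are imperfect. Whether SCITUNA does this in the same way is not stated in the abstract.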

Citation Formats
O. Dogan et al., “SCITUNA: A Network Alignment Approach for Integrating Multiple Single-Cell RNA-Seq Datasets,” Erdemli, Mersin, TÜRKİYE, 2022, p. 3121, Accessed: 00, 2023. [Online]. Available: https://hibit2022.ims.metu.edu.tr/.