Principal microbial groups: compositional alternative to phylogenetic grouping of microbiome data

Boyraz, Asli
Pawlowsky-Glahn, Vera
Jose Egozcue, Juan
Acar, Aybar Can
Statistical and machine learning techniques based on relative abundances have been used to predict health conditions and to identify microbial biomarkers. However, the high dimensionality, sparsity and compositional nature of microbiome data pose statistical challenges. Taxon grouping summarizes microbiome abundance at a coarser resolution and in a lower dimension, but it presents new challenges when correlating taxa with a disease. In this work, we present a novel approach that groups Operational Taxonomic Units (OTUs) based only on relative abundances, as an alternative to taxon grouping. The proposed procedure acknowledges the compositional nature of the data by making use of principal balances. The identified groups are called Principal Microbial Groups (PMGs). The procedure reduces the need for user-defined aggregation of OTUs and offers the possibility of working with coarse groups of OTUs that are not present in a phylogenetic tree. PMGs can be used for two different goals: (1) as a dimensionality reduction method for compositional data, and (2) as an aggregation procedure that provides an alternative to taxon grouping for the construction of microbial balances subsequently used for disease prediction. We illustrate the procedure with data from a cirrhosis study. PMGs provide a coherent data analysis for the search for biomarkers in human microbiota. The source code and demo data for PMGs are available at:
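
The abstract relies on the notion of a balance between two groups of OTUs. The following is a minimal sketch of how a single balance can be computed from relative abundances, using the standard isometric log-ratio definition: the scaled log-ratio of the geometric means of the two groups. It is not the authors' released code and it does not search for principal balances. The function names (`closure`, `balance`), the toy counts, and the 0.5 pseudocount used to avoid zeros are all assumptions made for illustration.

```python
import numpy as np

def closure(x):
    """Rescale each row (sample) so that its parts sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=1, keepdims=True)

def balance(x, num_idx, den_idx):
    """Balance between two groups of parts, one value per sample.

    b = sqrt(r*s / (r+s)) * ln(gmean(numerator group) / gmean(denominator group))
    where r and s are the group sizes. All entries of x must be positive.
    """
    x = np.asarray(x, dtype=float)
    r, s = len(num_idx), len(den_idx)
    log_x = np.log(x)
    # The mean of logs is the log of the geometric mean.
    mean_log_num = log_x[:, num_idx].mean(axis=1)
    mean_log_den = log_x[:, den_idx].mean(axis=1)
    return np.sqrt(r * s / (r + s)) * (mean_log_num - mean_log_den)

# Hypothetical toy data: 3 samples x 4 OTUs (invented, not from the study).
counts = np.array([[10, 20, 30, 40],
                   [5, 5, 45, 45],
                   [25, 25, 25, 25]])

# Assumption: a simple pseudocount replaces zeros before taking logs.
pseudo = counts + 0.5
rel = closure(pseudo)

# Balance of OTUs {0, 1} against OTUs {2, 3}.
b = balance(rel, [0, 1], [2, 3])
print(b)
```

Because a balance is a log-ratio, it gives the same value whether it is computed on the pseudocounted counts or on the closed relative abundances. This scale invariance is the property that makes balances suitable for compositional data.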


Microbiome Data Analysis Using Compositional Data Approach
Boyraz, Aslı; Acar, Aybar Can; Nalbantoğlu, Özkan Ufuk; Department of Bioinformatics (2022-11-18)
The microorganisms present in the human body play a crucial role in maintaining human health, and the environmental microbiome influences the human microbiome. An advanced understanding of the human microbiome and indoor microbiota is the first step towards understanding the potential relationships between health and the microbiome. Next Generation Sequencing (NGS) enables the identification and study of a large number of microorganisms in a short time. With the identification of a large number of microorganisms, the ...
Short Time Series Microarray Data Analysis and Biological Annotation
Sökmen, Zerrin; Atalay, Mehmet Volkan; Atalay, Rengül (2008-01-01)
The significant gene list that results from microarray data analysis should be explained in terms of biological functions. The aim of this study is to extract biologically related gene clusters from short time series microarray gene data by applying unsupervised methods and to automatically perform biological annotation of those clusters. In the first step of the study, short time series microarray expression data are clustered according to similar expression profiles. After that, several biological d...
Systems-level analysis of genome wide association study results for a pilot juvenile idiopathic arthritis family study
Aydın Son, Yeşim; Demirkaya, Erkan; BİLGİNER, YELDA; KASAPÇOPUR, Özgür; Unsal, Erbil; ALİKAŞİFOĞLU, MEHMET; ÖZEN, SEZA (2015-01-01)
Genome wide association studies (GWAS) determine susceptibility profiles for complex diseases. In this study, GWAS was performed in 26 patients with oligo and rheumatoid factor negative polyarticular juvenile idiopathic arthritis (JIA) and their healthy parents using Affymetrix 250K SNP arrays. Biological function and pathway enrichment analyses were performed. This is the first GWAS reported for JIA families from the eastern Mediterranean population. Enrichment of Fc gamma R-mediated phagocytosis pathway and respons...
Network structure based pathway enrichment system to analyze pathway activities
Işık, Zerrin; Atalay, Mehmet Volkan; Atalay, Rengül; Department of Computer Engineering (2011)
Current approaches that integrate large-scale data and information from a variety of sources to reveal the molecular basis of cellular events do not adequately benefit from pathway information. Here, we portray a network structure based pathway enrichment system that fuses and exploits model and data: signalling pathways are taken as the biological models, while microarray and ChIP-seq data are the sample input data sources among many other alternatives. Our model- and data-driven hybrid system allows to quantitati...
Bi-k-bi clustering: mining large scale gene expression data using two-level biclustering
Carkacioglu, Levent; Atalay, Rengül; KONU KARAKAYALI, ÖZLEN; Atalay, Mehmet Volkan; Can, Tolga (2010-01-01)
Due to the increase in gene expression data sets in recent years, various data mining techniques have been proposed for mining gene expression profiles. However, most of these methods target single gene expression data sets and cannot handle all the available gene expression data in public databases in a reasonable amount of time and space. In this paper, we propose a novel framework, bi-k-bi clustering, for finding association rules of gene pairs that can easily operate on large-scale and multiple heterogene...
Citation Formats
A. Boyraz, V. Pawlowsky-Glahn, J. Jose Egozcue, and A. C. Acar, “Principal microbial groups: compositional alternative to phylogenetic grouping of microbiome data,” BRIEFINGS IN BIOINFORMATICS, pp. 0–0, 2022, Accessed: 00, 2022. [Online]. Available: