An evaluation of a novel approach for clustering genes with dissimilar replicates

2020-12-01
Gene clustering is a step in microarray studies that demands several considerations. First, the expression levels can be collected as time series, which should be accounted for appropriately. Second, genes may behave differently in different biological replicates due to their genetic backgrounds. Highlighting such genes may deepen the study; however, it introduces further complexities for clustering. The third concern stems from the existence of a large number of constant genes, which impose a heavy computational burden. Finally, the number of clusters is not known in advance; therefore, a clustering algorithm should be able to recommend a meaningful number of clusters. In this study, we use a simulation study to evaluate a recently proposed clustering algorithm that promises to address these issues. The methodology treats each gene as a combination of its replicates and accounts for the time dependency. Furthermore, it computes cluster validation scores to suggest possible numbers of clusters. Results show that the methodology is able to find the clusters and highlight the genes with differences among the replicates, separate the constant genes to reduce the computational burden, and suggest a meaningful number of clusters. Furthermore, our results show that traditional distance metrics are not efficient at clustering short time series correctly.
COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION
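The abstract does not spell out how the evaluated algorithm is implemented, so the following is only a minimal sketch of two ideas it mentions: setting constant genes aside before clustering, and representing each gene as a combination of its replicates in a way that respects time order. The function names, the variance tolerance `tol`, and the use of first differences are all assumptions made for illustration, not the authors' method.

```python
from statistics import pvariance


def is_constant(replicates, tol=1e-2):
    """Assumed rule: a gene is 'constant' if every replicate's
    variance over time falls below the (hypothetical) tolerance."""
    return all(pvariance(series) < tol for series in replicates)


def gene_profile(replicates):
    """Concatenate the first differences of each replicate.

    Keeping replicates side by side (instead of averaging them)
    preserves differences among replicates; using consecutive
    differences is one simple way to make time order matter.
    """
    profile = []
    for series in replicates:
        profile.extend(b - a for a, b in zip(series, series[1:]))
    return profile


def split_genes(data, tol=1e-2):
    """Separate constant genes from genes that go on to clustering."""
    constant, varying = [], {}
    for name, replicates in data.items():
        if is_constant(replicates, tol):
            constant.append(name)
        else:
            varying[name] = gene_profile(replicates)
    return constant, varying


# Toy data: two replicates per gene, four time points each.
toy = {
    "g1": [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]],  # flat in both
    "g2": [[0, 1, 2, 3], [3, 2, 1, 0]],  # replicates disagree
}
constant_genes, profiles = split_genes(toy)
```

Only the profiles of the non-constant genes would then be passed to a clustering step, which is where the reduction in computational burden would come from in a sketch like this.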

Citation Formats
O. Cinar, C. İyigün, and Ö. İlk Dağ, “An evaluation of a novel approach for clustering genes with dissimilar replicates,” COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, pp. 0–0, 2020, Accessed: 00, 2021. [Online]. Available: https://hdl.handle.net/11511/69949.