Clustering of time-course gene expression data with dissimilar replicates

2013
Çınar, Ozan
Clustering genes by the similarity of their expression profiles leads to important results in bioinformatics. Numerous model-based methods exist for clustering time series. However, these methods may not be applicable to microarray gene expression data, because such data yield short time series that are not long enough for modeling. Moreover, the distance measures used in clustering methods assess dissimilarity on the basis of a single characteristic and ignore time dependencies. Furthermore, genes may show differences among their replications, and these differences carry important information. Detecting interesting genes might also involve a heavy computational burden. In this study, a clustering method is proposed in which every gene is treated as a short time series with several replications. The distance between the short time series of replications is measured using information from both the Euclidean distance and the slope distance. Numerical experiments show that the proposed approach finds the clusters very quickly with a low percentage of misclassification. Several tests show that the method is also successful in detecting genes with dissimilar replicates or constant shapes. Finally, different approaches are proposed for determining the number of clusters in a given data set. Simulation studies show that these approaches help to detect the number of clusters when it is not known a priori.
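
As an illustration only, the idea of combining a Euclidean distance with a slope distance for two short time series can be sketched as follows. The abstract does not state how the two quantities are combined, so this sketch makes several assumptions: the slope distance is taken to be the Euclidean distance between the point-to-point slopes of the two series, the two parts are mixed by a hypothetical `weight` parameter, and the function name `combined_distance` is invented for this example. It should not be read as the method used in the thesis.

```python
import numpy as np


def combined_distance(x, y, times=None, weight=0.5):
    """Mix a level-based and a shape-based distance between two short series.

    Assumptions (not taken from the abstract): the slope distance is the
    Euclidean distance between consecutive-point slopes, and the two parts
    are combined as a weighted sum controlled by `weight`.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if times is None:
        # Assume equally spaced measurements when no time points are given.
        times = np.arange(len(x), dtype=float)
    else:
        times = np.asarray(times, dtype=float)

    # Level difference: ordinary Euclidean distance between the values.
    euclidean = np.linalg.norm(x - y)

    # Shape difference: Euclidean distance between the slopes of
    # consecutive time points.
    dt = np.diff(times)
    slope_x = np.diff(x) / dt
    slope_y = np.diff(y) / dt
    slope = np.linalg.norm(slope_x - slope_y)

    return weight * euclidean + (1.0 - weight) * slope


# Two profiles with the same shape but shifted levels, and one with the
# opposite trend.
rising = [0.0, 1.0, 2.0, 3.0]
shifted = [1.0, 2.0, 3.0, 4.0]
falling = [3.0, 2.0, 1.0, 0.0]

print(combined_distance(rising, shifted))  # slope part is zero here
print(combined_distance(rising, falling))  # both parts contribute
```

In this sketch, two replicates of the same gene that share a shape but differ in level are separated only by the Euclidean part, whereas replicates with opposite trends are also penalized by the slope part, which is the kind of time-dependent information that a purely level-based distance ignores.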