Clustering of time-course gene expression data with dissimilar replicates

Date

2013

Author

Çınar, Ozan

Clustering genes with respect to their profile similarity leads to important results in bioinformatics. There are numerous model-based methods for clustering time series. However, those methods may not be applicable to microarray gene expression data, since such data consist of short time series that are not long enough for modeling. Moreover, the distance measures used in clustering methods typically capture dissimilarity in only one characteristic and ignore time dependencies. Furthermore, genes may show differences among replications, and these differences carry important information. Detecting interesting genes might involve a heavy computational burden. In this study, a clustering method is proposed in which every gene is treated as a short time series with several replications. The distance between the short time series of replications is measured using information from both the Euclidean distance and the slope distance. The numerical experiments show that the proposed approach can find the clusters very fast with a low percentage of misclassification. Several tests show that the method is also successful in detecting genes with dissimilar replicates or constant shapes. Finally, different approaches are proposed for determining the number of clusters in a given data set. Simulation studies show that these methods help to detect the number of clusters when it is not known a priori.
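To make the idea of combining the two distances concrete, the following is a minimal sketch in Python. It is not the formulation used in the thesis, which the abstract does not specify. The weighted sum, the `weight` parameter, the use of first differences as slopes, the averaging over replicate pairs, and all function names are assumptions made for illustration.

```python
import math


def euclidean_distance(x, y):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def slopes(series, times):
    """Slopes between consecutive time points of one short time series."""
    return [
        (series[i + 1] - series[i]) / (times[i + 1] - times[i])
        for i in range(len(series) - 1)
    ]


def slope_distance(x, y, times):
    """Euclidean distance between the slope vectors of two series."""
    return euclidean_distance(slopes(x, times), slopes(y, times))


def combined_distance(x, y, times, weight=0.5):
    """Assumed combination: a weighted sum of level and shape distances.

    weight = 1 uses only the Euclidean distance, weight = 0 only the slopes.
    """
    level = euclidean_distance(x, y)
    shape = slope_distance(x, y, times)
    return weight * level + (1.0 - weight) * shape


def mean_replicate_distance(reps_a, reps_b, times, weight=0.5):
    """Assumed aggregation: average combined distance over replicate pairs."""
    total = sum(
        combined_distance(a, b, times, weight)
        for a in reps_a
        for b in reps_b
    )
    return total / (len(reps_a) * len(reps_b))


if __name__ == "__main__":
    times = [0, 1, 2, 3]
    rising = [0.0, 1.0, 2.0, 3.0]
    rising_shifted = [1.0, 2.0, 3.0, 4.0]   # same shape, different level
    falling = [3.0, 2.0, 1.0, 0.0]          # opposite shape

    print(combined_distance(rising, rising_shifted, times))
    print(combined_distance(rising, falling, times))
```

In this sketch, two profiles with the same shape but different levels have a slope distance of zero, so only the Euclidean term contributes, while profiles with opposite trends are penalized by both terms. Applying `mean_replicate_distance` to the replicates of a single gene against themselves would give one possible way to flag genes whose replicates disagree, again under the assumptions stated above.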

Subject Keywords

Gene expression, Biometry, Cluster analysis, Time-series analysis, Bioinformatics

URI

http://etd.lib.metu.edu.tr/upload/12615975/index.pdf
https://hdl.handle.net/11511/22658

Collections

Graduate School of Natural and Applied Sciences, Thesis

Suggestions

Consensus clustering of time series data Yetere Kurşun, Ayça; Batmaz, İnci; İyigün, Cem; Department of Scientific Computing (2014) In this study, we aim to develop a methodology that merges Dynamic Time Warping (DTW) and consensus clustering in a single algorithm. The most commonly used time series distance measures require data to be of the same length, and the distance they measure between time series depends mostly on the similarity of each coinciding data pair in time. DTW is a relatively new measure used to compare two time-dependent sequences which may be out of phase or may not have the same lengths or frequencies. DTW aligns two time serie...
Identification of functionally orthologous protein groups in different species based on protein network alignment Yaveroğlu, Ömer Nebil; Can, Tolga; Department of Computer Engineering (2010) In this study, an algorithm named ClustOrth is proposed for determining and matching functionally orthologous protein clusters in different species. The algorithm requires protein interaction networks of the organisms to be compared and GO terms of the proteins in these interaction networks as prior information. After determining the functionally related protein groups using the Repeated Random Walks algorithm, the method maps the identified protein groups according to the similarity metric defined. In orde...
Integer linear programming based solutions for construction of biological networks Eren Özsoy, Öykü; Can, Tolga; Department of Health Informatics (2014) Inference of gene regulatory or signaling networks from perturbation experiments and gene expression assays is one of the challenging problems in bioinformatics. Recently, the inference problem has been formulated as a reference network editing problem and it has been shown that finding the minimum number of edit operations on a reference network in order to comply with perturbation experiments is an NP-complete problem. In this dissertation, we propose linear programming based solutions for reconstruction o...
Cluster based model diagnostic for logistic regression Tanju, Özge; Kalaylıoğlu Akyıldız, Zeynep Işıl; Department of Statistics (2016) Model selection methods are commonly used to identify the best approximation that explains the data. Existing methods are generally based on information theory, such as the Akaike Information Criterion (AIC), corrected Akaike Information Criterion (AICc), Consistent Akaike Information Criterion (CAIC), and Bayesian Information Criterion (BIC). These criteria do not depend on any modeling purpose. In this thesis, we propose a new method for logistic regression model selection where the modeling purpose is c...
GMDH-type neural network algorithms for short term forecasting Dağ, Osman; Yozgatlıgil, Ceylan; Department of Statistics (2015) Group Method of Data Handling (GMDH)-type neural network algorithms are heuristic self-organization methods for modelling complex systems. GMDH algorithms are utilized for a variety of purposes, such as identification of physical laws, extrapolation of physical fields, pattern recognition, clustering, approximation of multidimensional processes, forecasting without models and so on. In this study, GMDH-type neural network algorithms were applied to make forecasts for time series data sets. We...

Citation Formats

O. Çınar, “Clustering of time-course gene expression data with dissimilar replicates,” M.S. - Master of Science, Middle East Technical University, 2013.