Consensus clustering of time series data

Date

2014

Author

Yetere Kurşun, Ayça

In this study, we aim to develop a methodology that merges Dynamic Time Warping (DTW) and consensus clustering in a single algorithm. Most commonly used time series distance measures require the series to be of the same length, and the distance they compute depends mainly on the similarity of each pair of data points that coincide in time. DTW is a relatively new measure used to compare two time-dependent sequences that may be out of phase or may not have the same lengths or frequencies. DTW aligns two time series so that the distance between them is minimized. However, DTW is typically employed for a single variable with standard clustering methods rather than with consensus clustering. Our motivation is therefore to create an algorithm that combines the benefits of DTW with those of consensus clustering and that also provides a solution for multivariate applications. We present the results of our study on simulated data, well-known datasets from the literature, and Turkey's long-term meteorological time series data for the years 1950 to 2010. In all the cases we experimented with, DTW performs better than the Euclidean distance measure when used with consensus clustering. However, in some cases the performance difference was insignificant, so the time-consuming computations required to use DTW together with consensus clustering were not justified. The thesis ends with a conclusion and an outlook on future studies.
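
The two building blocks named in the abstract can be illustrated with a minimal sketch. This is not the thesis's algorithm, which is not described on this page. It assumes the textbook dynamic-programming form of DTW with absolute difference as the local cost, and a co-association matrix as one common way to combine several base clusterings into a consensus. The function names `dtw_distance` and `co_association` are hypothetical and chosen only for illustration.

```python
from math import inf


def dtw_distance(a, b):
    """Textbook DTW between two univariate sequences (assumed local cost: |x - y|).

    The sequences may have different lengths; the result is the minimal
    cumulative cost over all monotone alignments of a with b.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] is matched to an already used b[j-1]
                cost[i][j - 1],      # b[j-1] is matched to an already used a[i-1]
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


def co_association(labelings):
    """Fraction of base clusterings that put each pair of items in the same cluster.

    `labelings` is a list of label lists, one per base clustering, all of the
    same length. A consensus partition can then be derived from this matrix,
    for example by clustering it hierarchically (not shown here).
    """
    n = len(labelings[0])
    runs = len(labelings)
    return [
        [sum(lab[i] == lab[j] for lab in labelings) / runs for j in range(n)]
        for i in range(n)
    ]


# Two series with the same shape, one shifted in time and one sample longer.
x = [0, 1, 2, 1, 0]
y = [0, 0, 1, 2, 1, 0]
shift_cost = dtw_distance(x, y)

# Two hypothetical base clusterings of three items.
agreement = co_association([[0, 0, 1], [0, 1, 1]])
```

In this sketch a point-by-point Euclidean comparison of `x` and `y` is not even defined because the lengths differ, whereas DTW absorbs the shift by matching the first sample of `x` to the first two samples of `y`. The quadratic cost table also hints at why the abstract describes the combined approach as computationally expensive.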

Subject Keywords

Time-series analysis, Data clustering

URI

http://etd.lib.metu.edu.tr/upload/12616903/index.pdf
https://hdl.handle.net/11511/23340

Collections

Graduate School of Applied Mathematics, Thesis

Suggestions

Temporal clustering of time series via threshold autoregressive models: application to commodity prices Aslan, Sipan; Yozgatlıgil, Ceylan; İyigün, Cem (2018-01-01) The primary aim in this study is grouping time series according to the similarity between their data generating mechanisms (DGMs) rather than comparing pattern similarities in the time series trajectories. The approximation to the DGM of each series is accomplished by fitting the linear autoregressive and the non-linear threshold autoregressive models, and outputs of the estimates are used for feature extraction. Threshold autoregressive models are recognized for their ability to represent nonlinear feature...
Bayesian modelling for asymmetric multi-modal circular data Kılıç, Muhammet Burak; Kalaylıoğlu Akyıldız, Zeynep Işıl; Sengupta, Ashis; Department of Statistics (2015) In this thesis, we propose a Bayesian methodology based on sampling importance re-sampling for asymmetric and bimodal circular data analysis. We adopt Dirichlet process (DP) mixture model approach to analyse multi-modal circular data where the number of components is not known. For the analysis of temporal circular data, such as hourly measured wind directions, we join DP mixture model approach with circular times series modelling. The approaches are illustrated with both simulated and real life data sets. ...
Time series classification with feature covariance matrices Ergezer, Hamza; Leblebicioğlu, Mehmet Kemal (2018-06-01) In this work, a novel approach utilizing feature covariance matrices is proposed for time series classification. In order to adapt the feature covariance matrices into time series classification problem, a feature vector is defined for each point in a time series. The feature vector comprises local and global information such as value, derivative, rank, deviation from the mean, the time index of the point and cumulative sum up to the point. Extracted feature vectors for the time instances are concatenated t...
Parallel computing in linear mixed models Gökalp Yavuz, Fulya (Springer Science and Business Media LLC, 2020-09-01) In this study, we propose a parallel programming method for linear mixed models (LMM) generated from big data. A commonly used algorithm, expectation maximization (EM), is preferred for its use of maximum likelihood estimations, as the estimations are stable and simple. However, EM has a high computation cost. In our proposed method, we use a divide and recombine to split the data into smaller subsets, running the algorithm steps in parallel on multiple local cores and combining the results. The proposed me...
K-median clustering algorithms for time series data Gökçem, Yiğit; İyigün, Cem; Department of Industrial Engineering (2021-3-10) Clustering is an unsupervised learning method that groups the unlabeled data for gathering valuable information. Clustering can be applied on various types of data. In this study, we have focused on time series clustering. When the studies about time series clustering are reviewed in the literature, for the time series data, the centers of the formed clusters are selected from the existing time series samples in the clusters. In this study, we have changed that view and have proposed clustering algorithms based...

Citation Formats

A. Yetere Kurşun, “Consensus clustering of time series data,” M.S. - Master of Science, Middle East Technical University, 2014.