Consensus clustering of time series data

2014
Yetere Kurşun, Ayça
In this study, we aim to develop a methodology that merges Dynamic Time Warping (DTW) and consensus clustering in a single algorithm. Most commonly used time series distance measures require the series to be of the same length, and the distance they compute depends mostly on the similarity of each pair of points that coincide in time. DTW is a relatively new measure used to compare two time-dependent sequences that may be out of phase or may not have the same lengths or frequencies. DTW aligns two time series so that the distance between them is minimized. However, DTW is a similarity measure that has been employed for single-variable data with standard clustering methods rather than with consensus clustering. Thus, our motivation is to create an algorithm that combines the benefits of DTW with those of consensus clustering and that also provides a solution for multivariate applications. We present the results of our study on simulated data, on well-known datasets from the literature, and on Turkey’s long-term meteorological time series data for the years 1950 to 2010. In all the cases we experimented with, DTW performed better than the Euclidean distance measure when used with consensus clustering. However, in some cases the performance difference was insignificant, making it unnecessary to use both DTW and consensus clustering given the time-consuming computations. This thesis ends with a conclusion and an outlook on future studies.
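To make the two ingredients named in the abstract concrete, here is a minimal Python sketch. It is not the algorithm developed in the thesis. It shows only the textbook dynamic-programming form of DTW, and a co-association matrix, which is one common way to summarize agreement among several clusterings. The function names `dtw_distance` and `co_association`, the absolute-difference local cost, and the toy inputs are all assumptions made for illustration.

```python
from itertools import combinations
from math import inf


def dtw_distance(a, b):
    """Textbook DTW between two numeric sequences (lengths may differ).

    Assumption: local cost is |x - y|; no warping-window constraint is applied.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def co_association(partitions):
    """Fraction of partitions in which each pair of items shares a cluster label.

    Assumption: every partition is a list of labels over the same items.
    """
    n = len(partitions[0])
    matrix = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in combinations(range(n), 2):
        share = sum(p[i] == p[j] for p in partitions) / len(partitions)
        matrix[i][j] = matrix[j][i] = share
    return matrix


# Sequences of different lengths can still align perfectly under DTW.
same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
# Two toy clusterings of three items, combined into pairwise agreement scores.
agreement = co_association([[0, 0, 1], [0, 1, 1]])
```

In this toy run, the repeated value in the second series is absorbed by the warping, which a point-by-point measure such as Euclidean distance cannot do for series of unequal length. A consensus step would then typically cluster the items using the agreement matrix. How the thesis actually combines the two is not specified in the abstract.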

Suggestions

Parallel computing in linear mixed models
Gökalp Yavuz, Fulya (Springer Science and Business Media LLC, 2020-09-01)
In this study, we propose a parallel programming method for linear mixed models (LMM) generated from big data. A commonly used algorithm, expectation maximization (EM), is preferred for its use of maximum likelihood estimations, as the estimations are stable and simple. However, EM has a high computation cost. In our proposed method, we use a divide-and-recombine approach to split the data into smaller subsets, running the algorithm steps in parallel on multiple local cores and combining the results. The proposed me...
Time series classification using deep learning
Hatipoğlu, Poyraz Umut; İyigün, Cem; Department of Industrial Engineering (2016)
Deep learning is a fast-growing and interesting field due to the need to represent statistical data in a more complex and abstract way. Developments in processor and graphics processing unit technology have undeniably contributed to the popularity of deep networks. The main purpose of this work is to develop a robust and fully functional time series classification method. To this end, novel deep learning based methods are proposed. Because time series data can have complex and variable structure...
Bayesian modelling for asymmetric multi-modal circular data
Kılıç, Muhammet Burak; Kalaylıoğlu Akyıldız, Zeynep Işıl; Sengupta, Ashis; Department of Statistics (2015)
In this thesis, we propose a Bayesian methodology based on sampling importance re-sampling for asymmetric and bimodal circular data analysis. We adopt a Dirichlet process (DP) mixture model approach to analyse multi-modal circular data where the number of components is not known. For the analysis of temporal circular data, such as hourly measured wind directions, we combine the DP mixture model approach with circular time series modelling. The approaches are illustrated with both simulated and real-life data sets. ...
MODELLING OF KERNEL MACHINES BY INFINITE AND SEMI-INFINITE PROGRAMMING
Ozogur-Akyuz, S.; Weber, Gerhard Wilhelm (2009-06-03)
In Machine Learning (ML) algorithms, one of the crucial issues is the representation of the data. As the data become heterogeneous and large-scale, single kernel methods become insufficient to classify nonlinear data. The finite combinations of kernels are limited up to a finite choice. In order to overcome this discrepancy, we propose a novel method of "infinite" kernel combinations for learning problems with the help of infinite and semi-infinite programming regarding all elements in kernel space. Looking...
Multiresolution analysis of S&P500 time series
KILIC, Deniz Kenan; Uğur, Ömür (2018-01-01)
Time series analysis is an essential research area for those who are dealing with scientific and engineering problems. The main objective, in general, is to understand the underlying characteristics of selected time series by using time domain as well as frequency domain analysis. One can then make predictions for the desired system, forecasting ahead from past observations. Time series modeling, frequency domain and some other descriptive statistical data analyses are the primary subjects of this study: i...
Citation Formats
A. Yetere Kurşun, “Consensus clustering of time series data,” M.S. - Master of Science, Middle East Technical University, 2014.