Consensus clustering of time series data

Download
2014
Yetere Kurşun, Ayça
In this study, we aim to develop a methodology that merges Dynamic Time Warping (DTW) and consensus clustering in a single algorithm. Mostly used time series distance measures require data to be of the same length and measure the distance between time series data mostly depends on the similarity of each coinciding data pair in time. DTW is a relatively new measure used to compare two time dependent sequences which may be out of phase or may not have the same lengths or frequencies. DTW aligns two time series data so that the distance between them is minimized. However, DTW is a similarity measure that is employed for single variable with standard clustering methods rather than consensus clustering. Thus our motivation is to create an algorithm that can combine the benefits of the DTW with benefits of consensus clustering, which will also provide a solution for multivariate applications. We present the results of our study both with simulated data, well known datasets from the literature and Turkey’s long-term meteorological time series data between years 1950 and 2010. In all the cases we experimented with, when used with consensus clustering DTW performs better than Euclidian Distance measure. However in some cases the performance difference was insignificant, making it unnecessary to use both DTW and Consensus Clustering, due to time consuming computations. This thesis ends with a conclusion and the outlook to future studies.

Suggestions

Parallel computing in linear mixed models
Gökalp Yavuz, Fulya (Springer Science and Business Media LLC, 2020-09-01)
In this study, we propose a parallel programming method for linear mixed models (LMM) generated from big data. A commonly used algorithm, expectation maximization (EM), is preferred for its use of maximum likelihood estimations, as the estimations are stable and simple. However, EM has a high computation cost. In our proposed method, we use a divide and recombine to split the data into smaller subsets, running the algorithm steps in parallel on multiple local cores and combining the results. The proposed me...
End User Evaluation of the FAIR4Health Data Curation Tool
Gencturk, Mert; Teoman, Alper; Alvarez-Romero, Celia; Martinez-Garcia, Alicia; Parra-Calderon, Carlos Luis; Poblador-Plou, Beatriz; Löbe, Matthias; Sinaci, A Anil (2021-05-27)
The aim of this study is to build an evaluation framework for the user-centric testing of the Data Curation Tool. The tool was developed in the scope of the FAIR4Health project to make health data FAIR by transforming them from legacy formats into a Common Data Model based on HL7 FHIR. The end user evaluation framework was built by following a methodology inspired from the Delphi method. We applied a series of questionnaires to a group of experts not only in different roles and skills, but also from various...
MODELLING OF KERNEL MACHINES BY INFINITE AND SEMI-INFINITE PROGRAMMING
Ozogur-Akyuz, S.; Weber, Gerhard Wilhelm (2009-06-03)
In Machine Learning (ML) algorithms, one of the crucial issues is the representation of the data. As the data become heterogeneous and large-scale, single kernel methods become insufficient to classify nonlinear data. The finite combinations of kernels are limited up to a finite choice. In order to overcome this discrepancy, we propose a novel method of "infinite" kernel combinations for learning problems with the help of infinite and semi-infinite programming regarding all elements in kernel space. Looking...
Multiresolution analysis of S&P500 time series
KILIC, Deniz Kenan; Uğur, Ömür (2018-01-01)
Time series analysis is an essential research area for those who are dealing with scientific and engineering problems. The main objective, in general, is to understand the underlying characteristics of selected time series by using the time as well as the frequency domain analysis. Then one can make a prediction for desired system to forecast ahead from the past observations. Time series modeling, frequency domain and some other descriptive statistical data analyses are the primary subjects of this study: i...
Time series classification with feature covariance matrices
Ergezer, Hamza; Leblebicioğlu, Mehmet Kemal (2018-06-01)
In this work, a novel approach utilizing feature covariance matrices is proposed for time series classification. In order to adapt the feature covariance matrices into time series classification problem, a feature vector is defined for each point in a time series. The feature vector comprises local and global information such as value, derivative, rank, deviation from the mean, the time index of the point and cumulative sum up to the point. Extracted feature vectors for the time instances are concatenated t...
Citation Formats
A. Yetere Kurşun, “Consensus clustering of time series data,” M.S. - Master of Science, Middle East Technical University, 2014.