K-median clustering algorithms for time series data

2021-3-10
Gökçem, Yiğit
Clustering is an unsupervised learning method that groups unlabeled data in order to extract valuable information. It can be applied to various types of data; this study focuses on time series clustering. In the studies on time series clustering reviewed in the literature, the centers of the formed clusters are selected from the existing time series samples in the clusters. In this study, we depart from that view and propose clustering algorithms based on the idea of selecting the cluster centers for each timestamp, with the aim of improving clustering performance. Based on this idea, four algorithms are proposed: the Center Based K-Median Algorithm (CKM), CKM with Haar Wavelet Decomposition, CKM with Haar Wavelet Decomposition Without Projection, and Search Based CKM with Haar Wavelet Decomposition. The first algorithm uses the raw data and solves the clustering problem with the proposed optimization model. The other three algorithms also use the proposed optimization model, but they operate on data transformed by the Haar wavelet decomposition instead of the raw data. The proposed algorithms were tested on different data sets and evaluated using different internal and external indices. According to these evaluations, the CKM-based algorithms achieve successful clustering performance.
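The idea of choosing a cluster center for each timestamp, rather than picking an existing series as the center, can be illustrated with a minimal sketch. This is not the optimization model proposed in the thesis, which is not reproduced here: it is a simple alternating heuristic that assumes equal-length series and the Manhattan (L1) distance, and the function names `timestamp_wise_k_median` and `haar_step` are hypothetical. The second function shows one level of the standard orthonormal Haar transform as an example of the kind of transformed data the wavelet-based variants work with.

```python
import numpy as np

def timestamp_wise_k_median(X, k, n_iter=50, seed=0):
    """Alternate L1 assignment and per-timestamp median updates (a heuristic sketch)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct series picked at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Manhattan distance of every series to every center: shape (n, k).
        dist = np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster empties
                # Each timestamp gets the median over the members, so the
                # center need not coincide with any observed series.
                centers[j] = np.median(members, axis=0)
    return labels, centers

def haar_step(x):
    """One level of the orthonormal Haar transform for an even-length series."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)  # scaled pairwise sums
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)  # scaled pairwise differences
    return approx, detail
```

Under these assumptions, clustering the approximation coefficients returned by `haar_step` instead of the raw series would be one simple way to combine the two pieces; how the thesis variants actually combine them (with or without projection, or with a search step) is not specified in the abstract.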

Citation Formats
Y. Gökçem, “K-median clustering algorithms for time series data,” M.S. - Master of Science, Middle East Technical University, 2021.