Density-Based and parameterless clustering of embedded data streams

2021-09-09
Poyraz, Özlem
With the accelerating digitalization of the world, the amount of high-speed data being produced is increasing rapidly, and it is difficult to store such a data stream and process it as a whole. This creates the need to process data as soon as it arrives, without storing the stream. In most cases, no prior information about the data is available. Additionally, the characteristics of a data stream may change over time; this phenomenon is called concept drift. Because clustering does not require ground-truth labels, it is well suited to data streams. Clustering algorithms for data streams should read the data only once, work in real time, and adapt to concept drift. In the Density-Based and Parameterless Clustering of Embedded Data Streams (DBPCES) algorithm developed in this study, data streams are embedded into two dimensions and clustered with a parameterless density-based clustering algorithm. To embed the data stream into two dimensions, the UMAP algorithm was adapted to handle data streams and concept drift. For clustering, the DBSCAN algorithm was applied to the embedded data points. The DBSCAN parameters were estimated with a heuristic, so that the data stream can be clustered without requiring any data-dependent parameters from the user. DBPCES was run on synthetic and real data streams that differ in the actual number of clusters, the number of dimensions, and the concept drift rate. Its performance was compared with that of DenStream and of the implementation by Zubaroğlu and Atalay, using the adjusted Rand index, purity, and the silhouette coefficient as evaluation metrics. Execution times were also compared. Although DBPCES was not as fast as DenStream, it achieved results similar to those of the other algorithms.
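The clustering stage can be illustrated with a minimal, pure-Python sketch. It runs DBSCAN on one batch of points that are assumed to be already embedded in two dimensions (the adapted UMAP stage is not shown) and chooses the parameters automatically. The heuristic here is an assumption and not necessarily the one used in DBPCES: `min_pts = 2 * dim` is a common rule of thumb, and `eps` is taken as a high quantile of the sorted k-nearest-neighbor distances, a crude stand-in for the "knee" of the k-distance plot. The function names and the `quantile` default are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def kth_neighbor_distances(points: Sequence[Point], k: int) -> List[float]:
    """Distance from each point to its k-th nearest neighbor (itself excluded)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result


def estimate_dbscan_params(points: Sequence[Point], dim: int = 2,
                           quantile: float = 0.9) -> Tuple[float, int]:
    """Hypothetical heuristic: min_pts = 2 * dim, eps = a quantile of the k-distances."""
    min_pts = 2 * dim
    # A point is core when at least min_pts - 1 *other* points lie within eps,
    # so the matching k-distance uses k = min_pts - 1.
    kd = sorted(kth_neighbor_distances(points, min_pts - 1))
    idx = min(len(kd) - 1, int(quantile * len(kd)))
    return kd[idx], min_pts


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Textbook DBSCAN: returns one cluster id per point, -1 for noise."""
    NOISE = -1
    labels: List[Optional[int]] = [None] * len(points)

    def region(i: int) -> List[int]:
        # Indices of all points within eps of points[i], including i itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
                continue
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster
            j_neighbors = region(j)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is a core point: keep growing
        cluster += 1
    return [NOISE if lab is None else lab for lab in labels]


# Two tight 5x5 grids far apart, plus one isolated outlier.
blob_a = [(x * 0.1, y * 0.1) for x in range(5) for y in range(5)]
blob_b = [(10 + x * 0.1, 10 + y * 0.1) for x in range(5) for y in range(5)]
points = blob_a + blob_b + [(50.0, 50.0)]

eps, min_pts = estimate_dbscan_params(points)
labels = dbscan(points, eps, min_pts)
print(min_pts, labels[0], labels[25], labels[-1])  # 4 0 1 -1
```

Region queries are brute force, so this sketch only suits small batches; a spatial index such as a k-d tree is the usual choice for larger ones.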

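The external evaluation metrics named above can be computed directly from true and predicted labels. The following standard-library sketch implements purity and the adjusted Rand index from their textbook definitions; it is not the evaluation code used in the study, and it treats a noise label such as DBSCAN's -1 as an ordinary cluster, which is an assumption. It also assumes at least two points. The silhouette coefficient, which additionally needs the data points and a distance measure, is omitted.

```python
from collections import Counter
from math import comb
from typing import Dict, Hashable, Sequence


def purity(true_labels: Sequence[Hashable], pred_labels: Sequence[Hashable]) -> float:
    """Fraction of points belonging to the majority true class of their predicted cluster."""
    clusters: Dict[Hashable, Counter] = {}
    for t, p in zip(true_labels, pred_labels):
        clusters.setdefault(p, Counter())[t] += 1
    majority = sum(max(counts.values()) for counts in clusters.values())
    return majority / len(true_labels)


def adjusted_rand_index(true_labels: Sequence[Hashable],
                        pred_labels: Sequence[Hashable]) -> float:
    """Adjusted Rand index computed from the contingency table of the two labelings."""
    n = len(true_labels)
    index = sum(comb(c, 2) for c in Counter(zip(true_labels, pred_labels)).values())
    sum_a = sum(comb(c, 2) for c in Counter(true_labels).values())  # row sums
    sum_b = sum(comb(c, 2) for c in Counter(pred_labels).values())  # column sums
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Only when both labelings are a single cluster, or both are all singletons.
        return 1.0
    return (index - expected) / (max_index - expected)


print(purity([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1]))  # 0.8333333333333334
print(round(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]), 4))  # 0.5714
```

Because the adjusted Rand index compares pairings rather than label values, renaming the clusters (for example, swapping 0 and 1) does not change the score.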
Citation
Ö. Poyraz, “Density-Based and parameterless clustering of embedded data streams,” M.S. thesis, Middle East Technical University, 2021.