Online embedding and clustering of data streams

2019-11-20
Zubaroğlu, Alaettin
Atalay, Mehmet Volkan
© 2019 Association for Computing Machinery.
The number of connected devices is steadily increasing, and these devices continuously generate data streams. These data streams are often high dimensional and contain concept drift. Real-time processing of data streams is attracting growing interest despite its many challenges. Clustering is unsupervised: it does not need labeled instances, and it can be applied with less prior information about the data. These properties make clustering one of the most suitable methods for real-time data stream processing. Moreover, data embedding may simplify clustering and makes visualization of high dimensional data possible. Several data stream clustering algorithms exist in the literature; however, no data stream embedding method exists. UMAP is a data embedding algorithm that is suitable for data streams, but it cannot adapt to concept drift. In this study, we have developed a new method that applies UMAP to data streams, adapts to concept drift, and clusters the embedded data instances using any distance-based clustering algorithm.

Suggestions

SWARM-based data delivery framework in the Ad Hoc Internet of Things
Hasan, Mohammed Zaki; Al-Turjman, Fadi (2017-12-08)
The Internet of Things (IoT) refers to the rapidly growing network of connected objects that are able to collect and exchange data using embedded sensors. To guarantee connectivity among these objects and devices, fault-tolerant routing has received significant attention in recent years. In this paper, we propose a bio-inspired particle multi-swarm optimization (PMSO) routing algorithm to construct, recover and select k-disjoint paths that tolerate failure while satisfying quality of service (Q...
Optimizing Multipath Routing With Guaranteed Fault Tolerance in Internet of Things
Hasan, Mohammed Zaki; Al-Turjman, Fadi (2017-10-01)
The Internet of Things (IoT) refers to the rapidly growing network of connected objects and people that are able to collect and exchange data using embedded sensors. To guarantee connectivity among these objects and people, fault-tolerant routing must be given significant consideration. In this paper, we propose a bio-inspired particle multi-swarm optimization (PMSO) routing algorithm to construct, recover, and select k-disjoint paths that tolerate failure while satisfying the quality of service parameter...
Real-time intrusion detection and prevention system for SDN-based IoT networks
Sarıça, Alper Kaan; Angın, Pelin; Department of Computer Engineering (2021-09)
Significant advances in wireless networks with 5G have made a variety of new IoT use cases possible. 5G and beyond networks will rely significantly on network virtualization technologies such as SDN and NFV. The prevalence of IoT and the large attack surface it has created call for SDN-based intelligent security solutions that achieve real-time, automated intrusion detection and mitigation. In this thesis, we propose a real-time intrusion detection and mitigation system for SDN, which aims...
Data stream clustering: a review
Zubaroglu, Alaettin; Atalay, Mehmet Volkan (Springer Science and Business Media LLC, 2020-07-01)
The number of connected devices is steadily increasing, and these devices continuously generate data streams. Real-time processing of data streams is attracting growing interest despite its many challenges. Clustering is one of the most suitable methods for real-time data stream processing, because it can be applied with less prior information about the data and it does not need labeled instances. However, data stream clustering differs from traditional clustering in many aspects and poses several challenging issues. Here, w...
Online embedding and clustering of evolving data streams
Zubaroglu, Alaettin; Atalay, Mehmet Volkan (2022-07-01)
The number of connected devices is steadily increasing, and this trend is expected to continue in the near future. Connected devices continuously generate data streams, which are often high dimensional and contain concept drift. Clustering is one of the most suitable methods for real-time data stream processing, since it can be applied with less prior information about the data. Also, data embedding makes the visualization of high dimensional data possible and may simplify clustering pro...
Citation Formats
A. Zubaroğlu and M. V. Atalay, “Online embedding and clustering of data streams,” 2019, Accessed: 2020. [Online]. Available: https://hdl.handle.net/11511/57108.