Data stream clustering: a review

2020-07-01
Zubaroglu, Alaettin
Atalay, Mehmet Volkan
The number of connected devices is steadily increasing, and these devices continuously generate data streams. Real-time processing of data streams is attracting growing interest despite many challenges. Clustering is one of the most suitable methods for real-time data stream processing because it can be applied with little prior information about the data and does not need labeled instances. However, data stream clustering differs from traditional clustering in many respects and poses several challenges of its own. Here, we describe the concepts and common characteristics of data streams, such as concept drift, data structures for data streams, time window models and outlier detection. We comprehensively review recent data stream clustering algorithms and analyze them in terms of base clustering technique, computational complexity and clustering accuracy. We compare these algorithms and discuss the problems that remain open in data stream clustering. We also list popular data stream repositories and datasets, as well as stream processing tools and platforms.
ARTIFICIAL INTELLIGENCE REVIEW

Citation Formats
A. Zubaroglu and M. V. Atalay, “Data stream clustering: a review,” ARTIFICIAL INTELLIGENCE REVIEW, 2020. [Online]. Available: https://hdl.handle.net/11511/48156.