Efficient and Versatile FPGA Acceleration of Support Counting for Stream Mining of Sequences and Frequent Itemsets

Date

2017-01-01

Author

PROSTBOUCLE, Adrien
PETROT, Frederic
LEROY, Vincent
Alemdar, Hande

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Stream processing has become extremely popular for analyzing huge volumes of data in a variety of applications, including IoT, social networks, retail, and software log analysis. Streams of data are produced continuously and are mined to extract patterns characterizing the data. A class of data-mining algorithms, called generate-and-test, produces a set of candidate patterns that are then evaluated over the data. The main challenges for these algorithms are to achieve high throughput, low latency, and reduced power consumption. In this article, we present a novel power-efficient, fast, and versatile hardware architecture whose objective is to monitor a set of target patterns and maintain their frequency over a stream of data. This accelerator can be used to speed up data-mining algorithms, including itemset and sequence mining. The massive fine-grain reconfiguration capability of field-programmable gate array (FPGA) technologies is ideal for implementing the high number of pattern-detection units needed for these intensive data-mining applications. We have thus designed and implemented an IP that features high-density FPGA occupation and high working frequency. We provide a detailed description of the IP's internal micro-architecture and of its actual implementation and optimization for the targeted FPGA resources. We validate our architecture by developing a co-designed implementation of the Apriori Frequent Itemset Mining (FIM) algorithm, and perform numerous experiments against existing hardware and software solutions. We demonstrate that FIM hardware acceleration is particularly efficient for large and low-density datasets (i.e., long-tailed datasets). Our IP reaches a data throughput of 250 million items/s and monitors up to 11.6k patterns simultaneously, on a prototyping board that overall consumes 24 W in the worst case. Furthermore, our hardware accelerator remains generic and can be integrated into other generate-and-test algorithms.
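
To make the generate-and-test idea concrete, the following is a minimal software sketch of Apriori-style frequent itemset mining, in which the support-counting step is the part the abstract describes offloading to the FPGA. It is not the authors' implementation: the function names `count_support` and `apriori`, the in-memory transaction list, and the absolute `min_support` threshold are all assumptions made for illustration.

```python
from itertools import combinations


def count_support(transactions, candidates):
    """Test step: count how many transactions contain each candidate itemset.

    This is the software analogue of monitoring a set of target patterns
    over a stream; names and data layout are hypothetical.
    """
    counts = {c: 0 for c in candidates}
    for items in transactions:
        for c in candidates:
            if c <= items:  # candidate is a subset of the transaction
                counts[c] += 1
    return counts


def apriori(transactions, min_support):
    """Return {itemset: support} for itemsets seen in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item that appears in the data.
    candidates = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    k = 1
    while candidates:
        counts = count_support(transactions, candidates)
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Generate step: join frequent (k-1)-itemsets into k-itemsets and keep
        # only those whose (k-1)-subsets are all frequent.
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    for itemset, support in sorted(
        apriori(data, min_support=3).items(), key=lambda kv: sorted(kv[0])
    ):
        print(sorted(itemset), support)
```

In this sketch the nested loop in `count_support` dominates the running time, since every candidate is checked against every transaction; that is the workload the abstract proposes to spread across many parallel pattern-detection units.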

Subject Keywords

Datamining, Frequent Itemset Mining, Sequence Mining, Stream Mining, FPGA Architecture, Hardware Accelerator, Apriori Algorithm

URI

https://hdl.handle.net/11511/40616

Journal

ACM Transactions on Reconfigurable Technology and Systems

DOI

https://doi.org/10.1145/302748

Collections

Department of Computer Engineering, Article

Citation Formats

A. PROSTBOUCLE, F. PETROT, V. LEROY, and H. Alemdar, “Efficient and Versatile FPGA Acceleration of Support Counting for Stream Mining of Sequences and Frequent Itemsets,” ACM Transactions on Reconfigurable Technology and Systems, pp. 1–25, 2017. [Online]. Available: https://hdl.handle.net/11511/40616.