CRoM and HuspExt: Improving Efficiency of High Utility Sequential Pattern Extraction

Date

2015-10-01

Author

Alkan, Oznur Kirmemis
Karagöz, Pınar

High utility sequential pattern mining is an important research problem, and a number of algorithms have been proposed for it. The main challenge is that the search space is large, so the efficiency of a solution is directly affected by the degree to which it can eliminate candidate patterns. The efficiency of any high utility sequential pattern mining solution therefore depends on its ability to reduce this large search space and, as a result, lower the computational complexity of calculating the utilities of the candidate patterns. In this paper, we propose efficient data structures and a pruning technique based on the Cumulated Rest of Match (CRoM) upper bound. By defining a tighter upper bound on the utility of the candidates, CRoM allows more conservative pruning before candidate pattern generation than existing techniques. In addition, we have developed an efficient algorithm, High Utility Sequential Pattern Extraction (HuspExt), which calculates the utilities of child patterns based on those of their parents. Substantial experiments on both synthetic and real datasets from different domains show that the proposed solution efficiently discovers high utility sequential patterns from large-scale datasets with different data characteristics under low utility thresholds.
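
To illustrate what pruning by an upper bound on utility means, here is a minimal Python sketch. It does not implement CRoM or HuspExt, whose definitions are not given in this record. Instead it uses the sequence-weighted utility (SWU), a well-known and looser upper bound from the high utility sequential pattern mining literature. The data layout (each itemset as a dictionary from item to utility), the restriction to patterns made of single items, and all function names are assumptions made for this example only.

```python
from typing import Dict, List, Optional, Tuple

Itemset = Dict[str, int]        # item -> utility of that item in this itemset (assumed layout)
UtilSequence = List[Itemset]    # one sequence = ordered list of itemsets
Pattern = Tuple[str, ...]       # simplified pattern: an ordered tuple of single items


def max_match_utility(seq: UtilSequence, pattern: Pattern) -> Optional[int]:
    """Highest utility over all matches of `pattern` in `seq`, or None if there is no match."""
    # best[k] = best utility found so far for matching the first k pattern items
    best: List[Optional[int]] = [0] + [None] * len(pattern)
    for itemset in seq:
        # Iterate k downwards so one itemset matches at most one pattern position.
        for k in range(len(pattern), 0, -1):
            item = pattern[k - 1]
            prev = best[k - 1]
            if item in itemset and prev is not None:
                cand = prev + itemset[item]
                if best[k] is None or cand > best[k]:
                    best[k] = cand
    return best[len(pattern)]


def sequence_utility(seq: UtilSequence) -> int:
    """Total utility of every item occurrence in the sequence."""
    return sum(sum(itemset.values()) for itemset in seq)


def pattern_utility(db: List[UtilSequence], pattern: Pattern) -> int:
    """Exact utility: sum of the best match utility over all sequences containing the pattern."""
    matches = (max_match_utility(seq, pattern) for seq in db)
    return sum(u for u in matches if u is not None)


def swu(db: List[UtilSequence], pattern: Pattern) -> int:
    """Sequence-weighted utility: total utility of all sequences that contain the pattern."""
    return sum(sequence_utility(seq) for seq in db
               if max_match_utility(seq, pattern) is not None)


def can_prune(db: List[UtilSequence], pattern: Pattern, min_util: int) -> bool:
    """If even the upper bound is below the threshold, neither the pattern
    nor any extension of it can be a high utility pattern."""
    return swu(db, pattern) < min_util


# Toy database, invented for illustration.
db = [
    [{"a": 2, "b": 6}, {"c": 3}, {"a": 4, "c": 1}],
    [{"b": 5}, {"a": 1}],
]
```

In this toy database the pattern ("a", "c") has an exact utility well below its SWU, so a threshold that lies between the two values cannot prune it even though it is not a high utility pattern. A tighter bound narrows that gap, which is the general motivation the abstract gives for CRoM.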

Subject Keywords

High utility sequential pattern mining, Efficiency, Candidate pattern pruning, Sequential pattern mining

URI

https://hdl.handle.net/11511/28401

Journal

IEEE Transactions on Knowledge and Data Engineering

DOI

https://doi.org/10.1109/tkde.2015.2420557

Collections

Department of Computer Engineering, Article


Citation Formats

O. K. Alkan and P. Karagöz, “CRoM and HuspExt: Improving Efficiency of High Utility Sequential Pattern Extraction,” IEEE Transactions on Knowledge and Data Engineering, pp. 2645–2657, 2015. [Online]. Available: https://hdl.handle.net/11511/28401.