A New WAP-tree based sequential pattern mining algorithm for faster pattern extraction

Download
2012
Önal, Kezban Dilek
Sequential pattern mining constitutes a basis for solution of problems in various domains like bio-informatics and web usage mining. Research on this field continues seeking faster algorithms. WAP-Tree based algorithms that emerged from web usage mining literature have shown a remarkable performance on single-item sequence databases. In this study, we investigated application of WAP-Tree based mining to multi-item sequential pattern mining and we designed an extension of WAP-Tree data structure for multi-item sequence databases, the MULTI-WAP-Tree. In addition, we propose a new mining strategy on WAP-Tree which involves a hybrid traversal strategy in possible sequences search space and a new early prunning idea called Sibling Principle on Pattern Tree. Two algorithms, FOF-PT and MULTI-FOF-PT, applying this strategy on WAP-Tree and MULTI-WAP-Tree respectively, are developed. Experiments showed that FOF-PT outperforms both other WAP-Tree based algorithms and PrefixSpan in terms of execution time. Moreover, experimental results revealed MULTI-FOF-PT finds patterns faster than PrefixSpan on dense multi-item sequence databases with small alphabets.

Suggestions

Improving Efficiency of Sequence Mining by Combining First Occurrence Forest (FOF) Strategy and Sibling Principle
Onal, Kezban Dilek; Karagöz, Pınar (2014-06-04)
Sequential pattern mining is one of the basic problems in data mining and it has many applications in web mining. The WAP-Tree (Web Access Pattern Tree) data structure provides a compact representation of single-item sequence databases. WAP-Tree based algorithms have shown notable execution time and memory consumption performance on mining single-item sequence databases. We propose a new algorithm FOF-SP, a WAP-Tree based algorithm which combines an early prunning strategy called "Sibling Principle" from th...
An ilp-based concept discovery system for multi-relational data mining
Kavurucu, Yusuf; Karagöz, Pınar; Department of Computer Engineering (2009)
Multi Relational Data Mining has become popular due to the limitations of propositional problem definition in structured domains and the tendency of storing data in relational databases. However, as patterns involve multiple relations, the search space of possible hypothesis becomes intractably complex. In order to cope with this problem, several relational knowledge discovery systems have been developed employing various search strategies, heuristics and language pattern limitations. In this thesis, Induct...
New transitive closure algorithm for recursive query processing in deductive databases
Toroslu, İsmail Hakkı (1992-01-01)
© 1992 IEEE.The development of effic1.e11t algorithms to process the different forms of the transitive-closure (TC) queries within the context of large database systems has recently attracted a large amom1t of research efforts. In this paper, we present a neic algorithm suitable for full transitive closure problem, which zs used to solve uninstentiated recursive qi1enes in deductive databases. In this new algorithm there are two phases. In the first phase a general graph is condensed into an acyclic graph a...
A feature-based hybrid ARIMA-ANN model for univariate time series forecasting
Buyuksahin, Umit Cavus; Ertekin Bolelli, Şeyda (2020-01-01)
High prediction accuracies at time series modeling and forecasting is of the utmost importance for a variety of application domains. Many methods have been proposed in the literature to improve time series forecasting accuracy. Those which focus on univariate time series forecasting methods use only the values in the prior time steps to predict the next value. In this study in addition to the historical values, it is aimed to increase the forecasting performance by using extra statistical and structural fea...
A Hybrid Approach to Process Mining: Finding Immediate Successors of a Process by Using From-To Chart
Esgin, Eren; Karagöz, Pınar (2009-12-15)
Process mining is a branch of data mining that aims to discover process model from the event logs. In this study, we propose a hybrid approach to process mining in such a way that, "from-to chart" is used as the front-end to monitor the transitions among activities of a realistic event log. Another novelty of this study is developed evaluation metrics, which are used for finding immediate successors in order to convert these raw relations into dependency/frequency graph.
Citation Formats
K. D. Önal, “A New WAP-tree based sequential pattern mining algorithm for faster pattern extraction,” M.S. - Master of Science, Middle East Technical University, 2012.