A New WAP-tree based sequential pattern mining algorithm for faster pattern extraction

Download
2012
Önal, Kezban Dilek
Sequential pattern mining constitutes a basis for solution of problems in various domains like bio-informatics and web usage mining. Research on this field continues seeking faster algorithms. WAP-Tree based algorithms that emerged from web usage mining literature have shown a remarkable performance on single-item sequence databases. In this study, we investigated application of WAP-Tree based mining to multi-item sequential pattern mining and we designed an extension of WAP-Tree data structure for multi-item sequence databases, the MULTI-WAP-Tree. In addition, we propose a new mining strategy on WAP-Tree which involves a hybrid traversal strategy in possible sequences search space and a new early prunning idea called Sibling Principle on Pattern Tree. Two algorithms, FOF-PT and MULTI-FOF-PT, applying this strategy on WAP-Tree and MULTI-WAP-Tree respectively, are developed. Experiments showed that FOF-PT outperforms both other WAP-Tree based algorithms and PrefixSpan in terms of execution time. Moreover, experimental results revealed MULTI-FOF-PT finds patterns faster than PrefixSpan on dense multi-item sequence databases with small alphabets.

Suggestions

An ilp-based concept discovery system for multi-relational data mining
Kavurucu, Yusuf; Karagöz, Pınar; Department of Computer Engineering (2009)
Multi Relational Data Mining has become popular due to the limitations of propositional problem definition in structured domains and the tendency of storing data in relational databases. However, as patterns involve multiple relations, the search space of possible hypothesis becomes intractably complex. In order to cope with this problem, several relational knowledge discovery systems have been developed employing various search strategies, heuristics and language pattern limitations. In this thesis, Induct...
Improving Efficiency of Sequence Mining by Combining First Occurrence Forest (FOF) Strategy and Sibling Principle
Onal, Kezban Dilek; Karagöz, Pınar (2014-06-04)
Sequential pattern mining is one of the basic problems in data mining and it has many applications in web mining. The WAP-Tree (Web Access Pattern Tree) data structure provides a compact representation of single-item sequence databases. WAP-Tree based algorithms have shown notable execution time and memory consumption performance on mining single-item sequence databases. We propose a new algorithm FOF-SP, a WAP-Tree based algorithm which combines an early prunning strategy called "Sibling Principle" from th...
A feature-based hybrid ARIMA-ANN model for univariate time series forecasting
Buyuksahin, Umit Cavus; Ertekin Bolelli, Şeyda (2020-01-01)
High prediction accuracies at time series modeling and forecasting is of the utmost importance for a variety of application domains. Many methods have been proposed in the literature to improve time series forecasting accuracy. Those which focus on univariate time series forecasting methods use only the values in the prior time steps to predict the next value. In this study in addition to the historical values, it is aimed to increase the forecasting performance by using extra statistical and structural fea...
Discovering more accurate frequent web usage patterns
Bayır, Murat Ali; Toroslu, İsmail Hakkı; Coşar, Ahmet; Fidan, Güven (2008-09-01)
Web usage mining is a type of web mining, which exploits data mining techniques to discover valuable information from navigation behavior of World Wide Web users. As in classical data mining, data preparation and pattern discovery are the main issues in web usage mining. The first phase of web usage mining is the data processing phase, which includes the session reconstruction operation from server logs. Session reconstruction success directly affects the quality of the frequent patterns discovered in the n...
A new likelihood approach to autonomous multiple model estimation
Söken, Halil Ersin (Elsevier BV, 2020-04-01)
This paper presents an autonomous multiple model (AMM) estimation algorithm for hybrid systems with sudden changes in their parameters. Estimates of Kalman filters (KFs) that are tuned and employed for different system modes are merged based on a newly defined likelihood function without any necessity for filter interaction. The proposed likelihood function is composed of two measures, the filter agility measure and the steady-state error measure. These measures are derived based on filter adaptation rules....
Citation Formats
K. D. Önal, “A New WAP-tree based sequential pattern mining algorithm for faster pattern extraction,” M.S. - Master of Science, Middle East Technical University, 2012.