Improving Efficiency of Sequence Mining by Combining First Occurrence Forest (FOF) Strategy and Sibling Principle

2014-06-04
Onal, Kezban Dilek
Karagöz, Pınar
Sequential pattern mining is one of the basic problems in data mining, and it has many applications in web mining. The WAP-Tree (Web Access Pattern Tree) data structure provides a compact representation of single-item sequence databases, and WAP-Tree based algorithms have shown notable execution time and memory consumption performance when mining such databases. We propose FOF-SP, a new WAP-Tree based algorithm that combines the FOF (First Occurrence Forest) strategy with an early pruning strategy from the literature called the "Sibling Principle". Experimental results show that FOF-SP finds patterns faster than the previous WAP-Tree based algorithms PLWAP and FOF. Moreover, on real sequence databases from web usage mining and bioinformatics, FOF-SP mines patterns faster than PrefixSpan and as fast as LAPIN.
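
As a rough illustration of why a WAP-Tree can be called a compact representation, the sketch below builds a count-annotated prefix tree over single-item sequences after dropping infrequent events, so that sequences with a common prefix share nodes. It is a simplified sketch under stated assumptions, not the authors' FOF-SP implementation: it omits the header-table links of a full WAP-Tree as well as the FOF strategy and the Sibling Principle. All names (`Node`, `build_tree`, `count_nodes`, `support`) and the toy sessions are hypothetical and chosen only for illustration.

```python
from collections import Counter


class Node:
    """One node of a count-annotated prefix tree (a simplified WAP-Tree)."""

    def __init__(self, event=None):
        self.event = event
        self.count = 0
        self.children = {}


def build_tree(sequences, min_support):
    """Insert each sequence, restricted to frequent events, into a prefix tree."""
    # An event is frequent if it occurs in at least min_support sequences.
    event_support = Counter(e for seq in sequences for e in set(seq))
    frequent = {e for e, c in event_support.items() if c >= min_support}

    root = Node()
    for seq in sequences:
        node = root
        for event in (e for e in seq if e in frequent):
            node = node.children.setdefault(event, Node(event))
            node.count += 1  # sequences sharing this prefix share the node
    return root


def count_nodes(node):
    """Number of nodes below the given node (the root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node.children.values())


def support(sequences, pattern):
    """Number of sequences that contain pattern as a subsequence (gaps allowed)."""
    def contains(seq):
        it = iter(seq)
        # `event in it` advances the iterator, so order is respected.
        return all(event in it for event in pattern)

    return sum(contains(seq) for seq in sequences)


# Toy single-item sequence database: each character is one event.
sessions = ["abdac", "eaebcac", "babfaec", "afbacfc"]
tree = build_tree(sessions, min_support=3)
```

With these toy sessions only the events `a`, `b`, and `c` are frequent, and the filtered sequences contain 19 events in total, while `count_nodes(tree)` reports fewer nodes because common prefixes are stored once. The `support` helper counts support by brute force over the raw sequences; it is included only to show what a mined pattern's support means, not how a WAP-Tree based miner computes it.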

Citation Formats
K. D. Onal and P. Karagöz, “Improving Efficiency of Sequence Mining by Combining First Occurrence Forest (FOF) Strategy and Sibling Principle,” 2014, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/34430.