Improving Efficiency of Sequence Mining by Combining First Occurrence Forest (FOF) Strategy and Sibling Principle

Onal, Kezban Dilek
Karagöz, Pınar
Sequential pattern mining is one of the basic problems in data mining, and it has many applications in web mining. The WAP-Tree (Web Access Pattern Tree) data structure provides a compact representation of single-item sequence databases. WAP-Tree based algorithms have shown notable execution time and memory consumption performance in mining single-item sequence databases. We propose FOF-SP, a new WAP-Tree based algorithm that combines an early pruning strategy from the literature, the "Sibling Principle", with the FOF (First Occurrence Forest) strategy. Experimental results show that FOF-SP finds patterns faster than the previous WAP-Tree based algorithms PLWAP and FOF. Moreover, FOF-SP mines patterns faster than PrefixSpan and as fast as LAPIN on real sequence databases from web usage mining and bioinformatics.
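
To illustrate why a WAP-Tree is a compact representation of a single-item sequence database, the following is a minimal Python sketch of the general idea: infrequent items are dropped, and the remaining sequences are merged into a prefix tree whose nodes carry occurrence counts. It is not the FOF-SP algorithm and does not implement the FOF strategy or the Sibling Principle. The names `Node` and `build_wap_tree` and the absolute `min_support` threshold are assumptions made for this illustration, and the header links used by full WAP-Tree implementations are omitted.

```python
from collections import Counter


class Node:
    """One tree node: an item label, a count, and child nodes keyed by item."""

    def __init__(self, item):
        self.item = item
        self.count = 0
        self.children = {}


def build_wap_tree(sequences, min_support):
    """Build a simplified WAP-Tree-like prefix tree (illustrative sketch).

    sequences   -- iterable of sequences of single items (e.g. strings)
    min_support -- minimum number of sequences an item must appear in
    """
    sequences = list(sequences)

    # First pass: support of an item = number of sequences containing it.
    support = Counter()
    for seq in sequences:
        support.update(set(seq))
    frequent = {item for item, n in support.items() if n >= min_support}

    # Second pass: insert each sequence, skipping infrequent items, so that
    # shared prefixes are stored once and only their counts grow.
    root = Node(None)
    for seq in sequences:
        node = root
        for item in seq:
            if item not in frequent:
                continue
            node = node.children.setdefault(item, Node(item))
            node.count += 1
    return root
```

For example, calling `build_wap_tree(["abdac", "eaebcac", "babfaec", "afbacfc"], 3)` keeps only the items `a`, `b` and `c`, and the three sequences that begin with `ab` after filtering share a single `a` → `b` path from the root.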


A New WAP-tree based sequential pattern mining algorithm for faster pattern extraction
Önal, Kezban Dilek; Şenkul, Pınar; Department of Computer Engineering (2012)
Sequential pattern mining underlies solutions to problems in various domains such as bioinformatics and web usage mining. Research in this field continues to seek faster algorithms. WAP-Tree based algorithms, which emerged from the web usage mining literature, have shown remarkable performance on single-item sequence databases. In this study, we investigated the application of WAP-Tree based mining to multi-item sequential pattern mining, and we designed an extension of the WAP-Tree data structure for multi-it...
Discovering more accurate frequent web usage patterns
Bayır, Murat Ali; Toroslu, İsmail Hakkı; Coşar, Ahmet; Fidan, Güven (2008-09-01)
Web usage mining is a type of web mining, which exploits data mining techniques to discover valuable information from navigation behavior of World Wide Web users. As in classical data mining, data preparation and pattern discovery are the main issues in web usage mining. The first phase of web usage mining is the data processing phase, which includes the session reconstruction operation from server logs. Session reconstruction success directly affects the quality of the frequent patterns discovered in the n...
A Hybrid Approach to Process Mining: Finding Immediate Successors of a Process by Using From-To Chart
Esgin, Eren; Karagöz, Pınar (2009-12-15)
Process mining is a branch of data mining that aims to discover process models from event logs. In this study, we propose a hybrid approach to process mining in which a "from-to chart" is used as the front end to monitor the transitions among the activities of a realistic event log. Another novelty of this study is a set of newly developed evaluation metrics, which are used to find immediate successors in order to convert these raw relations into a dependency/frequency graph.
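
As a rough illustration of the from-to chart idea, the sketch below counts how often one activity is directly followed by another across the traces of an event log. It is written under the assumption that each trace is an ordered list of activity names; the function name `from_to_chart` is hypothetical, and the evaluation metrics proposed in the study are not reproduced here.

```python
from collections import Counter


def from_to_chart(traces):
    """Count direct transitions (source, target) over all traces.

    traces -- iterable of ordered activity lists, one list per case
    Returns a Counter mapping (source, target) pairs to frequencies.
    """
    chart = Counter()
    for trace in traces:
        # Pair each activity with the one that immediately follows it.
        for src, dst in zip(trace, trace[1:]):
            chart[(src, dst)] += 1
    return chart
```

The resulting counts are only the raw relations; deciding which of them represent immediate successors is the step that the study's metrics address.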
Development of a dynamic maintenance algorithm with multiple scenarios: a case study for surface mining
Ölmez Turan, Merv; Gölbaşı, Onur; Department of Mining Engineering (2019)
Surface mining operations such as ore extraction and overburden stripping depend heavily on machine performance. The operational plan for these machines aims to handle the required amount of material within a specific period with the lowest maintenance cost and the highest availability. To achieve these objectives, the machines should be properly adapted to the production schedule. On this basis, maintenance policies play crucial roles in the sustainability of operations. A maintenance policy is bas...
Volkovich, Zeev (Vladimir); Barzily, Zeev; Weber, Gerhard Wilhelm; Toledano-Kitai, Dvora (2009-06-03)
Among the areas of data and text mining that are employed today in science, economy and technology, clustering theory serves as a preprocessing step in data analysis. However, many open questions still await theoretical and practical treatment; for example, the problem of determining the true number of clusters has not been satisfactorily solved. In the current paper, this problem is addressed by the cluster stability approach. For several possible numbers of clusters we estimate the stabil...
