Hide/Show Apps

Improving Efficiency of Sequence Mining by Combining First Occurrence Forest (FOF) Strategy and Sibling Principle

2014-06-04
Onal, Kezban Dilek
Karagöz, Pınar
Sequential pattern mining is one of the basic problems in data mining and it has many applications in web mining. The WAP-Tree (Web Access Pattern Tree) data structure provides a compact representation of single-item sequence databases. WAP-Tree based algorithms have shown notable execution time and memory consumption performance on mining single-item sequence databases. We propose a new algorithm FOF-SP, a WAP-Tree based algorithm which combines an early prunning strategy called "Sibling Principle" from the literature and FOF (First Occurrence Forest) strategy. Experimental results revealed that FOF-SP finds patterns faster than previous WAP-Tree based algorithms PLWAP and FOF. Moreover, FOF-SP can mine patterns faster than PrefixSpan and as fast as LAPIN on real sequence databases from web usage mining and bioinformatics.