Parallel closet+ algorithm for finding frequent closed itemsets

Download
2009
Şen, Tayfun
Data mining is proving itself to be a very important fi eld as the data available is increasing exponentially, thanks to first computerization and now internetization. On the other hand, cluster computing systems made up of commodity hardware are becoming widespread, along with the multicore processor architectures. This high computing power is synthesized with data mining to process huge amounts of data and to reach information and knowledge. Frequent itemset mining is a special subtopic of data mining because it is an integral part of many types of data mining tasks. Often this task is a prerequisite for many other data mining algorithms, most notably algorithms in the association rule mining area. For this reason, it is studied heavily in the literature. In this thesis, a parallel implementation of CLOSET+, a frequent closed itemset mining algorithm, is presented. The CLOSET+ algorithm has been modi fied to run on multiple processors simultaneously, in order to obtain results faster. Open MPI and Boost libraries have been used for the communication between different processes and the program has been tested on different inputs and parameters. Experimental results show that the algorithm exhibits high speedup and e ficiency for dense data when the support value is higher than a determined value. Proposed parallel algorithm could prove to be useful for application areas where fast response is needed for low to medium number of frequent closed itemsets. A particular application area is the Web where online applications have similar requirements.

Suggestions

Data mining on architecture simulation
Maden, Engin; Karagöz, Pınar; Department of Computer Engineering (2010)
Data mining is the process of extracting patterns from huge data. One of the branches in data mining is mining sequence data and here the data can be viewed as a sequence of events and each event has an associated time of occurrence. Sequence data is modelled using episodes and events are included in episodes. The aim of this thesis work is analysing architecture simulation output data by applying episode mining techniques, showing the previously known relationships between the events in architecture and pr...
Efficient index structures for video databases
Açar, Esra; Yazıcı, Adnan; Department of Computer Engineering (2008)
Content-based retrieval of multimedia data has been still an active research area. The efficient retrieval of video data is proven a difficult task for content-based video retrieval systems. In this thesis study, a Content-Based Video Retrieval (CBVR) system that adapts two different index structures, namely Slim-Tree and BitMatrix, for efficiently retrieving videos based on low-level features such as color, texture, shape and motion is presented. The system represents low-level features of video data with ...
Fuzzy association rule mining from spatio-temporal data: an analysis of meteorological data in turkey
Ünal Çalargün, Seda; Yazıcı, Adnan; Department of Computer Engineering (2008)
Data mining is the extraction of interesting non-trivial, implicit, previously unknown and potentially useful information or patterns from data in large databases. Association rule mining is a data mining method that seeks to discover associations among transactions encoded within a database. Data mining on spatio-temporal data takes into consideration the dynamics of spatially extended systems for which large amounts of spatial data exist, given that all real world spatial data exists in some temporal cont...
Using semantic web services for data integration in banking domain
Okat, Çağlar; Doğru, Ali Hikmet; Department of Computer Engineering (2010)
A semantic model oriented transformation mechanism is developed for the centralization of intra-enterprise data integration. Such a mechanism is especially crucial in the banking domain which is selected in this study. A new domain ontology is constructed to provide basis for annotations. A bottom-up approach is preferred for semantic annotations to utilize existing web service definitions. Transformations between syntactic web service XML responses and semantic model concepts are defined in transformation ...
A singular value decomposition approach for recommendation systems
Osmanlı, Osman Nuri; Toroslu, İsmail Hakkı; Department of Computer Engineering (2010)
Data analysis has become a very important area for both companies and researchers as a consequence of the technological developments in recent years. Companies are trying to increase their profit by analyzing the existing data about their customers and making decisions for the future according to the results of these analyses. Parallel to the need of companies, researchers are investigating different methodologies to analyze data more accurately with high performance. Recommender systems are one of the most...
Citation Formats
T. Şen, “Parallel closet+ algorithm for finding frequent closed itemsets,” M.S. - Master of Science, Middle East Technical University, 2009.