A distributed graph mining framework based on mapreduce

Download
2010
Alkan, Sertan
The frequent patterns hidden in a graph can reveal crucial information about the network the graph represents. Existing techniques to mine the frequent subgraphs in a graph database generally rely on the premise that the data can be fit into main memory of the device that the computation takes place. Even though there are some algorithms that are designed using highly optimized methods to some extent, many lack the solution to the problem of scalability. In this thesis work, our aim is to find and enumerate the subgraphs that are at least as frequent as the designated threshold in a given graph. Here, we propose a new distributed algorithm for frequent subgraph mining problem that can scale horizontally as the computing cluster size increases. The method described here, uses a partitioning method and Map/Reduce programming model to distribute the computation of frequent subgraphs. In the core of this algorithm, we make use of an existing graph partitioning method to split the given data in the distributed file system and to merge and join the computed subgraphs without losing information. The frequent subgraph computation in each split is done using another known method which can enumerate the frequent patterns. Although current algorithms can efficiently find frequent patterns, they are not parallel or distributed algorithms in that even when they partition the data, they are designed to work on a single machine. Furthermore, these algorithms are computationally expensive but not fault tolerant and are not designed to work on a distributed file system. Using the Map/Reduce paradigm, we distribute the computation of frequent patterns to every machine in a cluster. Our algorithm, first bi-partitions the data via successive Map/Reduce jobs, then invokes another Map/Reduce job to compute the subgraphs in partitions using CloseGraph, recovers the whole set by invoking a series of Map/Reduce jobs to merge-join the previously found patterns. The implementation uses an open source Map/Reduce environment, Hadoop. In our experiments, our method can scale up to large graphs, as the graph data size gets bigger, this method performs better than the existing algorithms.

Suggestions

A MAP-Based Approach for Hyperspectral Imagery Super-Resolution
IRMAK, Hasan; Akar, Gözde; Yuksel, Seniha Esen (Institute of Electrical and Electronics Engineers (IEEE), 2018-06-01)
In this paper, we propose a novel single image Bayesian super-resolution (SR) algorithm where the hyperspectral image (HSI) is the only source of information. The main contribution of the proposed approach is to convert the ill-posed SR reconstruction problem in the spectral domain to a quadratic optimization problem in the abundance map domain. In order to do so, Markov random field based energy minimization approach is proposed and proved that the solution is quadratic. The proposed approach consists of f...
A hybrid recommendation system capturing the effect of time and demographic data
Oktay, Fulya; Alpaslan, Ferda Nur; Department of Computer Engineering (2010)
The information that World Wide Web (WWW) provides have grown up very rapidly in recent years, which resulted in new approaches for people to reach the information they need. Although web pages and search engines are indeed strong enough for us to reach what we want, it is not an efficient solution to present data and wait people to reach it. Some more creative and beneficial methods had to be developed for decreasing the time to reach the information and increase the quality of the information. Recommendat...
A Layout algorithm for visualization of graph alignments
Akarsu, Andaç; Can, Tolga; Department of Computer Engineering (2017)
Graph layout algorithms are commonly used when visualizing. Usually these algorithms focus on a single graph. To be able to visualize multiple graphs at once, such as the results of graph alignment algorithms on biological networks, new layout algorithms need to be developed. A layout algorithm for visualizing graph alignments should display the aligned graphs separately, so that both the graphs and their alignment can be viewed individually. In addition, for better interpretation of the alignment results, ...
A matching algorithm based on linear features
Atalay, Mehmet Volkan (Elsevier BV, 1998-07-01)
A two step feature matching algorithm which is primarily aimed at problems related to the analysis of aerial images of man-made sites is presented. Only linear features and their geometric attributes are used in the algorithm. First, the rotation between the two images is calculated and then matching by relaxation is performed assuming that there is only translation.
A systematic study of probabilistic aggregation strategies in swarm robotic systems
Soysal, Onur; Şahin, Erol; Department of Computer Engineering (2005)
In this study, a systematic analysis of probabilistic aggregation strategies in swarm robotic systems is presented. A generic aggregation behavior is proposed as a combination of four basic behaviors: obstacle avoidance, approach, repel, and wait. The latter three basic behaviors are combined using a three-state finite state machine with two probabilistic transitions among them. Two different metrics were used to compare performance of strategies. Through systematic experiments, how the aggregation performa...
Citation Formats
S. Alkan, “A distributed graph mining framework based on mapreduce,” M.S. - Master of Science, Middle East Technical University, 2010.