BB-PLUS: an efficient approach for subgraph isomorphism problem in big graph databases

Download
2019
Taşkomaz, Ezgi
Graph databases are flexible NoSQL databases used to efficiently store and query complex dataset. The problem of subgraph isomorphism, finding a pattern in a given graph, is one of the biggest problem of graph databases. Therefore, the goal of this study is to introduce a new approach called BB-Plus, which consists of heuristics to find best matching order using the volatility and size of the database, the type and size of the query as an input in order to improve the performance of the queries. BBPlus approach trims candidate nodes at high level and effectively reduces the size of the problem. The approach is implemented using the Java programming language and graph data structures of Neo4j GDBMS and compared to the state-of-the-art subgraph isomorphism algorithms, namely BB-Graph, Cypher, DualIso, GraphQL, TurboIso and VF3 with three different dataset within the same programming environment. The results of the performance tests show that BB-Plus is an average on 10%, 37% and 4% faster than the other algorithms based on different queries in public WorldCup, Pokec and non-public Population dataset, respectively.

Suggestions

BB-graph: a new subgraph isomorphism algorithm for querying big graph databases
Asiler, Merve; Yazıcı, Adnan; Department of Computer Engineering (2016)
With the emergence of the big data concept, the big graph database model has become very popular since it provides very flexible and quick querying for the cases that require costly join operations in RDBMs. However, it is a big challenge to find all exact matches of a query graph in a big database graph, which is known as the subgraph isomorphism problem. Although many related studies exist in literature, there is not a perfect algorithm that works for all types of queries efficiently since it is an NP-har...
Multiobjective relational data warehouse design for the cloud
Dökeroğlu, Tansel; Coşar, Ahmet; Department of Computer Engineering (2014)
Conventional distributed DataWarehouse (DW) design techniques seek to assign data tables/fragments to a given static database hardware setting optimally. However; it is now possible to use elastic virtual resources provided by the Cloud environment, thus achieve reductions in both the execution time and the monetary cost of a DW system within predefined budget and response time constraints. Finding an optimal assignment plan for database tables to machines for this design problem is NP-Hard. Therefore, robu...
Data sharing and access with a corba data distribution service implementation
Dursun, Mustafa; Bilgen, Semih; Department of Electrical and Electronics Engineering (2006)
Data Distribution Service (DDS) specification defines an API for Data-Centric Publish-Subscribe (DCPS) model to achieve efficient data distribution in distributed computing environments. Lack of definition of interoperability architecture in DDS specification obstructs data distribution between different and heterogeneous DDS implementations. In this thesis, DDS is implemented as a CORBA service to achieve interoperability and a QoS policy is proposed for faster data distribution with CORBA features.
A new hybrid multi-relational data mining technique
Toprak, Seda Dağlar; Toroslu, İ. Hakkı; Department of Computer Engineering (2005)
Multi-relational learning has become popular due to the limitations of propositional problem definition in structured domains and the tendency of storing data in relational databases. As patterns involve multiple relations, the search space of possible hypotheses becomes intractably complex. Many relational knowledge discovery systems have been developed employing various search strategies, search heuristics and pattern language limitations in order to cope with the complexity of hypothesis space. In this w...
Improving the performance of Hadoop/Hive by sharing scan and computation tasks
Özal, Serkan; Toroslu, İsmail Hakkı; Doğaç, Asuman; Department of Computer Engineering (2013)
MapReduce is a popular model of executing time-consuming analytical queries as a batch of tasks on large scale data. During simultaneous execution of multiple queries, many oppor- tunities can arise for sharing scan and/or computation tasks. Executing common tasks only once can reduce the total execution time of all queries remarkably. Therefore, we propose to use Multiple Query Optimization (MQO) techniques to improve the overall performance of Hadoop Hive, an open source SQL-based distributed warehouse sy...
Citation Formats
E. Taşkomaz, “BB-PLUS: an efficient approach for subgraph isomorphism problem in big graph databases,” Thesis (M.S.) -- Graduate School of Natural and Applied Sciences. Computer Engineering., Middle East Technical University, 2019.