HyGraph: a subgraph isomorphism algorithm for efficiently querying big graph databases

Big graph databases provide strong modeling capabilities and efficient querying for complex applications. Subgraph isomorphism, the task of finding exact matches of a query graph in the database, is challenging to solve efficiently. Most current subgraph isomorphism approaches are based on the pruning strategy proposed by Ullmann. These techniques have two significant drawbacks: first, they are unable to handle complex queries efficiently, and second, their implementations need large indexes that consume substantial memory. In this paper, we describe a new subgraph isomorphism approach, the HyGraph algorithm, that is efficient both in querying and in the memory required for index creation. We compare the HyGraph algorithm with two popular existing approaches, GraphQL and Cypher, both through complexity measures and experimentally on three big graph data sets: (1) a country-level population database, (2) a simulated bank database, and (3) a publicly available World Cup big graph database. The results show that HyGraph performs significantly better than, or at least as well as, the competing algorithms for the query operations on these big databases, making it an excellent candidate for subgraph isomorphism queries in real scenarios.
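To make the problem statement concrete, the following is a minimal sketch of subgraph matching by plain backtracking with label and degree pruning. It is not the HyGraph algorithm, whose details the abstract does not give; the function name `find_matches`, the adjacency-set graph representation, and the toy bank-style data are all assumptions chosen for illustration.

```python
from typing import Dict, Hashable, Iterator, Set

# Assumed representation: undirected graph as node -> set of neighbours.
Graph = Dict[Hashable, Set[Hashable]]


def find_matches(query: Graph, data: Graph,
                 q_labels: Dict, d_labels: Dict) -> Iterator[Dict]:
    """Yield each injective mapping of query nodes to data nodes that
    preserves node labels and every query edge."""
    # Match high-degree query nodes first so dead ends show up early.
    order = sorted(query, key=lambda n: -len(query[n]))

    def extend(mapping, used, depth):
        if depth == len(order):
            yield dict(mapping)  # copy, since mapping is mutated below
            return
        q = order[depth]
        for d in data:
            if d in used or q_labels[q] != d_labels[d]:
                continue  # injectivity and label check
            if len(data[d]) < len(query[q]):
                continue  # degree pruning
            # Every already-mapped neighbour of q must be adjacent to d.
            if all(mapping[n] in data[d] for n in query[q] if n in mapping):
                mapping[q] = d
                used.add(d)
                yield from extend(mapping, used, depth + 1)
                del mapping[q]
                used.remove(d)

    yield from extend({}, set(), 0)


# Hypothetical toy data: which persons are linked to which accounts.
query = {"x": {"y"}, "y": {"x"}}
q_labels = {"x": "Person", "y": "Account"}
data = {"p1": {"a1", "a2"}, "p2": {"a2"}, "a1": {"p1"}, "a2": {"p1", "p2"}}
d_labels = {"p1": "Person", "p2": "Person", "a1": "Account", "a2": "Account"}

matches = list(find_matches(query, data, q_labels, d_labels))
```

The search explores candidate assignments one query node at a time, so its worst-case cost grows exponentially with the query size. That cost is what pruning strategies and indexes in this line of work aim to reduce.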


BB-graph: a new subgraph isomorphism algorithm for querying big graph databases
Asiler, Merve; Yazıcı, Adnan; Department of Computer Engineering (2016)
With the emergence of the big data concept, the big graph database model has become very popular since it provides very flexible and quick querying for the cases that require costly join operations in RDBMSs. However, it is a big challenge to find all exact matches of a query graph in a big database graph, which is known as the subgraph isomorphism problem. Although many related studies exist in the literature, there is not a perfect algorithm that works for all types of queries efficiently since it is an NP-har...
Improving the performance of Hadoop/Hive by sharing scan and computation tasks
Özal, Serkan; Toroslu, İsmail Hakkı; Doğaç, Asuman; Department of Computer Engineering (2013)
MapReduce is a popular model of executing time-consuming analytical queries as a batch of tasks on large scale data. During simultaneous execution of multiple queries, many opportunities can arise for sharing scan and/or computation tasks. Executing common tasks only once can reduce the total execution time of all queries remarkably. Therefore, we propose to use Multiple Query Optimization (MQO) techniques to improve the overall performance of Hadoop Hive, an open source SQL-based distributed warehouse sy...
Fuzzy data representation and querying in XML database
Ustunkaya, Ekin; Yazıcı, Adnan; George, Roy (2007-02-01)
Real-world information including subjective opinions and judgments need imprecise data to be modeled for representation and querying in databases. The Extensible Markup Language (XML) has become a de-facto standard for data modeling and exchange in recent years. Efforts on modeling imprecision and representing such data in XML have not been fully developed. In this paper, an XML based fuzzy data representation and querying system is presented. Complex and imprecise data are represented using a fuzzy extensi...
Using fuzzy Petri nets for static analysis of rule-bases
Bostan-Korpeoglu, B; Yazıcı, Adnan (2004-01-01)
We use a Fuzzy Petri Net (FPN) structure to represent knowledge and model the behavior in our intelligent object-oriented database environment, which integrates fuzzy, active and deductive rules with database objects. However, the behavior of a system can be unpredictable due to the rules triggering or untriggering each other (non-termination). Intermediate and final database states may also differ according to the order of rule executions (non-confluence). In order to foresee and solve problematic behavior...
Data integration over horizontally partitioned databases in service-oriented data grids
Sunercan, Hatice Kevser Sönmez; Çiçekli, Fehime Nihan; Alpdemir, Mahmut Nedim; Department of Computer Engineering (2010)
Information integration over distributed and heterogeneous resources has been challenging in many terms: coping with various kinds of heterogeneity including data model, platform, access interfaces; coping with various forms of data distribution and maintenance policies, scalability, performance, security and trust, reliability and resilience, legal issues etc. It is obvious that each of these dimensions deserves a separate thread of research efforts. One particular challenge among the ones listed above tha...
