Distributed database design with integer linear programming and evolutionary hybrid algorithms

Download
2013
Tosun, Umut
The communication costs of remote access and retrieval of table fragments required in the execution of distributed database queries, are the major factors determining the quality of a distributed database design. Data allocation algorithms try to minimize these costs by dividing database tables into horizontal fragments, then assigning each fragment at or near the database sites they are needed more frequently. In this thesis, we propose efficient optimization algorithms for centralized and distributed databases based on integer linear programming and the well known Quadratic Assignment Problem (QAP). In the context of distributed database design, we have formulated an optimization problem where site capacities, table replicas, communication link capacities and table fragmentation are handled. Integer linear programming is a powerful tool to represent all of these optimization parameters easily while existing models in the literature ignore one or more of these important and practical aspects. The proposed model is based on integer linear programming and for the first time it handles all of these criteria in a consistent formulation. The QAP is a difficult and important problem studied in the domain of combinatorial optimization. It is possible to solve QAP instances with 10-20 facilities using exhaustive parallel algorithms within a few days on a cluster machine. We solve large QAP instances using a variety of genetic algorithm crossover operators for these problems and experimentally verified their performance using benchmark QAP instances from QAPLIB library. We propose and experimentally evaluate the island parallel QAP-IPGA algorithm. Then, we propose a new Parallel Hybrid Genetic Tabu Search based algorithm (PHGETS) which has an intelligent diversification phase executed on a seed solution obtained using the genetic algorithm. PHGETS consistently achieves results on average within 0.05 % of the best solutions given in QAPLIB for all problem instances that were considered.

Suggestions

Improving the performance of Hadoop/Hive by sharing scan and computation tasks
Özal, Serkan; Toroslu, İsmail Hakkı; Doğaç, Asuman; Department of Computer Engineering (2013)
MapReduce is a popular model of executing time-consuming analytical queries as a batch of tasks on large scale data. During simultaneous execution of multiple queries, many oppor- tunities can arise for sharing scan and/or computation tasks. Executing common tasks only once can reduce the total execution time of all queries remarkably. Therefore, we propose to use Multiple Query Optimization (MQO) techniques to improve the overall performance of Hadoop Hive, an open source SQL-based distributed warehouse sy...
Integer Linear Programming Solution for the Multiple Query Optimization Problem
Dokeroglu, Tansel; Bayir, Murat Ali; Coşar, Ahmet (2014-10-28)
Multiple Query Optimization (MQO) is a technique for processing a batch of queries in such a way that shared tasks in these queries are executed only once, resulting in significant savings in the total evaluation. The first phase of MQO requires producing alternative query execution plans so that the shared tasks between queries are identified and maximized. The second phase of MQO is an optimization problem where the goal is selecting exactly one of the alternative plans for each query to minimize the tota...
Uncertainty in a nested relational database model
Yazıcı, Adnan; Buckles, BP; Petry, FE (1999-07-01)
Some database models have already been developed to deal with complex values but they have constrains that data stored is precise and queries are crisp. However, as many researchers have pointed out, there is a need to present, manipulate, and query complex and uncertain data of various non-traditional database applications such as oceanography, multimedia, meteorology, office automation systems, engineering designs, expert database systems and geographic information systems. In this paper, we present a log...
Exploiting Navigational Queries for Result Presentation and Caching in Web Search Engines
Ozcan, Rifat; Altıngövde, İsmail Sengör; Ulusoy, Ozgur (Wiley, 2011-04-01)
Caching of query results is an important mechanism for efficiency and scalability of web search engines. Query results are cached and presented in terms of pages, which typically include 10 results each. In navigational queries, users seek a particular website, which would be typically listed at the top ranks (maybe, first or second) by the search engine, if found. For this type of query, caching and presenting results in the 10-per-page manner may waste cache space and network bandwidth. In this article, w...
Flexible querying in an intelligent object-oriented database environment
Koyuncu, M; Yazıcı, Adnan; George, R (2000-10-28)
Many new-generation database applications demand intelligent information management necessitating efficient interactions between database gr. knowledge bases and the users. In this study we discuss evaluation of imprecise queries in an intelligent object-oriented database environment, IFOOD. A flexible query evaluation mechanism, capable of handling different data types including complex and imprecise data and knowledge is presented and key language issues are addressed.
Citation Formats
U. Tosun, “Distributed database design with integer linear programming and evolutionary hybrid algorithms,” Ph.D. - Doctoral Program, Middle East Technical University, 2013.