Multiobjective relational data warehouse design for the cloud

Dökeroğlu, Tansel
Conventional distributed Data Warehouse (DW) design techniques seek to assign data tables/fragments optimally to a given static database hardware setting. However, it is now possible to use the elastic virtual resources provided by the Cloud environment and thus reduce both the execution time and the monetary cost of a DW system within predefined budget and response time constraints. Finding an optimal plan for assigning database tables to machines in this design problem is NP-Hard. Therefore, robust multiobjective heuristic algorithms are needed to design Cloud DWs that are cost-efficient in terms of query workload response time and the total ownership price of virtual resources (CPUs and/or cores, RAM, hard disk storage, network bandwidth, and disk I/O bandwidth). In this thesis, we propose two algorithms for solving the relational Cloud DW design problem: (1) Multiobjective Design with Branch and Bound (MOD-B&B) and (2) Multiobjective Evolutionary Genetic Algorithm (MOD-GA). These algorithms make use of a novel Cloud DW single query optimizer, DPACO, that can find the best distributed query execution plan and accurately calculate its response time. By running DPACO on an input query workload, we find the best query execution plans for that workload under the given virtual resource allocations. The best allocation of virtual resources for a DW design is achieved by using MOD-GA. We developed a special chromosome structure, along with crossover and mutation operators, to achieve the best results from MOD-GA. We experimentally verified the accuracy of the algorithm by comparing its output designs against the optimal designs obtained by the exhaustive MOD-B&B algorithm. Our evaluations show that the obtained designs are very close to the optimal solution set and that, while the MOD-B&B algorithm requires hours to complete its execution, MOD-GA returns almost the same results within seconds.
To further improve the total response time of a query workload while saving on Cloud resource costs, we improved the Cloud DW designs by using (near-)optimal and cost-efficient materialized views. In our experiments on a private Cloud server, we achieved remarkable improvements in both the response times of query workloads and the monetary costs of consumed Cloud resources. The reason for these savings is that materializing join results on hard disk yields large CPU resource savings, which reduce Cloud cost and offset the cost of the extra hard disk storage by a wide margin.
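
The two objectives named in the abstract, workload response time and monetary cost, are typically compared through Pareto dominance: one design dominates another if it is no worse in both objectives and strictly better in at least one. The sketch below illustrates only that comparison. It is not the thesis's MOD-GA or MOD-B&B; the function names, the candidate labels, and all numbers are hypothetical and chosen purely for illustration.

```python
from typing import List, Tuple

# A candidate design: (label, response_time_seconds, cost_dollars).
# Labels and values are invented for illustration, not taken from the thesis.
Design = Tuple[str, float, float]


def dominates(a: Design, b: Design) -> bool:
    """True if a is no worse than b in both objectives and better in one."""
    no_worse = a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(designs: List[Design]) -> List[Design]:
    """Keep every design that no other design dominates (input order kept)."""
    return [d for d in designs
            if not any(dominates(other, d) for other in designs)]


candidates: List[Design] = [
    ("small", 120.0, 1.0),     # cheap but slow
    ("medium", 60.0, 2.5),     # balanced
    ("large", 30.0, 6.0),      # fast but expensive
    ("wasteful", 65.0, 4.0),   # slower and pricier than "medium"
]

front = pareto_front(candidates)
for label, seconds, dollars in front:
    print(f"{label}: {seconds:.0f} s, ${dollars:.2f}")
```

In this toy input, the "wasteful" design is dropped because "medium" is both faster and cheaper, while the other three remain as trade-offs between time and cost. A set of this kind is what the abstract refers to as the optimal solution set against which heuristic results are compared.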