Multiobjective relational data warehouse design for the cloud

2014
Dökeroğlu, Tansel
Conventional distributed data warehouse (DW) design techniques seek to assign data tables or fragments optimally to a given static database hardware setting. However, it is now possible to use the elastic virtual resources provided by the Cloud environment and thus reduce both the execution time and the monetary cost of a DW system within predefined budget and response time constraints. Finding an optimal plan for assigning database tables to machines in this design problem is NP-hard. Therefore, robust multiobjective heuristic algorithms are needed to build Cloud DWs that are cost-efficient in terms of query workload response time and the total ownership price of virtual resources (CPUs and/or cores, RAM, hard disk storage, network bandwidth, and disk I/O bandwidth). In this thesis, we propose two algorithms for solving the relational Cloud DW design problem: (1) Multiobjective Design with Branch and Bound (MOD-B&B) and (2) Multiobjective Evolutionary Genetic Algorithm (MOD-GA). These algorithms make use of a novel Cloud DW single-query optimizer, DPACO, that can find the best distributed query execution plan and accurately calculate its response time. By running DPACO on an input query workload, we find the best query execution plans for that workload under the given virtual resource allocations. The best allocation of virtual resources for a DW design is achieved by using MOD-GA. We developed a special chromosome structure, along with crossover and mutation operators, to achieve the best results from MOD-GA. We experimentally verified the accuracy of the algorithm by comparing its output designs against the optimal designs obtained with the exhaustive MOD-B&B algorithm. Our evaluations show that the obtained designs are very close to the optimal solution set and that, while the MOD-B&B algorithm requires hours to complete, MOD-GA returns almost the same results within seconds.
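The two objectives named above, workload response time and monetary cost, together with the budget and response time constraints, can be illustrated with a small Pareto filter. This is only a sketch of the general multiobjective idea, not the thesis's MOD-B&B or MOD-GA; the `Design` type, the function names, and the notion of a design as a labeled (time, cost) pair are assumptions made for illustration.

```python
from typing import List, NamedTuple


class Design(NamedTuple):
    """Hypothetical summary of one candidate Cloud DW design."""
    name: str             # label for a virtual resource allocation (assumed)
    response_time: float  # query workload response time, arbitrary units
    cost: float           # monetary cost of the rented resources, arbitrary units


def dominates(a: Design, b: Design) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.response_time <= b.response_time and a.cost <= b.cost
    strictly_better = a.response_time < b.response_time or a.cost < b.cost
    return no_worse and strictly_better


def pareto_front(designs: List[Design], max_time: float, budget: float) -> List[Design]:
    """Keep feasible designs that no other feasible design dominates."""
    feasible = [d for d in designs
                if d.response_time <= max_time and d.cost <= budget]
    return [d for d in feasible
            if not any(dominates(other, d) for other in feasible)]
```

A design that is both slower and more expensive than another feasible design is dropped; the remaining designs represent different time/cost trade-offs among which a user could choose.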
To further improve the total response time of a query workload while saving money on Cloud resources, we improved the Cloud DW designs by using (near-)optimal and cost-efficient materialized views. In our experiments on a private Cloud server, we achieved remarkable improvements in both the response times of query workloads and the monetary costs of consumed Cloud resources. The reason for these savings is that materializing join results on hard disk yields large CPU resource savings, which reduce Cloud cost and offset the cost of the extra hard disk storage by a wide margin.
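The cost argument for materialized views can be written as simple arithmetic: a view pays off when the CPU cost it saves exceeds the cost of the disk space it occupies. The function below is a minimal sketch under that assumption; its name, its parameters, and the linear pricing model are hypothetical and are not taken from the thesis.

```python
def materialization_saving(cpu_hours_saved: float,
                           extra_storage_gb: float,
                           cpu_price_per_hour: float,
                           storage_price_per_gb: float) -> float:
    """Net monetary saving from materializing a join result.

    Assumes (hypothetically) that CPU and storage are billed linearly.
    A positive result means the view saves money overall.
    """
    cpu_saving = cpu_hours_saved * cpu_price_per_hour
    storage_cost = extra_storage_gb * storage_price_per_gb
    return cpu_saving - storage_cost
```

With made-up prices, a view that saves many CPU hours for a modest amount of extra storage gives a positive value, while a rarely used view that saves little CPU time gives a negative one.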

Suggestions

Data sharing and access with a corba data distribution service implementation
Dursun, Mustafa; Bilgen, Semih; Department of Electrical and Electronics Engineering (2006)
The Data Distribution Service (DDS) specification defines an API for the Data-Centric Publish-Subscribe (DCPS) model to achieve efficient data distribution in distributed computing environments. The lack of a defined interoperability architecture in the DDS specification obstructs data distribution between different, heterogeneous DDS implementations. In this thesis, DDS is implemented as a CORBA service to achieve interoperability, and a QoS policy is proposed for faster data distribution with CORBA features.
Semantic concept recognition from structured and unstructured inputs within cyber security domain
Hoşsucu, Alp Gökhan; Baykal, Nazife; Department of Information Systems (2015)
The linked data initiative has been quite successful in terms of publishing and interlinking data over ontological structures. The success is due to answering semantically rich queries over highly structured data. Linked data structures are widely used in various domains to solve the problem of producing domain-specific knowledge that can be interpreted by automated agents without any human interference. The cyber security field is one of the domains that suffer from the excessiveness of the raw...
Multimodal query-level fusion for efficient multimedia information retrieval
Sattari, Saeid; Yazıcı, Adnan (2018-10-01)
Managing a large volume of multimedia data containing various modalities such as visual, audio, and text reveals the necessity for efficient methods for modeling, processing, storing, and retrieving complex data. In this paper, we propose a fusion-based approach at the query level to improve query retrieval performance of multimedia data. We discuss various flexible query types including the combination of content as well as concept-based queries that provide users with the ability to efficiently perform mu...
CLOUDGEN: Workload generation for the evaluation of cloud computing systems
Koltuk, Furkan; Yazar, Alper; Schmidt, Şenan Ece (2019-04-01)
In this paper, we propose the CLOUDGEN workflow, which produces synthetic workloads for Infrastructure and Platform as a Service for the evaluation of resource management approaches in cloud computing systems. To this end, CLOUDGEN systematically processes and clusters records in a given workload trace and fits distributions for different workload parameters within the clusters. Unlike previous work, clustering is carried out to produce different virtual machine types for achieving models that are sui...
Robust heuristic algorithms for exploiting the common tasks of relational cloud database queries
Dokeroglu, Tansel; Bayir, Murat Ali; Coşar, Ahmet (2015-05-01)
Cloud computing enables a conventional relational database system's hardware to be adjusted dynamically according to query workload, performance and deadline constraints. One can rent a large amount of resources for a short duration in order to run complex queries efficiently on large-scale data with virtual machine clusters. Complex queries usually contain common subexpressions, either in a single query or among multiple queries that are submitted as a batch. The common subexpressions scan the same relatio...
Citation Formats
T. Dökeroğlu, “Multiobjective relational data warehouse design for the cloud,” Ph.D. - Doctoral Program, Middle East Technical University, 2014.