Multiobjective relational data warehouse design for the cloud

Download
2014
Dökeroğlu, Tansel
Conventional distributed DataWarehouse (DW) design techniques seek to assign data tables/fragments to a given static database hardware setting optimally. However; it is now possible to use elastic virtual resources provided by the Cloud environment, thus achieve reductions in both the execution time and the monetary cost of a DW system within predefined budget and response time constraints. Finding an optimal assignment plan for database tables to machines for this design problem is NP-Hard. Therefore, robust multiobjective heuristic algorithms are needed for cost-efficient Cloud DWs in terms of query workload response time and the total ownership price of virtual resources (CPU and/or cores, RAM, hard disk storage, network bandwidth, and disk I/O bandwidth). In this thesis we propose two algorithms for the solution of the relational Cloud DW design problem; (1) Multiobjective Design with Branch and Bound (MOD-B&B) and (2) Multiobjective Evolutionary Genetic Algorithm (MOD-GA). These algorithms make use of a novel Cloud DW single query optimizer, DPACO, that can find the best distributed query execution plan and accurately calculate its response time. By using DPACO on an input query workload we find the best query execution plans for given query workloads using the given virtual resource allocations. The best allocation of virtual resources for a DW design is achieved by using MOD-GA. We developed a special chromosome structure, along with crossover and mutation operators, to achieve the best results from MOD-GA.We experimentally verified the accuracy of the algorithm by comparing its output designs against the optimal designs obtained by using an exhaustive MOD-B&B algorithm. Our evaluations show that the obtained designs are very close to the optimal solution set and while MOD-B&B algorithm requires hours to complete its execution, the MOD-GA is able to return almost the same results within seconds. In order to achieve further improvement in total response time of a query workload with monetary savings from Cloud resources, we improved the Cloud DW designs by using (near-) optimal and cost-effi cient materialized views. Through our experiments performed on a private Cloud server, remarkable improvements in both response times of query workloads and monetary costs of consumed Cloud resources have been achieved. The reason for these savings is that, by materializing join results on hard disk, we obtain large CPU resource savings reducing Cloud cost, off setting the cost of extra hard disk storage by a wide margin.

Suggestions

Data sharing and access with a corba data distribution service implementation
Dursun, Mustafa; Bilgen, Semih; Department of Electrical and Electronics Engineering (2006)
Data Distribution Service (DDS) specification defines an API for Data-Centric Publish-Subscribe (DCPS) model to achieve efficient data distribution in distributed computing environments. Lack of definition of interoperability architecture in DDS specification obstructs data distribution between different and heterogeneous DDS implementations. In this thesis, DDS is implemented as a CORBA service to achieve interoperability and a QoS policy is proposed for faster data distribution with CORBA features.
Multimodal query-level fusion for efficient multimedia information retrieval
Sattari, Saeid; Yazıcı, Adnan (2018-10-01)
Managing a large volume of multimedia data containing various modalities such as visual, audio, and text reveals the necessity for efficient methods for modeling, processing, storing, and retrieving complex data. In this paper, we propose a fusion-based approach at the query level to improve query retrieval performance of multimedia data. We discuss various flexible query types including the combination of content as well as concept-based queries that provide users with the ability to efficiently perform mu...
CLOUDGEN: Workload generation for the evaluation of cloud computing systems CLOUDGEN: Bulut Bilişim Sistemlerinin Başarim Deǧerlendirmesi icin Iş Yuku Uretimi
Koltuk, Furkan; Yazar, Alper; Schmidt, Şenan Ece (2019-04-01)
In this paper, we propose CLOUDGEN workflow that produces synthetic workloads for Infrastructure and Platform as a Service for the evaluation of resource management approaches in cloud computing systems. To this end, CLOUDGEN systematically processes and clusters records in a given workload trace and fits distributions for different workload parameters within the clusters. Different than the previous work, clustering is carried out to produce different virtual machine types for achieving models that are sui...
Handling complex and uncertain information in the ExIFO and NF2 data models
Yazıcı, Adnan; Petry, FE (1999-12-01)
Trends in databases leading to complex objects present opportunities for representing imprecision and uncertainty that were difficult to integrate cohesively in simpler database models. In fact, one can begin at the conceptual level with a model that allows uncertainty assumptions and then transform those assumptions into a logical model having the necessary semantic foundations upon which to base a meaningful query language. Here we provide such a constructive approach beginning with the ExIFO model for ex...
Multiple description coding of animated meshes
Bici, M. Oguz; Akar, Gözde (Elsevier BV, 2010-11-01)
In this paper, we propose three novel multiple description coding (MDC) methods for reliable transmission of compressed animated meshes represented by series of 3D static meshes with same connectivity. The proposed methods trade off reconstruction quality for error resilience to provide the best expected reconstruction of 3D mesh sequence at the decoder side. The methods are based on layer duplication and partitioning of the set of vertices of a scalable coded animated mesh by either spatial or temporal sub...
Citation Formats
T. Dökeroğlu, “Multiobjective relational data warehouse design for the cloud,” Ph.D. - Doctoral Program, Middle East Technical University, 2014.