Improving Hadoop Hive Query Response Times Through Efficient Virtual Resource Allocation

2015-10-28
Dokeroglu, Tansel
Cinar, Muhammet Serkan
Sert, Seyyit Alper
Coşar, Ahmet
Yazıcı, Adnan
The performance of MapReduce-based Cloud data warehouses depends mainly on the virtual hardware resources allocated to them. Most of the time, these resources are set to values selected by the Cloud service providers. However, setting the virtual resources, such as the number of CPUs, the size of RAM, and the network bandwidth, in accordance with the workload demands of a query improves the response time when querying large data sets. In this study, we carried out a set of experiments with a well-known MapReduce SQL translator, Hadoop Hive, on the TPC-H decision support benchmark database in order to analyze the performance sensitivity of the queries under different virtual resource settings. Our results provide valuable hints for decision makers who design efficient MapReduce-based data warehouses on the Cloud.

Suggestions

Optimal dynamic resource allocation for heterogenous cloud data centers
Ekici, Nazım Umut; Güran Schmidt, Şenan; Department of Electrical and Electronics Engineering (2019)
Today's data centers are mostly cloud-based with virtualized servers to provide on-demand scalability and flexibility of the available resources such as CPU, memory, data storage and network bandwidth. Heterogeneous cloud data centers (CDCs) offer hardware accelerators in addition to these standard cloud server resources. A cloud data center provider may provide Infrastructure as a Service and Platform as a Service (IPaaS), where the user gets a virtual machine (VM) with processing, memory, storage and netw...
Improving data freshness in random access channels
Atabay, Doğa Can; Uysal, Elif; Department of Electrical and Electronics Engineering (2019)
The conventional network performance metrics such as throughput and delay do not accurately reflect the needs of some applications. Age of information (AoI) is a newly proposed metric that indicates the freshness of information from the receiver’s perspective. In this work, a network of multiple transmitter devices continuously updating a central station over an error-free multiaccess channel is studied. The average AoI expressions are derived for Round-Robin, Slotted ALOHA, and a proposed random access str...
Generalized resource management for heterogeneous cloud data centers
Erol, Ahmet; Güran Schmidt, Şenan Ece; Department of Electrical and Electronics Engineering (2019)
OpenStack is a widely used management tool for cloud computing which is designed to work on servers and allocate standard computing resources such as CPU, memory or disk. The current trend for integrating different hardware accelerators such as FPGAs and GPUs in the cloud requires managing these heterogeneous resources. In this thesis, we propose a generalization for OpenStack Nova project which extends the relevant data structures to include these new resources. More importantly, we present a new lightweig...
Design-objective space exploration and multi-objective optimization of initial structural design alternatives via machine learning
Yetkin, Ozan; Sorguç, Arzu; Department of Architecture (2020-9)
Increasing implementations of digital workflows within design processes generate exponentially growing data in each phase. Therefore, decision making within a design space with growing complexity is expected to be a great challenge for designers in the future. Hence, this research aimed to seek the potentials of complex relations between data within design space and objective space of structural design problems for proposing a novel approach to augment capabilities of digital tools by artificial intelligenc...
Improving the performance of Hadoop/Hive by sharing scan and computation tasks
Özal, Serkan; Toroslu, İsmail Hakkı; Doğaç, Asuman; Department of Computer Engineering (2013)
MapReduce is a popular model of executing time-consuming analytical queries as a batch of tasks on large scale data. During simultaneous execution of multiple queries, many opportunities can arise for sharing scan and/or computation tasks. Executing common tasks only once can reduce the total execution time of all queries remarkably. Therefore, we propose to use Multiple Query Optimization (MQO) techniques to improve the overall performance of Hadoop Hive, an open source SQL-based distributed warehouse sy...
Citation Formats
T. Dokeroglu, M. S. Cinar, S. A. Sert, A. Coşar, and A. Yazıcı, “Improving Hadoop Hive Query Response Times Through Efficient Virtual Resource Allocation,” 2015, vol. 400, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/31138.