Improving the efficiency of distributed information retrieval using hybrid index partitioning

2018
Hafızoğlu, Fatih
Selective search over a collection with traditional (topical) partitioning has an advantage over exhaustive search in terms of total query cost. However, because each query is processed by only a few selected shards, it often suffers from high query latency and load imbalance. To overcome these issues, this thesis proposes a new partitioning method, Hybrid partitioning. Our studies show that this methodology can yield significant savings in query latency. In addition, query processing with Hybrid partitioning achieves perfect load balancing and optimizes resource usage, which is key in low-resource environments.
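The load-balance argument can be illustrated with a small simulation. This is a minimal sketch, not the thesis's implementation. It assumes that Hybrid partitioning resembles the layout described in the related "On the Efficiency of Selective Search" entry, where each topical cluster's documents are spread across all shards and each shard groups its documents by cluster so that unselected clusters can be skipped. The function names, the round-robin placement, and the use of "documents scanned" as a cost proxy are all hypothetical. Total work is the sum over shards. The query latency proxy is the busiest shard, because shards work in parallel.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical layout type: shard id -> cluster id -> document ids on that shard.
Layout = Dict[int, Dict[int, List[int]]]


def topical_layout(clusters: Dict[int, List[int]], num_shards: int) -> Layout:
    """Traditional topical partitioning: each cluster lives whole on one shard."""
    layout: Layout = {s: defaultdict(list) for s in range(num_shards)}
    for cid, docs in clusters.items():
        layout[cid % num_shards][cid].extend(docs)
    return layout


def hybrid_layout(clusters: Dict[int, List[int]], num_shards: int) -> Layout:
    """Assumed hybrid scheme: each cluster's documents are dealt round-robin
    to all shards, and every shard keeps them grouped by cluster."""
    layout: Layout = {s: defaultdict(list) for s in range(num_shards)}
    for cid, docs in clusters.items():
        for i, doc in enumerate(docs):
            layout[i % num_shards][cid].append(doc)
    return layout


def shard_work(layout: Layout, selected: Set[int]) -> List[int]:
    """Documents each shard scans when a query selects the given clusters.
    Unselected clusters are skipped (dict.get does not insert new keys)."""
    return [
        sum(len(layout[s].get(c, [])) for c in selected)
        for s in sorted(layout)
    ]


if __name__ == "__main__":
    # Four hypothetical topical clusters of unequal size.
    clusters = {
        0: list(range(0, 400)),
        1: list(range(400, 700)),
        2: list(range(700, 900)),
        3: list(range(900, 1000)),
    }
    selected = {0, 1}  # clusters chosen by some resource-selection step
    for name, layout in (("topical", topical_layout(clusters, 4)),
                         ("hybrid", hybrid_layout(clusters, 4))):
        work = shard_work(layout, selected)
        print(name, work, "total:", sum(work), "max:", max(work))
```

On this toy input, both layouts scan 700 documents in total. The topical layout loads two shards with 400 and 300 documents and leaves the other two idle. The hybrid layout gives every shard 175 documents, so the busiest shard does less than half the work. The sketch ignores ranking, result merging, and per-shard overheads, which a real system would have to account for.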

Suggestions

Improving the performance of Hadoop/Hive by sharing scan and computation tasks
Özal, Serkan; Toroslu, İsmail Hakkı; Doğaç, Asuman; Department of Computer Engineering (2013)
MapReduce is a popular model for executing time-consuming analytical queries as a batch of tasks on large-scale data. During simultaneous execution of multiple queries, many opportunities can arise for sharing scan and/or computation tasks. Executing common tasks only once can reduce the total execution time of all queries remarkably. Therefore, we propose to use Multiple Query Optimization (MQO) techniques to improve the overall performance of Hadoop Hive, an open source SQL-based distributed warehouse sy...
A Cost-Aware Strategy for Query Result Caching in Web Search Engines
Altıngövde, İsmail Sengör; Ulusoy, Oezguer (2009-01-01)
Search engines and large scale IR systems need to cache query results for efficiency and scalability purposes. In this study, we propose to explicitly incorporate the query costs in the static caching policy. To this end, a query’s cost is represented by its execution time, which involves CPU time to decompress the postings and compute the query-document similarities to obtain the final top-N answers. Simulation results using a large Web crawl data and a real query log reveal that the proposed strategy impr...
On the Efficiency of Selective Search
Hafizoglu, Fatih; Kucukoglu, Emre Can; Altıngövde, İsmail Sengör (2017-04-13)
Our work shows that the query latency for selective search over a topically partitioned collection can be reduced by up to 55%. We achieve this by physically storing the documents in each topical cluster across all shards and building a cluster-skipping index at each shard. Our approach also achieves uniform load balance among the shards.
Improving forecasting accuracy of time series data using a new ARIMA-ANN hybrid method and empirical mode decomposition
Buyuksahin, Umit Cavus; Ertekin Bolelli, Şeyda (Elsevier BV, 2019-10-07)
Many applications in different domains produce large amounts of time series data. Accurate forecasting is critical for many decision makers. Various time series forecasting methods exist that use linear and nonlinear models separately or a combination of both. Studies show that combining linear and nonlinear models can be effective in improving forecasting performance. However, some assumptions that those existing methods make might restrict their performance in certain situations. We provide a new Au...
Using object-oriented materialized views to answer selection-based complex queries
Alhajj, R; Polat, Faruk (1999-09-01)
Presented in this paper is a model that utilizes existing materialized views to handle a wide range of complex selection-based queries, including linear recursive queries. Such queries are complex because it is almost impossible for naive users to predict the formulation of their predicate expressions. Object variables bound to objects in the result of a query are allowed to appear in the predicate of that query. Also, the predicate definition is extended to make it possible to have in the output only a sub...
Citation Formats
F. Hafızoğlu, “Improving the efficiency of distributed information retrieval using hybrid index partitioning,” M.S. - Master of Science, Middle East Technical University, 2018.