Utilizing query performance predictors for early termination in meta-search

2016
Şener, Emre
In the context of the web, a meta-search engine is a system that forwards an incoming user query to the component search engines (also known as resources) and then merges the retrieved results. Given that hundreds of such resources may exist, a meta-search engine must avoid forwarding a query to all available resources and instead focus on a subset of them. In this thesis, we first introduce a novel incremental query forwarding strategy for meta-search. More specifically, given a ranked list of N search engines, our strategy operates in rounds, such that in each round, we retrieve the results of the next k “unvisited” resources in the list (where k<N), assess the quality of the intermediate merged list, and stop if any further quality improvement seems unlikely. As our second contribution, we introduce a novel incremental query result merging strategy. In this strategy, we forward the query to all search engines, but we assess the quality of the intermediate merged list as soon as we retrieve the results from an engine, and stop if any further quality improvement seems unlikely. In order to assess the result quality, we utilize post-retrieval query performance prediction (QPP) techniques. Our experiments using the standard FedWeb 2013 dataset reveal that the proposed strategies can reduce the response time and/or network bandwidth usage, while the quality of the result is comparable to, or sometimes even better than, that of the baseline strategy.
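The round-based forwarding strategy described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the score-based merger (`merge`), the stand-in quality predictor (`mean_top_score`), and the stopping threshold (`min_gain`) are all hypothetical choices made for illustration, and the actual QPP techniques used in the thesis are not reproduced here.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Result = Tuple[str, float]               # (document id, score)
Engine = Callable[[str], List[Result]]   # a resource: query -> result list


def merge(lists: Sequence[List[Result]]) -> List[Result]:
    """Assumed merger: keep each document's best score, sort descending."""
    best: Dict[str, float] = {}
    for results in lists:
        for doc_id, score in results:
            if doc_id not in best or score > best[doc_id]:
                best[doc_id] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


def mean_top_score(merged: List[Result], depth: int = 3) -> float:
    """Stand-in for a post-retrieval QPP: mean score of the top results."""
    top = merged[:depth]
    return sum(score for _, score in top) / len(top) if top else 0.0


def incremental_forwarding(
    query: str,
    ranked_engines: Sequence[Engine],
    k: int,
    predictor: Callable[[List[Result]], float] = mean_top_score,
    min_gain: float = 0.01,  # hypothetical stopping threshold
) -> Tuple[List[Result], int]:
    """Query the next k unvisited engines per round; stop when the
    predicted quality of the merged list stops improving."""
    retrieved: List[List[Result]] = []
    merged: List[Result] = []
    previous_quality = None
    visited = 0
    while visited < len(ranked_engines):
        batch = ranked_engines[visited:visited + k]
        retrieved.extend(engine(query) for engine in batch)
        visited += len(batch)
        merged = merge(retrieved)
        quality = predictor(merged)
        # Assumed rule: stop once the gain over the previous round is small.
        if previous_quality is not None and quality - previous_quality < min_gain:
            break
        previous_quality = quality
    return merged, visited
```

The function returns the merged list together with the number of engines actually contacted, so the saving relative to forwarding the query to all N engines can be read off directly. Under the same assumptions, the incremental merging strategy would differ mainly in that all engines are queried up front and the quality check runs each time one engine's results arrive.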

Suggestions

A five-level static cache architecture for web search engines
Ozcan, Rifat; Altıngövde, İsmail Sengör; Barla Cambazoglu, B.; Junqueira, Flavio P.; Ulusoy, Ozgur (2012-09-01)
Caching is a crucial performance component of large-scale web search engines, as it greatly helps reduce average query response times and query processing workloads on backend search clusters. In this paper, we describe a multi-level static cache architecture that stores five different item types: query results, precomputed scores, posting lists, precomputed intersections of posting lists, and documents. Moreover, we propose a greedy heuristic to prioritize items for caching, based on gains computed by us...
Effective & efficient methods for web search result diversification
Özdemiray, Ahmet Murat; Altıngövde, İsmail Sengör; Department of Computer Engineering (2015)
Search result diversification is one of the key techniques to cope with the ambiguous and/or underspecified information needs of the web users. In this study we first extensively evaluate the performance of a state-of-the-art explicit diversification strategy and pin-point its weaknesses. We propose basic yet novel optimizations to remedy these weaknesses and boost the performance of this algorithm. Secondly, we cast the diversification problem to the problem of ranking aggregation and propose to materializ...
A Cost-Aware Strategy for Query Result Caching in Web Search Engines
Altıngövde, İsmail Sengör; Ulusoy, Özgür (2009-01-01)
Search engines and large-scale IR systems need to cache query results for efficiency and scalability purposes. In this study, we propose to explicitly incorporate the query costs in the static caching policy. To this end, a query’s cost is represented by its execution time, which involves CPU time to decompress the postings and compute the query-document similarities to obtain the final top-N answers. Simulation results using large Web crawl data and a real query log reveal that the proposed strategy impr...
Improving the performance of Hadoop/Hive by sharing scan and computation tasks
Özal, Serkan; Toroslu, İsmail Hakkı; Doğaç, Asuman; Department of Computer Engineering (2013)
MapReduce is a popular model of executing time-consuming analytical queries as a batch of tasks on large scale data. During simultaneous execution of multiple queries, many opportunities can arise for sharing scan and/or computation tasks. Executing common tasks only once can reduce the total execution time of all queries remarkably. Therefore, we propose to use Multiple Query Optimization (MQO) techniques to improve the overall performance of Hadoop Hive, an open source SQL-based distributed warehouse sy...
Explicit Search Result Diversification Using Score and Rank Aggregation Methods
Ozdemiray, Ahmet Murat; Altıngövde, İsmail Sengör (2015-06-01)
Search result diversification is one of the key techniques to cope with the ambiguous and underspecified information needs of web users. In the last few years, strategies that are based on the explicit knowledge of query aspects emerged as highly effective ways of diversifying search results. Our contributions in this article are two-fold. First, we extensively evaluate the performance of a state-of-the-art explicit diversification strategy and pin-point its potential weaknesses. We propose basic yet novel ...
Citation Formats
E. Şener, “Utilizing query performance predictors for early termination in meta-search,” M.S. - Master of Science, Middle East Technical University, 2016.