Cost-aware result caching strategies for meta-search engines

Download
2015
Bakkal, Emre
Meta-search engines are tools that generate top-k search results of a query by combining local top-k search results retrieved from various data sources in parallel. A result cache that stores the results of the previously seen queries is a crucial component in a meta-search engine to improve the efficiency, scalability and availability of the system. Our goal in this thesis is to design and analyze different cost-aware and dynamic result caching strategies to be used in meta-search engines. To this end, as our first contribution, we utilize the well-known cost-aware eviction policies in the literature in three different caching approaches, namely, query-level, resource-level and entry-level caching; that arise naturally in the meta-search setup. Next, we propose a novel entry-level caching approach that is again cost-aware and fits well to the special embarrassingly-parallel nature of the meta-search scenario. The proposed approaches are evaluated using the cache miss-cost metric in a large-scale simulation setup where the impact of various parameters (such as the number of resources, cache size and query cost distribution) is also investigated. Our simulation results show that the highest performance is obtained by using the entry-level caching approaches; and furthermore, our newly proposed approach outperforms both the traditional baselines and other cost-aware competitors.

Suggestions

Efficient processing of category-restricted queries for Web directories
Altıngövde, İsmail Sengör; Ulusoy, Oezguer (2008-01-01)
We show that a cluster-skipping inverted index (CS-IIS) is a practical and efficient file structure to support category-restricted queries for searching Web directories. The query processing strategy with CS-IIS improves CPU time efficiency without imposing any limitations on the directory size.
A five-level static cache architecture for web search engines
Ozcan, Rifat; Altıngövde, İsmail Sengör; Barla Cambazoglu, B.; Junqueira, Flavio P.; Ulusoy, Ozgur (2012-09-01)
Caching is a crucial performance component of large-scale web search engines, as it greatly helps reducing average query response times and query processing workloads on backend search clusters. In this paper, we describe a multi-level static cache architecture that stores five different item types: query results, precomputed scores, posting lists, precomputed intersections of posting lists, and documents. Moreover, we propose a greedy heuristic to prioritize items for caching, based on gains computed by us...
Advanced methods for result and score caching in web search engines
Yafay, Erman.; Altıngövde, İsmail Sengör; Department of Computer Engineering (2019)
Search engines employ caching techniques in main memory to improve system efficiency and scalability. In this thesis, we focus on improving the cache performance for web search engines where our contributions can be separated into two main parts. Firstly, we investigate the impact of the sample size for frequency statistics for most popular cache eviction strategies in the literature, and show that cache performance improves with larger samples, i.e., by storing the frequencies of all (or, most of) the quer...
Timestamp-based result cache invalidation for web search engines
Alici, Sadiye; Altıngövde, İsmail Sengör; Ozcan, Rifat; Cambazoglu, B. Barla; Ulusoy, Özgür (2011-01-01)
The result cache is a vital component for efficiency of large-scale web search engines, and maintaining the freshness of cached query results is the current research challenge. As a remedy to this problem, our work proposes a new mechanism to identify queries whose cached results are stale. The basic idea behind our mechanism is to maintain and compare generation time of query results with update times of posting lists and documents to decide on staleness of query results. The proposed technique is evaluate...
Cost-Aware Strategies for Query Result Caching in Web Search Engines
Ozcan, Rifat; Altıngövde, İsmail Sengör; Ulusoy, Ozgor (Association for Computing Machinery (ACM), 2011-05-01)
Search engines and large-scale IR systems need to cache query results for efficiency and scalability purposes. Static and dynamic caching techniques (as well as their combinations) are employed to effectively cache query results. In this study, we propose cost-aware strategies for static and dynamic caching setups. Our research is motivated by two key observations: (i) query processing costs may significantly vary among different queries, and (ii) the processing cost of a query is not proportional to its po...
Citation Formats
E. Bakkal, “Cost-aware result caching strategies for meta-search engines,” M.S. - Master of Science, Middle East Technical University, 2015.