Advanced methods for result and score caching in web search engines

Download
2019
Yafay, Erman.
Search engines employ caching techniques in main memory to improve system efficiency and scalability. In this thesis, we focus on improving the cache performance for web search engines where our contributions can be separated into two main parts. Firstly, we investigate the impact of the sample size for frequency statistics for most popular cache eviction strategies in the literature, and show that cache performance improves with larger samples, i.e., by storing the frequencies of all (or, most of) the queries seen by the search engine. We mitigate the cost of storing a large history of frequencies by using a Counting Bloom Filter based data structure that is able to store frequency statistics in a compact manner, while still providing comparable cache performance to keeping all frequencies in a raw manner. Secondly, we propose a new cache type for systems that employ dynamic pruning strategies (e.g. WAND, BMW) for query processing. We store the k-th highest result score for a query alongside with its result cache entry and whenever a result cache miss occurs, we use k-th score of the subsets of the original query as an initial threshold value for dynamic pruning. Our method reduces the query processing times by increasing the number of documents skipped and, to our knowledge, it is unique in the sense that it can improve processing times for compulsory result cache misses and singleton queries.

Suggestions

Efficient processing of category-restricted queries for Web directories
Altıngövde, İsmail Sengör; Ulusoy, Oezguer (2008-01-01)
We show that a cluster-skipping inverted index (CS-IIS) is a practical and efficient file structure to support category-restricted queries for searching Web directories. The query processing strategy with CS-IIS improves CPU time efficiency without imposing any limitations on the directory size.
Second Chance: A Hybrid Approach for Dynamic Result Caching in Search Engines
Altıngövde, İsmail Sengör; Barla Cambazoglu, B.; Ulusoy, Ozgur (2011-01-01)
Result caches are vital for efficiency of search engines. In this work, we propose a novel caching strategy in which a dynamic result cache is split into two layers: an HTML cache and a docID cache. The HTML cache in the first layer stores the result pages computed for queries. The docID cache in the second layer stores ids of documents in search results. Experiments under various scenarios show that, in terms of average query processing time, this hybrid caching approach outperforms the traditional approac...
Timestamp-based result cache invalidation for web search engines
Alici, Sadiye; Altıngövde, İsmail Sengör; Ozcan, Rifat; Cambazoglu, B. Barla; Ulusoy, Özgür (2011-01-01)
The result cache is a vital component for efficiency of large-scale web search engines, and maintaining the freshness of cached query results is the current research challenge. As a remedy to this problem, our work proposes a new mechanism to identify queries whose cached results are stale. The basic idea behind our mechanism is to maintain and compare generation time of query results with update times of posting lists and documents to decide on staleness of query results. The proposed technique is evaluate...
A five-level static cache architecture for web search engines
Ozcan, Rifat; Altıngövde, İsmail Sengör; Barla Cambazoglu, B.; Junqueira, Flavio P.; Ulusoy, Ozgur (2012-09-01)
Caching is a crucial performance component of large-scale web search engines, as it greatly helps reducing average query response times and query processing workloads on backend search clusters. In this paper, we describe a multi-level static cache architecture that stores five different item types: query results, precomputed scores, posting lists, precomputed intersections of posting lists, and documents. Moreover, we propose a greedy heuristic to prioritize items for caching, based on gains computed by us...
Utilization of navigational queries for result presentation and caching in search engines
Ozcan, Rifat; Altıngövde, İsmail Sengör; Ulusoy, Özgür (2008-12-01)
We propose result page models with varying granularities for navigational queries and show that this approach provides a better utilization of cache space and reduces bandwidth requirements.
Citation Formats
E. Yafay, “Advanced methods for result and score caching in web search engines,” Thesis (M.S.) -- Graduate School of Natural and Applied Sciences. Computer Engineering., Middle East Technical University, 2019.