Advanced methods for result and score caching in web search engines

Yafay, Erman.
Search engines employ caching techniques in main memory to improve system efficiency and scalability. In this thesis, we focus on improving the cache performance for web search engines where our contributions can be separated into two main parts. Firstly, we investigate the impact of the sample size for frequency statistics for most popular cache eviction strategies in the literature, and show that cache performance improves with larger samples, i.e., by storing the frequencies of all (or, most of) the queries seen by the search engine. We mitigate the cost of storing a large history of frequencies by using a Counting Bloom Filter based data structure that is able to store frequency statistics in a compact manner, while still providing comparable cache performance to keeping all frequencies in a raw manner. Secondly, we propose a new cache type for systems that employ dynamic pruning strategies (e.g. WAND, BMW) for query processing. We store the k-th highest result score for a query alongside with its result cache entry and whenever a result cache miss occurs, we use k-th score of the subsets of the original query as an initial threshold value for dynamic pruning. Our method reduces the query processing times by increasing the number of documents skipped and, to our knowledge, it is unique in the sense that it can improve processing times for compulsory result cache misses and singleton queries.