Advanced methods for result and score caching in web search engines

2019
Yafay, Erman.
Search engines cache data in main memory to improve system efficiency and scalability. In this thesis, we focus on improving cache performance for web search engines; our contributions fall into two main parts. First, we investigate how the sample size used for frequency statistics affects the most popular cache eviction strategies in the literature, and show that cache performance improves with larger samples, i.e., when the frequencies of all (or most) of the queries seen by the search engine are stored. We mitigate the cost of storing a large frequency history by using a Counting Bloom Filter based data structure that stores frequency statistics compactly while still providing cache performance comparable to keeping all raw frequencies. Second, we propose a new cache type for systems that employ dynamic pruning strategies (e.g., WAND, BMW) for query processing. We store the k-th highest result score for a query alongside its result cache entry and, whenever a result cache miss occurs, we use the cached k-th scores of subsets of the original query as an initial threshold value for dynamic pruning. Our method reduces query processing times by increasing the number of documents skipped and, to our knowledge, it is unique in that it can improve processing times for compulsory result cache misses and singleton queries.
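The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the class `CountingBloomFilter`, the function `initial_threshold`, the hashing scheme, the counter sizes, and the choice to key the score cache by a `frozenset` of terms are all hypothetical. The threshold part also assumes disjunctive (OR) query semantics with additive, non-negative term scores, under which the k-th score of a subquery cannot exceed the k-th score of the full query.

```python
import hashlib
from itertools import combinations


class CountingBloomFilter:
    """Approximate query-frequency counter (hypothetical sketch, not the thesis code)."""

    def __init__(self, num_counters=1 << 16, num_hashes=4):
        self.counters = [0] * num_counters
        self.num_hashes = num_hashes

    def _slots(self, query):
        # Derive num_hashes counter positions from salted BLAKE2b digests.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                query.encode("utf-8"),
                digest_size=8,
                salt=seed.to_bytes(8, "little"),
            ).digest()
            yield int.from_bytes(digest, "little") % len(self.counters)

    def add(self, query):
        for slot in self._slots(query):
            self.counters[slot] += 1

    def estimate(self, query):
        # Collisions can only inflate counters, so the minimum over the
        # query's slots never underestimates the true frequency.
        return min(self.counters[slot] for slot in self._slots(query))


def initial_threshold(query_terms, kth_score_cache):
    """Largest cached k-th score over proper, non-empty subsets of the query.

    Returns 0.0 when no subset is cached. Assumes OR semantics with
    additive, non-negative scores, so the value is a safe lower bound.
    """
    terms = sorted(set(query_terms))
    best = 0.0
    for size in range(1, len(terms)):
        for subset in combinations(terms, size):
            best = max(best, kth_score_cache.get(frozenset(subset), 0.0))
    return best


# Usage: frequency statistics for eviction decisions.
freq = CountingBloomFilter()
for q in ["web search", "web search", "web search", "caching"]:
    freq.add(q)

# Usage: the full query misses the result cache, but two subqueries
# have cached k-th scores that can seed the pruning threshold.
score_cache = {
    frozenset({"web"}): 3.2,
    frozenset({"web", "search"}): 5.7,
}
threshold = initial_threshold(["web", "search", "caching"], score_cache)
```

Enumerating every subset is exponential in the query length, which is only reasonable here because web queries tend to be short; a production system would presumably restrict which subsets it probes.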

Suggestions

Second Chance: A Hybrid Approach for Dynamic Result Caching in Search Engines
Altıngövde, İsmail Sengör; Barla Cambazoglu, B.; Ulusoy, Ozgur (2011-01-01)
Result caches are vital for efficiency of search engines. In this work, we propose a novel caching strategy in which a dynamic result cache is split into two layers: an HTML cache and a docID cache. The HTML cache in the first layer stores the result pages computed for queries. The docID cache in the second layer stores ids of documents in search results. Experiments under various scenarios show that, in terms of average query processing time, this hybrid caching approach outperforms the traditional approac...
Efficient processing of category-restricted queries for Web directories
Altıngövde, İsmail Sengör; Ulusoy, Oezguer (2008-01-01)
We show that a cluster-skipping inverted index (CS-IIS) is a practical and efficient file structure to support category-restricted queries for searching Web directories. The query processing strategy with CS-IIS improves CPU time efficiency without imposing any limitations on the directory size.
A five-level static cache architecture for web search engines
Ozcan, Rifat; Altıngövde, İsmail Sengör; Barla Cambazoglu, B.; Junqueira, Flavio P.; Ulusoy, Ozgur (2012-09-01)
Caching is a crucial performance component of large-scale web search engines, as it greatly helps reduce average query response times and query processing workloads on backend search clusters. In this paper, we describe a multi-level static cache architecture that stores five different item types: query results, precomputed scores, posting lists, precomputed intersections of posting lists, and documents. Moreover, we propose a greedy heuristic to prioritize items for caching, based on gains computed by us...
Utilization of navigational queries for result presentation and caching in search engines
Ozcan, Rifat; Altıngövde, İsmail Sengör; Ulusoy, Özgür (2008-12-01)
We propose result page models with varying granularities for navigational queries and show that this approach provides a better utilization of cache space and reduces bandwidth requirements.
Explicit Search Result Diversification Using Score and Rank Aggregation Methods
Ozdemiray, Ahmet Murat; Altıngövde, İsmail Sengör (2015-06-01)
Search result diversification is one of the key techniques to cope with the ambiguous and underspecified information needs of web users. In the last few years, strategies that are based on the explicit knowledge of query aspects emerged as highly effective ways of diversifying search results. Our contributions in this article are two-fold. First, we extensively evaluate the performance of a state-of-the-art explicit diversification strategy and pinpoint its potential weaknesses. We propose basic yet novel ...
Citation Formats
E. Yafay, “Advanced methods for result and score caching in web search engines,” M.S. thesis, Graduate School of Natural and Applied Sciences, Computer Engineering, Middle East Technical University, 2019.