A five-level static cache architecture for web search engines

Date

2012-09-01

Author

Ozcan, Rifat
Altıngövde, İsmail Sengör
Barla Cambazoglu, B.
Junqueira, Flavio P.
Ulusoy, Ozgur

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Caching is a crucial performance component of large-scale web search engines, as it greatly helps reduce average query response times and query processing workloads on backend search clusters. In this paper, we describe a multi-level static cache architecture that stores five different item types: query results, precomputed scores, posting lists, precomputed intersections of posting lists, and documents. Moreover, we propose a greedy heuristic to prioritize items for caching, based on gains computed from the items' past access frequencies, estimated computational costs, and storage overheads. This heuristic takes into account the inter-dependency between individual items when making its caching decisions, i.e., after a particular item is cached, the gains of all items affected by this decision are updated. Our simulations under realistic assumptions reveal that the proposed heuristic performs better than dividing the entire cache space among particular item types at fixed proportions.
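
The greedy, dependency-aware selection summarized in the abstract can be sketched for two of the five item types (query results and posting lists). This is a minimal sketch under hypothetical assumptions, not the authors' algorithm: the cost model (a cache miss pays a scoring cost plus a fetch cost for every uncached posting list of the query), the names `Query` and `greedy_static_cache`, and all sizes and costs are illustrative. Gain is taken as frequency × saved cost / size, and every gain is recomputed after each pick, so a caching decision updates the gains of the items it affects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    # Hypothetical per-query statistics, e.g. collected from a training query log.
    text: str
    freq: int                # past access frequency
    terms: tuple[str, ...]   # query terms; each term has one posting list
    compute_cost: float      # assumed scoring cost, excluding posting-list fetches
    result_size: int         # assumed storage cost of a cached result entry


def greedy_static_cache(queries, list_cost, list_size, capacity):
    """Greedily fill a static cache with result and posting-list entries.

    gain(item) = frequency * cost saved / size. Gains are recomputed after
    every pick: a cached result removes its query's demand for posting lists,
    and a cached posting list lowers the cost a result entry would still save.
    """
    picks, cached, used = [], set(), 0
    while True:
        served = {q.text for q in queries if ("result", q.text) in cached}
        candidates = []
        for q in queries:
            if q.text in served:
                continue
            # Cost still paid on a miss: scoring plus fetching uncached lists.
            saved = q.compute_cost + sum(
                list_cost[t] for t in q.terms if ("list", t) not in cached)
            candidates.append((q.freq * saved / q.result_size,
                               ("result", q.text), q.result_size))
        for t in list_cost:
            if ("list", t) in cached:
                continue
            # Only queries not already answered from the result cache need t.
            demand = sum(q.freq for q in queries
                         if t in q.terms and q.text not in served)
            candidates.append((demand * list_cost[t] / list_size[t],
                               ("list", t), list_size[t]))
        fitting = [c for c in candidates if c[0] > 0 and used + c[2] <= capacity]
        if not fitting:
            return picks
        _, key, size = max(fitting)  # highest-gain item that still fits
        picks.append(key)
        cached.add(key)
        used += size


if __name__ == "__main__":
    qs = [Query("a b", 10, ("a", "b"), 1.0, 1),
          Query("b d", 6, ("b", "d"), 1.0, 8)]
    picks = greedy_static_cache(qs, {"a": 5.0, "b": 5.0, "d": 3.0},
                                {"a": 4, "b": 4, "d": 4}, capacity=5)
    print(picks)  # [('result', 'a b'), ('list', 'b')]
```

In this toy run, caching the result of "a b" first removes that query's demand for list `a` entirely and cuts the demand for list `b` from 16 to 6, so the gain of list `b` drops from 20 to 7.5 before it is chosen. The full rescan keeps the sketch short; a larger-scale version would more likely keep gains in a priority queue and update only the entries affected by each pick.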

Subject Keywords

Query processing, Static caching, Web search engines

URI

https://hdl.handle.net/11511/42847

Journal

INFORMATION PROCESSING & MANAGEMENT

DOI

https://doi.org/10.1016/j.ipm.2010.12.007

Collections

Department of Computer Engineering, Article

Suggestions

Utilization of navigational queries for result presentation and caching in search engines Ozcan, Rifat; Altıngövde, İsmail Sengör; Ulusoy, Özgür (2008-12-01) We propose result page models with varying granularities for navigational queries and show that this approach provides a better utilization of cache space and reduces bandwidth requirements.
A Cost-Aware Strategy for Query Result Caching in Web Search Engines Altıngövde, İsmail Sengör; Ulusoy, Oezguer (2009-01-01) Search engines and large scale IR systems need to cache query results for efficiency and scalability purposes. In this study, we propose to explicitly incorporate the query costs in the static caching policy. To this end, a query’s cost is represented by its execution time, which involves CPU time to decompress the postings and compute the query-document similarities to obtain the final top-N answers. Simulation results using a large Web crawl data and a real query log reveal that the proposed strategy impr...
Advanced methods for result and score caching in web search engines Yafay, Erman.; Altıngövde, İsmail Sengör; Department of Computer Engineering (2019) Search engines employ caching techniques in main memory to improve system efficiency and scalability. In this thesis, we focus on improving the cache performance for web search engines where our contributions can be separated into two main parts. Firstly, we investigate the impact of the sample size for frequency statistics for most popular cache eviction strategies in the literature, and show that cache performance improves with larger samples, i.e., by storing the frequencies of all (or, most of) the quer...
Efficient processing of category-restricted queries for Web directories Altıngövde, İsmail Sengör; Ulusoy, Oezguer (2008-01-01) We show that a cluster-skipping inverted index (CS-IIS) is a practical and efficient file structure to support category-restricted queries for searching Web directories. The query processing strategy with CS-IIS improves CPU time efficiency without imposing any limitations on the directory size.
Strategies for setting time-to-live values in result caches Sazoglu, Fethi Burak; Cambazoglu, B. Barla; Ozcan, Rifat; Altıngövde, İsmail Sengör; Ulusoy, Özgür (2013-12-11) In web query result caching, the staleness of cached query results is often bounded via a time-to-live (TTL) mechanism, which expires the validity of cached query results at some point in time. In this work, we evaluate the performance of three alternative TTL mechanisms: time-based TTL, frequency-based TTL, and click-based TTL. Moreover, we propose hybrid approaches obtained by pair-wise combination of these mechanisms. Our results indicate that combining time-based TTL with frequency-based TTL yields superior performance...

Citation Formats

R. Ozcan, İ. S. Altıngövde, B. Barla Cambazoglu, F. P. Junqueira, and O. Ulusoy, “A five-level static cache architecture for web search engines,” INFORMATION PROCESSING & MANAGEMENT, pp. 828–840, 2012, Accessed: 2020. [Online]. Available: https://hdl.handle.net/11511/42847.