Cost-aware result caching strategies for meta-search engines

Bakkal, Emre
Meta-search engines are tools that generate top-k search results of a query by combining local top-k search results retrieved from various data sources in parallel. A result cache that stores the results of the previously seen queries is a crucial component in a meta-search engine to improve the efficiency, scalability and availability of the system. Our goal in this thesis is to design and analyze different cost-aware and dynamic result caching strategies to be used in meta-search engines. To this end, as our first contribution, we utilize the well-known cost-aware eviction policies in the literature in three different caching approaches, namely, query-level, resource-level and entry-level caching; that arise naturally in the meta-search setup. Next, we propose a novel entry-level caching approach that is again cost-aware and fits well to the special embarrassingly-parallel nature of the meta-search scenario. The proposed approaches are evaluated using the cache miss-cost metric in a large-scale simulation setup where the impact of various parameters (such as the number of resources, cache size and query cost distribution) is also investigated. Our simulation results show that the highest performance is obtained by using the entry-level caching approaches; and furthermore, our newly proposed approach outperforms both the traditional baselines and other cost-aware competitors.