Static index pruning in web search engines

Download
2012-2-1
Static index pruning techniques permanently remove a presumably redundant part of an inverted file, to reduce the file size and query processing time. These techniques differ in deciding which parts of an index can be removed safely; that is, without changing the top-ranked query results. As defined in the literature, the query view of a document is the set of query terms that access to this particular document, that is, retrieves this document among its top results. In this paper, we first propose using query views to improve the quality of the top results compared against the original results. We incorporate query views in a number of static pruning strategies, namely term-centric, document-centric, term popularity based and document access popularity based approaches, and show that the new strategies considerably outperform their counterparts especially for the higher levels of pruning and for both disjunctive and conjunctive query processing. Additionally, we combine the notions of term and document access popularity to form new pruning strategies, and further extend these strategies with the query views. The new strategies improve the result quality especially for the conjunctive query processing, which is the default and most common search mode of a search engine.
ACM Transactions on Information Systems

Suggestions

Incremental cluster-based retrieval using compressed cluster-skipping inverted files
Altıngövde, İsmail Sengör; Can, Fazli; Ulusoy, Oezguer (2008-01-01)
We propose a unique cluster-based retrieval (CBR) strategy using a new cluster-skipping inverted file for improving query processing efficiency. The new inverted file incorporates cluster membership and centroid information along with the usual document information into a single structure. In our incremental-CBR strategy, during query evaluation, both best(-matching) clusters and the best(-matching) documents of such clusters are computed together with a single posting-list access per query term. As we swit...
Site-based dynamic pruning for query processing in search engines
Altıngövde, İsmail Sengör; Can, Fazli; Ulusoy, Özgür (2008-12-15)
Web search engines typically index and retrieve at the page level. In this study, we investigate a dynamic pruning strategy that allows the query processor to first determine the most promising websites and then proceed with the similarity computations for those pages only within these sites.
Using object-oriented materialized views to answer selection-based complex queries
Alhajj, R; Polat, Faruk (1999-09-01)
Presented in this paper is a model that utilizes existing materialized views to handle a wide range of complex selection-based queries, including linear recursive queries. Such queries are complex because it is almost impossible for naive users to predict the formulation of their predicate expressions. Object variables bound to objects in the result of a query are allowed to appear in the predicate of that query. Also, the predicate definition is extended to make it possible to have in the output only a sub...
On the size of full element-indexes for XML keyword search
Atilgan, Duygu; Altıngövde, İsmail Sengör; Ulusoy, Özgür (2012-04-27)
We show that a full element-index can be as space-efficient as a direct index with Dewey ids, after compression using typical techniques
Multimodal query-level fusion for efficient multimedia information retrieval
Sattari, Saeid; Yazıcı, Adnan (2018-10-01)
Managing a large volume of multimedia data containing various modalities such as visual, audio, and text reveals the necessity for efficient methods for modeling, processing, storing, and retrieving complex data. In this paper, we propose a fusion-based approach at the query level to improve query retrieval performance of multimedia data. We discuss various flexible query types including the combination of content as well as concept-based queries that provide users with the ability to efficiently perform mu...
Citation Formats
İ. S. Altıngövde and Ö. Ulusoy, “Static index pruning in web search engines,” ACM Transactions on Information Systems, pp. 1–28, 2012, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/28301.