Incremental cluster-based retrieval using compressed cluster-skipping inverted files

Download
2008-01-01
Altıngövde, İsmail Sengör
Can, Fazli
Ulusoy, Oezguer
We propose a unique cluster-based retrieval (CBR) strategy using a new cluster-skipping inverted file for improving query processing efficiency. The new inverted file incorporates cluster membership and centroid information along with the usual document information into a single structure. In our incremental-CBR strategy, during query evaluation, both best(-matching) clusters and the best(-matching) documents of such clusters are computed together with a single posting-list access per query term. As we switch from term to term, the best clusters are recomputed and can dynamically change. During query-document matching, only relevant portions of the posting lists corresponding to the best clusters are considered and the rest are skipped. The proposed approach is essentially tailored for environments where inverted files are compressed, and provides substantial efficiency improvement while yielding comparable, or sometimes better, effectiveness figures. Our experiments with various collections show that the incremental- CBR strategy using a compressed cluster-skipping inverted file significantly improves CPU time efficiency, regardless of query length. The new compressed inverted file imposes an acceptable storage overhead in comparison to a typical inverted file. We also show that our approach scales well with the collection size.
ACM TRANSACTIONS ON INFORMATION SYSTEMS

Suggestions

Static index pruning in web search engines
Altıngövde, İsmail Sengör; Ulusoy, Özgür (Association for Computing Machinery (ACM), 2012-2-1)
Static index pruning techniques permanently remove a presumably redundant part of an inverted file, to reduce the file size and query processing time. These techniques differ in deciding which parts of an index can be removed safely; that is, without changing the top-ranked query results. As defined in the literature, the query view of a document is the set of query terms that access to this particular document, that is, retrieves this document among its top results. In this paper, we first propose using qu...
Remedial actions for disassembly lines with stochastic task times
ALTEKİN, FATMA TEVHİDE; Bayındır, Zeynep Pelin; Gumuskaya, Volkan (2016-09-01)
We suggest the incorporation of remedial actions for profit-oriented disassembly lines with stochastic task times. When task times are stochastic, there is a probability that some of the tasks are not completed within the predefined cycle time. For-task incompletions in disassembly lines, pure remedial actions of stopping the line and offline disassembly are proposed along with the hybrid line which is a combination of the two pure remedial actions. The remedial actions have a significant effect on the expe...
Low-frequency multilevel fast multipole algorithm using an approximate diagonalization of the Green's function
Ergül, Özgür Salih (2014-08-23)
We present an approximate diagonalization of the Green's function to implement a stable multilevel fast multipole algorithm (MLFMA) for low-frequency problems. The diagonalization is based on scaled spherical functions, leading to stable computations of translation operators at all distances and for all frequencies. Similar to the conventional diagonalization, shift operators are expressed in terms of complex exponentials, while radiated and incoming fields are expanded in terms of scaled plane waves. Even ...
Stabilization of the Fast Multipole Method for Low Frequencies Using Multiple-Precision Arithmetic
Karaosmanoglu, Bariscan; Ergül, Özgür Salih (2014-08-23)
We stabilize a conventional implementation of the fast multipole method (FMM) for low frequencies using multiple-precision arithmetic (MPA). We show that using MPA is a direct remedy for low-frequency breakdowns of the standard diagonalization, which is prone to numerical errors at short distances with respect to wavelength. By increasing the precision, rounding errors are suppressed until a desired level of accuracy is obtained with plane-wave expansions. As opposed to other approaches in the literature, u...
IMOTION — A Content-based video retrieval engine
Rossetto, Luca; Giangreco, Ivan; Schuldt, Heiko; Dupont, Stephane; Seddati, Omar; Sezgin, Metin; Sahillioğlu, Yusuf (2015-01-05)
This paper introduces the IMOTION system, a sketch-based video retrieval engine supporting multiple query paradigms. For vector space retrieval, the IMOTION system exploits a large variety of low-level image and video features, as well as high-level spatial and temporal features that can all be jointly used in any combination. In addition, it supports dedicated motion features to allow for the specification of motion within a video sequence. For query specification, the IMOTION system supports query-by-sket...
Citation Formats
İ. S. Altıngövde, F. Can, and O. Ulusoy, “Incremental cluster-based retrieval using compressed cluster-skipping inverted files,” ACM TRANSACTIONS ON INFORMATION SYSTEMS, pp. 0–0, 2008, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/41690.