Segmenting and Labeling Query Sequences in a Multidatabase Environment

2011-01-01
When gathering information from multiple independent data sources, users generally pose a sequence of queries to each source and then combine (union) or cross-reference (join) the results to obtain the information they need. Gathering information also involves a fair amount of trial and error, in which queries are recursively refined according to the results of a previous query in the sequence. From the point of view of an outside observer, the aim of such a sequence of queries may not be immediately obvious. We investigate the problem of isolating and characterizing subsequences that represent coherent information retrieval goals within a sequence of queries sent by a user to different data sources over a period of time. The problem has two sub-problems: segmenting the sequence into subsequences, each representing a discrete goal; and labeling each query in these subsequences according to how it contributes to the goal. We propose a method in which a discriminative probabilistic model (a Conditional Random Field) is trained with pre-labeled sequences. We have tested the accuracy with which such a model can infer labels and segmentation on novel sequences. Results show that the approach is very accurate (> 95% accuracy) when there are no spurious queries in the sequence and moderately accurate even in the presence of substantial noise (∼70% accuracy when 15% of queries in the sequence are spurious).
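As an illustration only, the two sub-problems can be sketched as decoding with a linear-chain model followed by grouping the decoded labels into segments. The abstract does not specify the label set, features, or weights, so everything named here is an assumption: the tags `B` (query starts a new goal), `I` (query continues the current goal) and `O` (spurious query), the hand-set emission and transition scores, and the helper names `viterbi` and `segments`. In a trained Conditional Random Field the scores would come from learned feature weights rather than constants.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical tag set (not taken from the paper):
# "B" = begins a new goal, "I" = continues the goal, "O" = spurious query.
LABELS = ("B", "I", "O")


def viterbi(emissions: Sequence[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float]) -> List[str]:
    """Return the highest-scoring label sequence for a linear-chain model."""
    if not emissions:
        return []
    # best[t][y] = best score of any labeling of positions 0..t that ends in y
    best = [{y: emissions[0][y] for y in LABELS}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in LABELS:
            prev = max(LABELS, key=lambda p: best[-1][p] + transitions[(p, y)])
            scores[y] = best[-1][prev] + transitions[(prev, y)] + emissions[t][y]
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(LABELS, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def segments(labels: Sequence[str]) -> List[List[int]]:
    """Group query positions into goal segments; 'O' positions are dropped."""
    out: List[List[int]] = []
    for i, y in enumerate(labels):
        if y == "B" or (y == "I" and not out):
            out.append([i])
        elif y == "I":
            out[-1].append(i)
    return out


# Toy scores for a four-query sequence (made up for this sketch).
emissions = [
    {"B": 2.0, "I": 0.0, "O": 0.0},
    {"B": 1.0, "I": 0.9, "O": 0.0},
    {"B": 0.0, "I": 0.0, "O": 2.0},
    {"B": 1.5, "I": 1.0, "O": 0.0},
]
transitions = {(p, y): 0.0 for p in LABELS for y in LABELS}
transitions[("B", "B")] = -1.0  # discourage two goal starts in a row

labels = viterbi(emissions, transitions)
print(labels, segments(labels))
```

In this toy run the transition penalty overrides the slightly higher `B` emission score at the second query, which is the kind of sequence-level dependency a per-query classifier would miss.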

Suggestions

Using object-oriented materialized views to answer selection-based complex queries
Alhajj, R; Polat, Faruk (1999-09-01)
Presented in this paper is a model that utilizes existing materialized views to handle a wide range of complex selection-based queries, including linear recursive queries. Such queries are complex because it is almost impossible for naive users to predict the formulation of their predicate expressions. Object variables bound to objects in the result of a query are allowed to appear in the predicate of that query. Also, the predicate definition is extended to make it possible to have in the output only a sub...
A Cost-Aware Strategy for Query Result Caching in Web Search Engines
Altıngövde, İsmail Sengör; Ulusoy, Özgür (2009-01-01)
Search engines and large scale IR systems need to cache query results for efficiency and scalability purposes. In this study, we propose to explicitly incorporate the query costs in the static caching policy. To this end, a query’s cost is represented by its execution time, which involves CPU time to decompress the postings and compute the query-document similarities to obtain the final top-N answers. Simulation results using a large Web crawl data and a real query log reveal that the proposed strategy impr...
Improving the performance of Hadoop/Hive by sharing scan and computation tasks
Özal, Serkan; Toroslu, İsmail Hakkı; Doğaç, Asuman; Department of Computer Engineering (2013)
MapReduce is a popular model of executing time-consuming analytical queries as a batch of tasks on large scale data. During simultaneous execution of multiple queries, many opportunities can arise for sharing scan and/or computation tasks. Executing common tasks only once can reduce the total execution time of all queries remarkably. Therefore, we propose to use Multiple Query Optimization (MQO) techniques to improve the overall performance of Hadoop Hive, an open source SQL-based distributed warehouse sy...
Comparison of feature-based and image registration-based retrieval of image data using multidimensional data access methods
Arslan, Serdar; Yazıcı, Adnan; Sacan, Ahmet; Toroslu, İsmail Hakkı; Acar, Esra (Elsevier BV, 2013-07-01)
In information retrieval, efficient similarity search in multimedia collections is a critical task. In this paper, we present a rigorous comparison of three different approaches to the image retrieval problem, including cluster-based indexing, distance-based indexing, and multidimensional scaling methods. The time and accuracy trade-offs for each of these methods are demonstrated on three different image data sets. Similarity of images is obtained either by a feature-based similarity measure using four MPEG-...
Static query result caching revisited
Ozcan, Rifat; Altıngövde, İsmail Sengör; Ulusoy, Özgür (2008-12-15)
Query result caching is an important mechanism for search engine efficiency. In this study, we first review several query features that are used to determine the contents of a static result cache. Next, we introduce a new feature that more accurately represents the popularity of a query by measuring the stability of query frequency over a set of time intervals. Experimental results show that this new feature achieves hit ratios better than those of the previously proposed features.
Citation Formats
A. C. Acar, “Segmenting and Labeling Query Sequences in a Multidatabase Environment,” 2011, vol. 7044, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/32501.