Efficiency and effectiveness of query processing in cluster-based retrieval

2004-12-01
Our research shows that for large databases, without considerable additional storage overhead, cluster-based retrieval (CBR) can compete with the time efficiency and effectiveness of the inverted index-based full search (FS). The proposed CBR method employs a storage structure that blends the cluster membership information into the inverted file posting lists. This approach significantly reduces the cost of similarity calculations for document ranking during query processing and thereby improves efficiency. For example, in terms of in-memory computations, our new approach can reduce query processing time to 39% of that of FS. The experiments confirm that the approach is scalable and that system performance improves with increasing database size. In the experiments, we use the cover coefficient-based clustering methodology (C3M) and the Financial Times database of TREC, which contains 210 158 documents (564 MB) defined by 229 748 terms, with a total of 29 545 234 inverted index elements. This study provides CBR efficiency and effectiveness experiments using the largest corpus in an environment that employs no user interaction or user behavior assumption for clustering.
INFORMATION SYSTEMS
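
The storage idea in the abstract, posting lists that also carry cluster membership, can be illustrated with a minimal Python sketch. This is not the paper's actual data structure or ranking function: the layout (each term's postings grouped by cluster id), the term-frequency sum used as a similarity score, the cluster-selection rule, and all names (build_index, full_search, cluster_based_search, n_best) are assumptions made only for illustration.

```python
from collections import defaultdict


def build_index(docs, cluster_of):
    """Build {term: {cluster_id: [(doc_id, tf), ...]}}.

    Assumed layout: each term's posting list is split into
    per-cluster sub-lists, so cluster membership is stored
    inside the inverted file itself.
    """
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, terms in docs.items():
        for term, tf in terms.items():
            index[term][cluster_of[doc_id]].append((doc_id, tf))
    return index


def full_search(index, query):
    """FS baseline: score every document that contains a query term."""
    scores = defaultdict(int)
    for term in query:
        for postings in index.get(term, {}).values():
            for doc_id, tf in postings:
                scores[doc_id] += tf  # toy similarity: summed tf
    return dict(scores)


def cluster_based_search(index, query, n_best):
    """CBR sketch: pick the n_best clusters, then score only their documents."""
    # Step 1: rank clusters by total tf of query terms (a stand-in
    # for a real query-centroid similarity).
    cluster_scores = defaultdict(int)
    for term in query:
        for cid, postings in index.get(term, {}).items():
            cluster_scores[cid] += sum(tf for _, tf in postings)
    ranked = sorted(cluster_scores, key=lambda c: (-cluster_scores[c], c))
    best = set(ranked[:n_best])

    # Step 2: rank documents, skipping whole sub-lists that belong
    # to clusters that were not selected.
    scores = defaultdict(int)
    for term in query:
        for cid, postings in index.get(term, {}).items():
            if cid not in best:
                continue
            for doc_id, tf in postings:
                scores[doc_id] += tf
    return dict(scores)


# Hypothetical toy collection: doc_id -> {term: term frequency}.
docs = {
    1: {"cluster": 2, "index": 1},
    2: {"cluster": 1, "retrieval": 1},
    3: {"index": 3},
    4: {"retrieval": 2, "index": 1},
}
cluster_of = {1: 0, 2: 0, 3: 1, 4: 1}  # hypothetical cluster assignment
index = build_index(docs, cluster_of)
query = ["cluster", "retrieval"]

print(full_search(index, query))
print(cluster_based_search(index, query, n_best=1))
```

In this sketch the saving comes from the second step: postings of unselected clusters are never touched, so fewer similarity updates are performed than in the full search. With n_best set to the total number of clusters, the two functions return the same scores.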

Citation Formats
F. Can, İ. S. Altıngövde, and E. Demir, “Efficiency and effectiveness of query processing in cluster-based retrieval,” INFORMATION SYSTEMS, pp. 697–717, 2004, Accessed: 2020. [Online]. Available: https://hdl.handle.net/11511/40887.