Efficient discovery of join plans in schemaless data

Date

2009-09-01

Author

Acar, Aybar Can

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

We describe a method of inferring join plans for a set of relation instances, in the absence of any metadata, such as attribute domains, attribute names, or constraints (e.g., keys or foreign keys). Our method enumerates the possible join plans in order of likelihood, based on the compatibility of a pair of columns and their suitability as join attributes (i.e. their appropriateness as keys). We outline two variants of the approach. The first variant is accurate but potentially time-consuming, especially for large relations that do not fit in memory. The second variant is an approximation of the former and hence less accurate, but is considerably more efficient, allowing the method to be used online, even for large relations. We provide experimental results showing how both forms scale in terms of performance as the number of candidate join attributes and the size of the relations increase. We also characterize the accuracy of the approximate variant with respect to the exact variant.
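As a rough illustration of the idea of ranking column pairs by compatibility and key suitability, the following minimal sketch scores every pair of columns from two relation instances. It is not the paper's algorithm: the two measures (value containment as a stand-in for compatibility, distinct-value ratio as a stand-in for key suitability), their product as the score, and all function and column names are assumptions made for this example.

```python
from itertools import product


def uniqueness(column):
    """Fraction of distinct values; 1.0 means the column could act as a key."""
    return len(set(column)) / len(column) if column else 0.0


def containment(source, target):
    """Fraction of distinct source values that also occur in target."""
    src, tgt = set(source), set(target)
    return len(src & tgt) / len(src) if src else 0.0


def rank_join_candidates(left, right):
    """Score every (left column, right column) pair, best candidates first.

    `left` and `right` map hypothetical column labels to lists of values;
    no names, domains, or constraints are assumed to be known.
    """
    scored = []
    for (lname, lcol), (rname, rcol) in product(left.items(), right.items()):
        # Assumed heuristic: treat the right column as the candidate key
        # and the left column as a candidate foreign key referencing it.
        score = containment(lcol, rcol) * uniqueness(rcol)
        scored.append((score, lname, rname))
    return sorted(scored, key=lambda t: t[0], reverse=True)


# Two small relation instances with anonymous columns.
orders = {"o0": [1, 2, 2, 3], "o1": ["a", "b", "c", "d"]}
customers = {"k0": [1, 2, 3, 4], "k1": ["x", "x", "y", "y"]}

ranking = rank_join_candidates(orders, customers)
for score, lname, rname in ranking:
    print(f"{lname} -> {rname}: {score:.2f}")
```

In this toy input, the pair `o0`/`k0` ranks first because every value of `o0` occurs in `k0` and `k0` has no duplicates. Computing exact distinct-value sets requires a full pass over each column, which hints at why an approximate variant becomes attractive for relations that do not fit in memory.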

Subject Keywords

Dependency Inference, Join inference, Schema Matching

URI

https://hdl.handle.net/11511/30683

DOI

https://doi.org/10.1145/1620432.1620434

Collections

Graduate School of Informatics, Conference / Seminar

Suggestions


Improvement of corpus-based semantic word similarity using vector space model Esin, Yunus Emre; Alpaslan, Ferda Nur; Department of Computer Engineering (2009) This study presents a new approach for finding semantically similar words from corpora using window based context methods. Previous studies mainly concentrate on either finding new combination of distance-weight measurement methods or proposing new context methods. The main difference of this new approach is that this study reprocesses the outputs of the existing methods to update the representation of related word vectors used for measuring semantic distance between words, to improve the results further. ...
Fast, efficient and dynamically optimized data and hardware architectures for string matching Zengin, Salih; Güran, Hasan Cengiz; Schmidt, Şenan Ece; Department of Electrical and Electronics Engineering (2014) Many fields of computing such as network intrusion detection employ string matching modules (SMM) that search for a given set of strings in their input. An SMM is expected to produce correct outcomes while scanning the input data at high rates. Furthermore, the string sets that are searched for are usually large and their sizes increase steadily. In this thesis, motivated by the requirement of designing fast, accurate and efficient SMMs; we propose a number of SMM architectures that employ Bloom Filters to ...
Using semantic web services for data integration in banking domain Okat, Çağlar; Doğru, Ali Hikmet; Department of Computer Engineering (2010) A semantic model oriented transformation mechanism is developed for the centralization of intra-enterprise data integration. Such a mechanism is especially crucial in the banking domain which is selected in this study. A new domain ontology is constructed to provide basis for annotations. A bottom-up approach is preferred for semantic annotations to utilize existing web service definitions. Transformations between syntactic web service XML responses and semantic model concepts are defined in transformation ...
An index structure for fuzzy databases Yazıcı, Adnan (1996-09-11) Fuzzy querying involves more complex processing than ordinary querying does. In addition, a larger number of tuples will possibly be selected by fuzzy conditions compared to the crisp ones. The current index structures are inefficient in representing and dealing with uncertain and fuzzy data. In this paper we extend one of the multi-dimensional data structures, namely Multi Level Grid File (Whang and Krishnamurty, 1991) for an efficient access to both crisp and fuzzy data. In order to take advantage of the ...
Efficient processing of category-restricted queries for Web directories Altıngövde, İsmail Sengör; Ulusoy, Oezguer (2008-01-01) We show that a cluster-skipping inverted index (CS-IIS) is a practical and efficient file structure to support category-restricted queries for searching Web directories. The query processing strategy with CS-IIS improves CPU time efficiency without imposing any limitations on the directory size.

Citation Formats

A. C. Acar, “Efficient discovery of join plans in schemaless data,” 2009, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/30683.