A novel and precise false positive probability computation for Bloom Filters implemented with universal hash functions

Download
2022-8-22
Koltuk, Furkan
Bloom Filters (BF) are multiple-hashing data structures that are widely used in membership testing applications. The many-to-one nature of the BF hashing results in false positive outcomes which have to be further processed at a performance cost. The computation of the false positive probability of BFs is carried out under the assumption of uniform and independent hash functions. To the best of our knowledge, all previous work in the literature assume that the hash functions are uniform and independent without verifying if that is the case in reality. This thesis focuses on the hash function uniformity and independence for BFs with universal H3 functions. To this end, we propose a formal framework for defining and quantifying the uniformity and independence for H3 hash functions in BFs. Furthermore, we define a formal description of the many-to-one outcomes of H3 hash functions. We then use this framework to precisely compute the false positive probability for BFs with H3 hash functions which might not be necessarily uniform or independent. We verify our precise false positive expression with a hardware test bed that can execute 2 · 10^8 membership queries per second. We perform structured tests to evaluate the effect of losing uniformity and independence at different levels among the H3 hash functions. Furthermore, we evaluate the results of our expression for a range of parameters.

Suggestions

A NOVEL METHOD FOR DISCRETE COEFFICIENT FIR DIGITAL FILTER DESIGN
Çiloğlu, Tolga (1994-06-02)
A local search algorithm for discrete coefficient FIR filter design is presented. The minmax objective function is minimized by moving along the low gradient directions. A new method to forecast these directions is proposed. The algorithm is suitable to design high order filters in a short time. The results are compared to other methods both in quality and computational load
A Study of the Classification of Low-Dimensional Data with Supervised Manifold Learning
Vural, Elif (2018-01-01)
Supervised manifold learning methods learn data representations by preserving the geometric structure of data while enhancing the separation between data samples from different classes. In this work, we propose a theoretical study of supervised manifold learning for classification. We consider nonlinear dimensionality reduction algorithms that yield linearly separable embeddings of training data and present generalization bounds for this type of algorithms. A necessary condition for satisfactory generalizat...
A memetic algorithm for clustering with cluster based feature selection
Şener, İlyas Alper; İyigün, Cem; Department of Operational Research (2022-8)
Clustering is a well known unsupervised learning method which aims to group the similar data points and separate the dissimilar ones. Data sets that are subject to clustering are mostly high dimensional and these dimensions include relevant and redundant features. Therefore, selection of related features is a significant problem to obtain successful clusters. In this study, it is considered that relevant features for each cluster can be varied as each cluster in a data set is grouped by different set of fe...
A Theoretical Analysis of Multi-Modal Representation Learning with Regular Functions
Vural, Elif (2021-01-07)
Multi-modal data analysis methods often learn representations that align different modalities in a new common domain, while preserving the within-class compactness and within-modality geometry and enhancing the between-class separation. In this study, we present a theoretical performance analysis for multi-modal representation learning methods. We consider a quite general family of algorithms learning a nonlinear embedding of the data space into a new space via regular functions. We derive sufficient condit...
A Comparative Evaluation of Two Computer SupportedCollaborative Work Systems for Supporting Collaborative BusinessProcess Modeling Activities
Fındık Coşkunçay, Duygu; Çakır, Murat Perit (2018-10-01)
In this study, a comparative evaluation of different Computer-Supported Collaborative Work CSCW environments was conducted to reveal their constraints and affordances for supporting synchronous collaborative business process modeling cBPM activities online. For this purpose, two case studies were carried out with two CSCW systems that differ in terms of their interaction design features for supporting joint work. The dual-eye tracking method was employed to monitor how the participants focused their attenti...
Citation Formats
F. Koltuk, “A novel and precise false positive probability computation for Bloom Filters implemented with universal hash functions,” M.S. - Master of Science, Middle East Technical University, 2022.