A novel and precise false positive probability computation for Bloom Filters implemented with universal hash functions

Download
2022-8-22
Koltuk, Furkan
Bloom Filters (BF) are multiple-hashing data structures that are widely used in membership testing applications. The many-to-one nature of the BF hashing results in false positive outcomes which have to be further processed at a performance cost. The computation of the false positive probability of BFs is carried out under the assumption of uniform and independent hash functions. To the best of our knowledge, all previous work in the literature assume that the hash functions are uniform and independent without verifying if that is the case in reality. This thesis focuses on the hash function uniformity and independence for BFs with universal H3 functions. To this end, we propose a formal framework for defining and quantifying the uniformity and independence for H3 hash functions in BFs. Furthermore, we define a formal description of the many-to-one outcomes of H3 hash functions. We then use this framework to precisely compute the false positive probability for BFs with H3 hash functions which might not be necessarily uniform or independent. We verify our precise false positive expression with a hardware test bed that can execute 2 · 10^8 membership queries per second. We perform structured tests to evaluate the effect of losing uniformity and independence at different levels among the H3 hash functions. Furthermore, we evaluate the results of our expression for a range of parameters.

Suggestions

A NOVEL METHOD FOR DISCRETE COEFFICIENT FIR DIGITAL FILTER DESIGN
Çiloğlu, Tolga (1994-06-02)
A local search algorithm for discrete coefficient FIR filter design is presented. The minmax objective function is minimized by moving along the low gradient directions. A new method to forecast these directions is proposed. The algorithm is suitable to design high order filters in a short time. The results are compared to other methods both in quality and computational load
A Study of the Classification of Low-Dimensional Data with Supervised Manifold Learning
Vural, Elif (2018-01-01)
Supervised manifold learning methods learn data representations by preserving the geometric structure of data while enhancing the separation between data samples from different classes. In this work, we propose a theoretical study of supervised manifold learning for classification. We consider nonlinear dimensionality reduction algorithms that yield linearly separable embeddings of training data and present generalization bounds for this type of algorithms. A necessary condition for satisfactory generalizat...
A memetic algorithm for clustering with cluster based feature selection
Şener, İlyas Alper; İyigün, Cem; Department of Operational Research (2022-8)
Clustering is a well known unsupervised learning method which aims to group the similar data points and separate the dissimilar ones. Data sets that are subject to clustering are mostly high dimensional and these dimensions include relevant and redundant features. Therefore, selection of related features is a significant problem to obtain successful clusters. In this study, it is considered that relevant features for each cluster can be varied as each cluster in a data set is grouped by different set of fe...
A Comparative Evaluation of Two Computer SupportedCollaborative Work Systems for Supporting Collaborative BusinessProcess Modeling Activities
Fındık Coşkunçay, Duygu; Çakır, Murat Perit (2018-10-01)
In this study, a comparative evaluation of different Computer-Supported Collaborative Work CSCW environments was conducted to reveal their constraints and affordances for supporting synchronous collaborative business process modeling cBPM activities online. For this purpose, two case studies were carried out with two CSCW systems that differ in terms of their interaction design features for supporting joint work. The dual-eye tracking method was employed to monitor how the participants focused their attenti...
THE GENERALIZED H-LATTICE FILTER AND ESTIMATION OF THE ARMA MODELS
YARMANVURAL, F; WAWRZYNSKI, D; CETIN, AE (1989-01-01)
A class of lattice filters, the generalized H-lattice filters is introduced. The generalized H-lattice filters can represent the autoregressive moving-average (ARMA) models and they can be used for the modeling and analysis of the ARMA stochastic process.
Citation Formats
F. Koltuk, “A novel and precise false positive probability computation for Bloom Filters implemented with universal hash functions,” M.S. - Master of Science, Middle East Technical University, 2022.