All Projects → bits-and-blooms → bloom

bits-and-blooms / bloom

Licence: BSD-2-Clause license
Go package implementing Bloom filters

Programming Languages

go
31211 projects - #10 most used programming language
Makefile
30231 projects

Projects that are alternatives of or similar to bloom

bloom
An in-memory bloom filter with persistence and HTTP interface
Stars: ✭ 31 (-98.23%)
Mutual labels:  bloom, bloom-filters
js-data-structures
🌿 Data structures for JavaScript
Stars: ✭ 56 (-96.8%)
Mutual labels:  bloom-filters
blex
Fast Bloom filter with concurrent accessibility, powered by :atomics module.
Stars: ✭ 34 (-98.06%)
Mutual labels:  bloom
bloom
Bloom filter for go, backed by redis
Stars: ✭ 37 (-97.89%)
Mutual labels:  bloom-filters
limitless-engine
OpenGL C++ Graphics Engine
Stars: ✭ 95 (-94.58%)
Mutual labels:  bloom
bloom-legacy
End-to-end encrypted Notes, Files, Calendar, Contacts... for Android, IOS, Linux & MacOS - DEPRECATED
Stars: ✭ 44 (-97.49%)
Mutual labels:  bloom
GITechDemo
Global illumination technical demo - a continuation of the Synesthesia3D (ex-LibRenderer) graphics engine used in https://github.com/iftodebogdan/ShaderEditor
Stars: ✭ 45 (-97.43%)
Mutual labels:  bloom
BeatSaber Tweaks55
A collection of various tweaks which by themselves are too simple for their own designated mods
Stars: ✭ 26 (-98.52%)
Mutual labels:  bloom
bloofi
Bloofi: A java implementation of multidimensional Bloom filters
Stars: ✭ 68 (-96.12%)
Mutual labels:  bloom-filters
Olivia
Go: A distributed, in-memory key-value storage.
Stars: ✭ 94 (-94.63%)
Mutual labels:  bloom-filters
bloomfilter
Bloom filters for Java
Stars: ✭ 53 (-96.97%)
Mutual labels:  bloom-filters
hackernews-button
Privacy-preserving Firefox extension linking to Hacker News discussion; built with Bloom filters and WebAssembly
Stars: ✭ 73 (-95.83%)
Mutual labels:  bloom-filters
libfilter
High-speed Bloom filters and taffy filters for C, C++, and Java
Stars: ✭ 23 (-98.69%)
Mutual labels:  bloom-filters
bloomfilter
bloomfilter.js ported to Go
Stars: ✭ 94 (-94.63%)
Mutual labels:  bloom-filters

Bloom filters

Test Go Report Card Go Reference

A Bloom filter is a concise/compressed representation of a set, where the main requirement is to make membership queries; i.e., whether an item is a member of a set. A Bloom filter will always correctly report the presence of an element in the set when the element is indeed present. A Bloom filter can use much less storage than the original set, but it allows for some 'false positives': it may sometimes report that an element is in the set whereas it is not.

When you construct, you need to know how many elements you have (the desired capacity), and what is the desired false positive rate you are willing to tolerate. A common false-positive rate is 1%. The lower the false-positive rate, the more memory you are going to require. Similarly, the higher the capacity, the more memory you will use. You may construct the Bloom filter capable of receiving 1 million elements with a false-positive rate of 1% in the following manner.

    filter := bloom.NewWithEstimates(1000000, 0.01) 

You should call NewWithEstimates conservatively: if you specify a number of elements that it is too small, the false-positive bound might be exceeded. A Bloom filter is not a dynamic data structure: you must know ahead of time what your desired capacity is.

Our implementation accepts keys for setting and testing as []byte. Thus, to add a string item, "Love":

    filter.Add([]byte("Love"))

Similarly, to test if "Love" is in bloom:

    if filter.Test([]byte("Love"))

For numerical data, we recommend that you look into the encoding/binary library. But, for example, to add a uint32 to the filter:

    i := uint32(100)
    n1 := make([]byte, 4)
    binary.BigEndian.PutUint32(n1, i)
    filter.Add(n1)

Godoc documentation: https://pkg.go.dev/github.com/bits-and-blooms/bloom/v3

Installation

go get -u github.com/bits-and-blooms/bloom/v3

Verifying the False Positive Rate

Sometimes, the actual false positive rate may differ (slightly) from the theoretical false positive rate. We have a function to estimate the false positive rate of a Bloom filter with m bits and k hashing functions for a set of size n:

    if bloom.EstimateFalsePositiveRate(20*n, 5, n) > 0.001 ...

You can use it to validate the computed m, k parameters:

    m, k := bloom.EstimateParameters(n, fp)
    ActualfpRate := bloom.EstimateFalsePositiveRate(m, k, n)

or

    f := bloom.NewWithEstimates(n, fp)
    ActualfpRate := bloom.EstimateFalsePositiveRate(f.m, f.k, n)

You would expect ActualfpRate to be close to the desired false-positive rate fp in these cases.

The EstimateFalsePositiveRate function creates a temporary Bloom filter. It is also relatively expensive and only meant for validation.

Contributing

If you wish to contribute to this project, please branch and issue a pull request against master ("GitHub Flow")

This project includes a Makefile that allows you to test and build the project with simple commands. To see all available options:

make help

Running all tests

Before committing the code, please check if it passes all tests using (note: this will install some dependencies):

make deps
make qa

Design

A Bloom filter has two parameters: m, the number of bits used in storage, and k, the number of hashing functions on elements of the set. (The actual hashing functions are important, too, but this is not a parameter for this implementation). A Bloom filter is backed by a BitSet; a key is represented in the filter by setting the bits at each value of the hashing functions (modulo m). Set membership is done by testing whether the bits at each value of the hashing functions (again, modulo m) are set. If so, the item is in the set. If the item is actually in the set, a Bloom filter will never fail (the true positive rate is 1.0); but it is susceptible to false positives. The art is to choose k and m correctly.

In this implementation, the hashing functions used is murmurhash, a non-cryptographic hashing function.

Given the particular hashing scheme, it's best to be empirical about this. Note that estimating the FP rate will clear the Bloom filter.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].