All Projects → dac-gmbh → golomb-set

dac-gmbh / golomb-set

Licence: MIT license
A Golomb Coded Set implementation in Rust

Programming Languages

rust
11053 projects

Projects that are alternatives of or similar to golomb-set

redisbloom-go
Go Client for RedisBloom probabilistic module
Stars: ✭ 74 (+124.24%)
Mutual labels:  bloom-filter
PharoPDS
Probabilistic data structures in Pharo Smalltalk.
Stars: ✭ 28 (-15.15%)
Mutual labels:  bloom-filter
leaked-password
Leaked password check library with bloom filter
Stars: ✭ 41 (+24.24%)
Mutual labels:  bloom-filter
crlite
WebPKI-level Certificate Revocation via Multi-Level Bloom Filter Cascade
Stars: ✭ 52 (+57.58%)
Mutual labels:  bloom-filter
bloomfilter
Bloom filters for Java
Stars: ✭ 53 (+60.61%)
Mutual labels:  bloom-filter
ganon
ganon classifies short DNA sequences against large sets of genomic sequences efficiently, with download and update of references (RefSeq/Genbank), taxonomic (NCBI/GTDB) and hierarchical classification, customized reporting and more
Stars: ✭ 57 (+72.73%)
Mutual labels:  bloom-filter
blex
Fast Bloom filter with concurrent accessibility, powered by :atomics module.
Stars: ✭ 34 (+3.03%)
Mutual labels:  bloom-filter
komihash
Very fast, high-quality hash function (non-cryptographic, C) + PRNG
Stars: ✭ 68 (+106.06%)
Mutual labels:  bloom-filter
bloom filter
Bloom filter implementation in Crystal lang
Stars: ✭ 33 (+0%)
Mutual labels:  bloom-filter
bloom
An in-memory bloom filter with persistence and HTTP interface
Stars: ✭ 31 (-6.06%)
Mutual labels:  bloom-filter
cuckoo filter
High-performance, concurrent, and mutable Cuckoo Filter for Erlang and Elixir
Stars: ✭ 31 (-6.06%)
Mutual labels:  bloom-filter
hackernews-button
Privacy-preserving Firefox extension linking to Hacker News discussion; built with Bloom filters and WebAssembly
Stars: ✭ 73 (+121.21%)
Mutual labels:  bloom-filter
rust-bloomfilter
🦀 Bloom filter implementation in Rust 🦀
Stars: ✭ 18 (-45.45%)
Mutual labels:  bloom-filter
exor filter
Erlang nif for xor_filter. 'Faster and Smaller Than Bloom and Cuckoo Filters'.
Stars: ✭ 29 (-12.12%)
Mutual labels:  bloom-filter
xorf
Xor filters - efficient probabilistic hashsets. Faster and smaller than bloom and cuckoo filters.
Stars: ✭ 64 (+93.94%)
Mutual labels:  bloom-filter
Doramon
个人工具汇总:一致性哈希工具,Bitmap工具,布隆过滤器参数生成器,Yaml和properties互转工具,一键式生成整个前后端工具,单机高性能幂等工具,zookeeper客户端工具,分布式全局id生成器,时间转换工具,Http封装工具
Stars: ✭ 53 (+60.61%)
Mutual labels:  bloom-filter
pybloomfiltermmap3
Fast Python Bloom Filter using Mmap
Stars: ✭ 87 (+163.64%)
Mutual labels:  bloom-filter
json-bloomfilter
🗜 A bloom filter implementation in Ruby and Javascript that is serialisable to JSON and compatible between both languages.
Stars: ✭ 15 (-54.55%)
Mutual labels:  bloom-filter
bloomfilter
Simplistic (but fast) java implementation of a bloom filter.
Stars: ✭ 35 (+6.06%)
Mutual labels:  bloom-filter
bloomclj
A Bloom Filter implementation in Clojure
Stars: ✭ 20 (-39.39%)
Mutual labels:  bloom-filter

golomb-set

A Golomb Coded Set implementation in Rust

crates.io docs.rs Build Status Dependabot Status

A Golomb Coded Set is a probabilistic data structure which is typically smaller than a bloom filter, at the cost of query performance.. A sorted list of differences between samples of a random distribution will roughly have a geometric distribution, which makes for an optimal prefix for Golomb-Rice coding. Since a good hash algorithm will be randomly distributed, this encoding makes for efficient storage of hashed values.

Giovanni Bajo's blog post as well as their Python and C++ implementations were a huge help when writing and testing this library, found here and here respectively.

Usage and Behaviour

There are 3 main parameters to select when creating a Golomb Coded Set: the hash algorithm, N and P. N is the desired maximum number of elements that will be inserted into the set, and 1 / 2 ^ P is the desired probability of a false positive when the set is full. If fewer items have been inserted the real probability will be significantly lower.

The chosen hashing algorithm must have a uniform distribution (which is not the same as being cryptograpically secure) and the output length of the hash in bits must be greater than log2(N * 2 ^ P) bits. This is not currently enforced by the library and failing to do so could result in far more false positives than expected. Beyond meeting those requirements, selecting an algorithm for speed would be appropriate. If the hardware acceleration is present, CRC32 would be a good choice for up to a million elements and a false positive probability of 0.001%. For larger sets and/or lower probabilities a hashing algorithm with a longer output is needed.

Example

use {golomb_set::UnpackedGcs, md5::Md5};

// Create a GCS where when 3 items are stored in the set, the
// probability of a false positive will be `1/(2^5)`, or 3.1%
let mut gcs = UnpackedGcs::<Md5>::new(3, 5);

// Insert the MD5 hashes of "alpha" and "bravo"
gcs.insert(b"alpha");
gcs.insert(b"bravo");

assert!(gcs.contains(b"alpha"));
assert!(gcs.contains(b"bravo"));
assert!(!gcs.contains(b"charlie"));

// Reduces memory footprint in exchange for slower querying
let gcs = gcs.pack();

assert!(gcs.contains(b"alpha"));
assert!(gcs.contains(b"bravo"));
assert!(!gcs.contains(b"charlie"));
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].