All Projects → crystal-community → bloom_filter

crystal-community / bloom_filter

Licence: MIT license
Bloom filter implementation in Crystal lang

Programming Languages

crystal
512 projects
Makefile
30231 projects

Projects that are alternatives of or similar to bloom filter

Sketchy
Sketching Algorithms for Clojure (bloom filter, min-hash, hyper-loglog, count-min sketch)
Stars: ✭ 138 (+318.18%)
Mutual labels:  bloom-filter
ntEdit
✏️ultra fast and scalable genome assembly polishing
Stars: ✭ 56 (+69.7%)
Mutual labels:  bloom-filter
crlite
WebPKI-level Certificate Revocation via Multi-Level Bloom Filter Cascade
Stars: ✭ 52 (+57.58%)
Mutual labels:  bloom-filter
Python Hashes
Interesting (non-cryptographic) hashes implemented in pure Python.
Stars: ✭ 213 (+545.45%)
Mutual labels:  bloom-filter
phpRebloom
🎛️ Use RedisBloom in PHP!
Stars: ✭ 20 (-39.39%)
Mutual labels:  bloom-filter
blex
Fast Bloom filter with concurrent accessibility, powered by :atomics module.
Stars: ✭ 34 (+3.03%)
Mutual labels:  bloom-filter
Basalt
高性能的分布式的专门空间优化的 Bitmap 服务, 高效检查数据是否存在,日活统计,签到,打点等等
Stars: ✭ 128 (+287.88%)
Mutual labels:  bloom-filter
hackernews-button
Privacy-preserving Firefox extension linking to Hacker News discussion; built with Bloom filters and WebAssembly
Stars: ✭ 73 (+121.21%)
Mutual labels:  bloom-filter
fever
fast, extensible, versatile event router for Suricata's EVE-JSON format
Stars: ✭ 47 (+42.42%)
Mutual labels:  bloom-filter
exor filter
Erlang nif for xor_filter. 'Faster and Smaller Than Bloom and Cuckoo Filters'.
Stars: ✭ 29 (-12.12%)
Mutual labels:  bloom-filter
Libbloom
A simple and small bloom filter implementation in plain C.
Stars: ✭ 215 (+551.52%)
Mutual labels:  bloom-filter
raptor
A fast and space-efficient pre-filter for querying very large collections of nucleotide sequences.
Stars: ✭ 37 (+12.12%)
Mutual labels:  bloom-filter
Doramon
个人工具汇总:一致性哈希工具,Bitmap工具,布隆过滤器参数生成器,Yaml和properties互转工具,一键式生成整个前后端工具,单机高性能幂等工具,zookeeper客户端工具,分布式全局id生成器,时间转换工具,Http封装工具
Stars: ✭ 53 (+60.61%)
Mutual labels:  bloom-filter
Redis Cuckoofilter
Hashing-function agnostic Cuckoo filters for Redis
Stars: ✭ 158 (+378.79%)
Mutual labels:  bloom-filter
cuckoo filter
High-performance, concurrent, and mutable Cuckoo Filter for Erlang and Elixir
Stars: ✭ 31 (-6.06%)
Mutual labels:  bloom-filter
Cuckoo Filter
Cuckoo Filter go implement, better than Bloom Filter, configurable and space optimized 布谷鸟过滤器的Go实现,优于布隆过滤器,可以定制化过滤器参数,并进行了空间优化
Stars: ✭ 129 (+290.91%)
Mutual labels:  bloom-filter
bloom
Probabilistic set data structure
Stars: ✭ 77 (+133.33%)
Mutual labels:  bloom-filter
bloomfilter
Bloom filters for Java
Stars: ✭ 53 (+60.61%)
Mutual labels:  bloom-filter
libfilter
High-speed Bloom filters and taffy filters for C, C++, and Java
Stars: ✭ 23 (-30.3%)
Mutual labels:  bloom-filter
redisbloom-go
Go Client for RedisBloom probabilistic module
Stars: ✭ 74 (+124.24%)
Mutual labels:  bloom-filter

Bloom Filter Build Status

Implementation of Bloom Filter in Crystal lang.

Installation

Add this to your application's shard.yml:

dependencies:
  bloom_filter:
    github: crystal-community/bloom_filter

Usage

Basic

require "bloom_filter"

# Create filter with bitmap size of 32 bytes and 3 hash functions.
filter = BloomFilter.new(bytesize = 32, hash_num = 3)

# Insert elements
filter.insert("Esperanto")
filter.insert("Toki Pona")

# Check elements presence
filter.has?("Esperanto")  # => true
filter.has?("Toki Pona")  # => true
filter.has?("Englsh")     # => false

Creating a filter with optimal parameters

Based on your needs(expected number of items and desired probability of false positives), your can create an optimal bloom filter:

# Create a filter, that with one million inserted items, gives 2% of false positives for #has? method
filter = BloomFilter.new_optimal(1_000_000, 0.02)
filter.bytesize # => 1017796 (993Kb)
filter.hash_num # => 6

Dumping into a file and loading

It's possible to save existing bloom filter as a binary file and then load it back.

filter = BloomFilter.new_optimal(2, 0.01)
filter.insert("Esperanto")
filter.dump_file("/tmp/bloom_languages")

loaded_filter = BloomFilter.load_file("/tmp/bloom_languages")
loaded_filter.has?("Esperanto") # => true
loaded_filter.has?("English")   # => false

Union and intersection

Having two filters of the same size and number of hash functions, it's possible to perform union and intersection operations:

f1 = BloomFilter.new(32, 3)
f1.insert("Esperanto")
f1.insert("Spanish")

f2 = BloomFilter.new(32, 3)
f2.insert("Esperanto")
f2.insert("English")

# Union
f3 = f1 | f2
f3.has?("Esperanto") # => true
f3.has?("Spanish")   # => true
f3.has?("English")   # => true

# Intersection
f4 = f1 & f2
f4.has?("Esperanto") # => true
f4.has?("Spanish")   # => false
f4.has?("English")   # => false

Visualization

If you want to see how your filter looks like, you can visualize it:

f1 = BloomFilter.new(16, 2)
f1.insert("Esperanto")
puts "f1 = (Esperanto)"
puts f1.visualize

f2 = BloomFilter.new(16, 2)
f2.insert("Spanish")
puts "f2 = (Spanish)"
puts f2.visualize

f3 = f1 | f2
puts "f3 = f1 | f2 = (Esperanto, Spanish)"
puts f3.visualize

Output:

f1 = (Esperanto)
░░░░░░░░ ░░░░░░█░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░
░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░█ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░

f2 = (Spanish)
░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░
░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░█░ ░█░░░░░░

f3 = f1 | f2 = (Esperanto, Spanish)
░░░░░░░░ ░░░░░░█░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░
░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░█ ░░░░░░░░ ░░░░░░█░ ░█░░░░░░

In this way, you can actually see which bits are set:)

Benchmark

Performance of Bloom filter depends on the following parameters:

  • Size of the filter
  • Number of hash functions
  • Length of the input string

To run benchmark from ./samples/benchmark.cr, simply run make task:

$ make benchmark

Number of items: 100000000
Filter size: 117005Kb
Hash functions: 7
String size: 13

                     user     system      total        real
insert           0.004227   0.000000   0.004227 (  2.769349)
has? (present)   0.007980   0.000000   0.007980 (  5.223778)
has? (missing)   0.004318   0.000000   0.004318 (  2.829521)

Contributors

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].