All Projects → farhadi → cuckoo_filter

farhadi / cuckoo_filter

Licence: Apache-2.0 license
High-performance, concurrent, and mutable Cuckoo Filter for Erlang and Elixir

Programming Languages

erlang
1774 projects

Projects that are alternatives of or similar to cuckoo filter

guava-probably
Probabilistic data structures for Guava.
Stars: ✭ 51 (+64.52%)
Mutual labels:  bloom-filter, cuckoo-filter
phpRebloom
🎛️ Use RedisBloom in PHP!
Stars: ✭ 20 (-35.48%)
Mutual labels:  bloom-filter, cuckoo-filter
Data Structures
Data-Structures using C++.
Stars: ✭ 121 (+290.32%)
Mutual labels:  bloom-filter
blex
Fast Bloom filter with concurrent accessibility, powered by :atomics module.
Stars: ✭ 34 (+9.68%)
Mutual labels:  bloom-filter
Abyss
🔬 Assemble large genomes using short reads
Stars: ✭ 219 (+606.45%)
Mutual labels:  bloom-filter
Basalt
高性能的分布式的专门空间优化的 Bitmap 服务, 高效检查数据是否存在,日活统计,签到,打点等等
Stars: ✭ 128 (+312.9%)
Mutual labels:  bloom-filter
Gulden Official
Blockchain as intended
Stars: ✭ 126 (+306.45%)
Mutual labels:  bloom-filter
Jredisbloom
Java Client for RedisBloom probabilistic module
Stars: ✭ 108 (+248.39%)
Mutual labels:  bloom-filter
exor filter
Erlang nif for xor_filter. 'Faster and Smaller Than Bloom and Cuckoo Filters'.
Stars: ✭ 29 (-6.45%)
Mutual labels:  bloom-filter
Libbloom
A simple and small bloom filter implementation in plain C.
Stars: ✭ 215 (+593.55%)
Mutual labels:  bloom-filter
bloom
Probabilistic set data structure
Stars: ✭ 77 (+148.39%)
Mutual labels:  bloom-filter
Python Hashes
Interesting (non-cryptographic) hashes implemented in pure Python.
Stars: ✭ 213 (+587.1%)
Mutual labels:  bloom-filter
Cuckoo Filter
Cuckoo Filter go implement, better than Bloom Filter, configurable and space optimized 布谷鸟过滤器的Go实现,优于布隆过滤器,可以定制化过滤器参数,并进行了空间优化
Stars: ✭ 129 (+316.13%)
Mutual labels:  bloom-filter
fever
fast, extensible, versatile event router for Suricata's EVE-JSON format
Stars: ✭ 47 (+51.61%)
Mutual labels:  bloom-filter
Doramon
个人工具汇总:一致性哈希工具,Bitmap工具,布隆过滤器参数生成器,Yaml和properties互转工具,一键式生成整个前后端工具,单机高性能幂等工具,zookeeper客户端工具,分布式全局id生成器,时间转换工具,Http封装工具
Stars: ✭ 53 (+70.97%)
Mutual labels:  bloom-filter
Tinysearch
🔍 Tiny, full-text search engine for static websites built with Rust and Wasm
Stars: ✭ 1,705 (+5400%)
Mutual labels:  bloom-filter
Redis Cuckoofilter
Hashing-function agnostic Cuckoo filters for Redis
Stars: ✭ 158 (+409.68%)
Mutual labels:  bloom-filter
raptor
A fast and space-efficient pre-filter for querying very large collections of nucleotide sequences.
Stars: ✭ 37 (+19.35%)
Mutual labels:  bloom-filter
crlite
WebPKI-level Certificate Revocation via Multi-Level Bloom Filter Cascade
Stars: ✭ 52 (+67.74%)
Mutual labels:  bloom-filter
redisbloom-go
Go Client for RedisBloom probabilistic module
Stars: ✭ 74 (+138.71%)
Mutual labels:  bloom-filter

Cuckoo Filter

CI build status codecov Hex docs Hex Version License

A high-performance, concurrent, and mutable Cuckoo Filter implemented using atomics for Erlang and Elixir.

Introduction

A Cuckoo Filter is a space-efficient probabilistic data structure for approximated set-membership queries. It can be used to test whether an element is a member of a set in constant time and requires only a few bits per element. The trade-off is that a low rate of false positives is possible, but false negatives are not.

Cuckoo Filter is considered a better alternative to Bloom Filter with lower space overhead and with support for deletion of inserted elements.

Insertion of elements in a cuckoo filter becomes slowers when the load factor is high and might fail when capacity is nearly full.

Implementation

In this implementation, filter data is stored in an atomics array, which is a fixed-size mutable array of 64-bit integers. By using atomics we can have fast and concurrent access to the filter for both reads and writes.

To be able to update fingerprints atomically, this implementation only allows fingerprint sizes of 4, 8, 16, 32, and 64 bits, so that multiple fingerprints can fit in a single 64-bit atomic integer.

In a Cuckoo Filter for each element, there are two buckets where it can be inserted. To insert an element when there is an empty slot available in one of the two buckets, we can atomically update that empty entry, but when there is no slot available in the two buckets we have to relocate exiting elements to make room for the new one. In this case, since multiple entries need to be updated, we no longer can do insert operation atomically, and such inserts can not be done concurrently. In such cases, to prevent race conditions we use the first integer in the atomics array as a mutex.

Deletes are also done with a lock to prevent race conditions when an insert is relocating elements.

When relocating existing elements, it is also possible that a concurrent lookup of an evicted element fails. To prevent this, instead of immediately updating an evicted element, we keep a sequence of evictions in memory, and once we find an empty slot we start applying the changes in the sequence from the end to the beginning. In other words, to relocate an element, we first insert it in its new place, then we remove it from its old place, making sure the element is always available to lookups.

In most cuckoo filter implementations, when an insert fails, the element is actually added to the filter, but some other random element gets removed, but in this implementation, by using this eviction cache technique we can also avoid the removal of a random element when an insertion fails.

Another benefit of keeping the chain of evictions in memory is the early detection of loops before reaching the maximum number of evictions.

The second integer in the array is used as an atomic counter to store the number of elements inserted in the filter.

For hashing, by default 64-bit variant of XXH3 is used.

Fingerprint size, bucket size, hash function, and the maximum number of evictions are configurable when creating a new filter.

Usage

If you want to use the default xxh3 hash functions, you need to add xxh3 to your deps.

In Erlang

Filter = cuckoo_filter:new(1000),
ok = cuckoo_filter:add(Filter, 1),
true = cuckoo_filter:contains(Filter, 1),
false = cuckoo_filter:contains(Filter, 2),
ok = cuckoo_filter:delete(Filter, 1),
{error, not_found} = cuckoo_filter:delete(Filter, 1).

In Elixir

filter = :cuckoo_filter.new(1000)
:ok = :cuckoo_filter.add(filter, 1)
true = :cuckoo_filter.contains(filter, 1)
false = :cuckoo_filter.contains(filter, 2)
:ok = :cuckoo_filter.delete(filter, 1)
{:error, :not_found} = :cuckoo_filter.delete(filter, 1)

For more details, see the module documentation.

License

Copyright 2021, Ali Farhadi [email protected].

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].