All Projects → linvon → Cuckoo Filter

linvon / Cuckoo Filter

Licence: mit
Cuckoo Filter go implement, better than Bloom Filter, configurable and space optimized 布谷鸟过滤器的Go实现,优于布隆过滤器,可以定制化过滤器参数,并进行了空间优化

Programming Languages

go
31211 projects - #10 most used programming language

Projects that are alternatives of or similar to Cuckoo Filter

Redisbloom
Probabilistic Datatypes Module for Redis
Stars: ✭ 858 (+565.12%)
Mutual labels:  bloom-filter
Sketch
C++ Implementations of sketch data structures with SIMD Parallelism, including Python bindings
Stars: ✭ 96 (-25.58%)
Mutual labels:  bloom-filter
Jredisbloom
Java Client for RedisBloom probabilistic module
Stars: ✭ 108 (-16.28%)
Mutual labels:  bloom-filter
Springmvc Project
开箱即用的SpringMVC项目,包含常规业务所需的框架功能整合,更多功能请关注 https://github.com/MartinDai/SpringBoot-Project
Stars: ✭ 33 (-74.42%)
Mutual labels:  bloom-filter
Algorithms Study Group
Study group for algorithms in Ruby, hosted at App Academy
Stars: ✭ 94 (-27.13%)
Mutual labels:  bloom-filter
Bloom Filters
JS implementation of probabilistic data structures: Bloom Filter (and its derived), HyperLogLog, Count-Min Sketch, Top-K and MinHash
Stars: ✭ 99 (-23.26%)
Mutual labels:  bloom-filter
Cfilter
Cuckoo Filter implementation in Go, better than Bloom Filters (unmaintained)
Stars: ✭ 772 (+498.45%)
Mutual labels:  bloom-filter
Gulden Official
Blockchain as intended
Stars: ✭ 126 (-2.33%)
Mutual labels:  bloom-filter
Boomfilters
Probabilistic data structures for processing continuous, unbounded streams.
Stars: ✭ 1,333 (+933.33%)
Mutual labels:  bloom-filter
Minperf
A Minimal Perfect Hash Function Library
Stars: ✭ 107 (-17.05%)
Mutual labels:  bloom-filter
App comments spider
爬取百度贴吧、TapTap、appstore、微博官方博主上的游戏评论(基于redis_scrapy),过滤器采用了bloomfilter。
Stars: ✭ 38 (-70.54%)
Mutual labels:  bloom-filter
Bloomex
🌺 A pure Elixir implementation of Scalable Bloom Filters
Stars: ✭ 93 (-27.91%)
Mutual labels:  bloom-filter
Buckets Swift
Swift Collection Data Structures Library
Stars: ✭ 106 (-17.83%)
Mutual labels:  bloom-filter
Gopie
go patterns
Stars: ✭ 28 (-78.29%)
Mutual labels:  bloom-filter
Tinysearch
🔍 Tiny, full-text search engine for static websites built with Rust and Wasm
Stars: ✭ 1,705 (+1221.71%)
Mutual labels:  bloom-filter
Doramon
常见工具汇总:一键式生成整个前后端工具,单机高性能幂等工具,zookeeper客户端工具,分布式全局id生成器,一致性哈希工具,Bitmap工具,布隆过滤器参数生成器,Yaml和properties互转工具等等
Stars: ✭ 24 (-81.4%)
Mutual labels:  bloom-filter
Proof Of Work
Proof of Work with SHA256 and Bloom filter
Stars: ✭ 97 (-24.81%)
Mutual labels:  bloom-filter
Basalt
高性能的分布式的专门空间优化的 Bitmap 服务, 高效检查数据是否存在,日活统计,签到,打点等等
Stars: ✭ 128 (-0.78%)
Mutual labels:  bloom-filter
Data Structures
Data-Structures using C++.
Stars: ✭ 121 (-6.2%)
Mutual labels:  bloom-filter
Ceramist
Verified hash-based AMQ structures in Coq
Stars: ✭ 107 (-17.05%)
Mutual labels:  bloom-filter

cuckoo-filter

Mentioned in Awesome Go

cuckoo-filter go implement. Config by you

transplant from efficient/cuckoofilter

中文文档

Overview

Cuckoo filter is a Bloom filter replacement for approximated set-membership queries. While Bloom filters are well-known space-efficient data structures to serve queries like "if item x is in a set?", they do not support deletion. Their variances to enable deletion (like counting Bloom filters) usually require much more space.

Cuckoo filters provide the flexibility to add and remove items dynamically. A cuckoo filter is based on cuckoo hashing (and therefore named as cuckoo filter). It is essentially a cuckoo hash table storing each key's fingerprint. Cuckoo hash tables can be highly compact, thus a cuckoo filter could use less space than conventional Bloom filters, for applications that require low false positive rates (< 3%).

For details about the algorithm and citations please use:

"Cuckoo Filter: Practically Better Than Bloom" in proceedings of ACM CoNEXT 2014 by Bin Fan, Dave Andersen and Michael Kaminsky

Implementation details

The paper cited above leaves several parameters to choose.

  1. Bucket size(b): Number of fingerprints per bucket
  2. Fingerprints size(f): Fingerprints bits size of hashtag

In other implementation:

In this implementation, you can adjust b and f to any value you want, and the Semi-sorting Buckets mentioned in paper is also available, which can save 1 bit per item.

Why custom is important?

According to paper

  • Different bucket size result in different filter loadfactor, which means occupancy rate of filter
  • Different bucket size is suitable for different target false positive rate
  • To keep a false positive rate, bigger bucket size, bigger fingerprint size

Given a target false positive rate of r

when r > 0.002, having two entries per bucket yields slightly better results than using four entries per bucket; when decreases to 0.00001 < r ≤ 0.002, four entries per bucket minimizes space.

with a bucket size b, they suggest choosing the fingerprint size f using

f >= log2(2b/r) bits

as the same time, notice that we got loadfactor 84%, 95% or 98% when using bucket size b = 2, 4 or 8

To know more about parameter choosing, refer to paper's section 5

Note: generally b = 8 is enough, without more data support, we suggest you choosing b from 2, 4 or 8. And f is max 32 bits

Example usage:

package main

import (
	"fmt"
	"github.com/linvon/cuckoo-filter"
)

func main() {
	cf := cuckoo.NewFilter(4, 9, 3900, cuckoo.TableTypePacked)
	fmt.Println(cf.Info())
	fmt.Println(cf.FalsePositiveRate())

	a := []byte("A")
	cf.Add(a)
	fmt.Println(cf.Contain(a))
	fmt.Println(cf.Size())

	b := cf.Encode()
	ncf, _ := cuckoo.Decode(b)
	fmt.Println(ncf.Contain(a))

	cf.Delete(a)
	fmt.Println(cf.Size())
}
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].