All Projects → mpope9 → exor_filter

mpope9 / exor_filter

Licence: Apache-2.0 license
Erlang nif for xor_filter. 'Faster and Smaller Than Bloom and Cuckoo Filters'.

Programming Languages

c
50402 projects - #5 most used programming language
erlang
1774 projects
Makefile
30231 projects

Projects that are alternatives of or similar to exor filter

rust-bloomfilter
🦀 Bloom filter implementation in Rust 🦀
Stars: ✭ 18 (-37.93%)
Mutual labels:  bloom-filter, probabilistic-data-structures
bloomfilter
Bloomfilter written in Golang, includes rotation and RPC
Stars: ✭ 61 (+110.34%)
Mutual labels:  bloom-filter, probabilistic-data-structures
bloom
An in-memory bloom filter with persistence and HTTP interface
Stars: ✭ 31 (+6.9%)
Mutual labels:  bloom-filter, probabilistic-data-structures
gerlang
Calling Go functions from Erlang
Stars: ✭ 26 (-10.34%)
Mutual labels:  nif, erlang-nif
xorf
Xor filters - efficient probabilistic hashsets. Faster and smaller than bloom and cuckoo filters.
Stars: ✭ 64 (+120.69%)
Mutual labels:  bloom-filter, xor-filter
blex
Fast Bloom filter with concurrent accessibility, powered by :atomics module.
Stars: ✭ 34 (+17.24%)
Mutual labels:  bloom-filter, probabilistic-data-structures
webview
Build cross platform desktop apps with Elixir and web technologies.
Stars: ✭ 18 (-37.93%)
Mutual labels:  nif
phpRebloom
🎛️ Use RedisBloom in PHP!
Stars: ✭ 20 (-31.03%)
Mutual labels:  bloom-filter
Libbloom
A simple and small bloom filter implementation in plain C.
Stars: ✭ 215 (+641.38%)
Mutual labels:  bloom-filter
Redis Cuckoofilter
Hashing-function agnostic Cuckoo filters for Redis
Stars: ✭ 158 (+444.83%)
Mutual labels:  bloom-filter
Doramon
个人工具汇总:一致性哈希工具,Bitmap工具,布隆过滤器参数生成器,Yaml和properties互转工具,一键式生成整个前后端工具,单机高性能幂等工具,zookeeper客户端工具,分布式全局id生成器,时间转换工具,Http封装工具
Stars: ✭ 53 (+82.76%)
Mutual labels:  bloom-filter
better-dni
The fastest Spanish DNI (NIE / NIF) validation out there.
Stars: ✭ 23 (-20.69%)
Mutual labels:  nif
raptor
A fast and space-efficient pre-filter for querying very large collections of nucleotide sequences.
Stars: ✭ 37 (+27.59%)
Mutual labels:  bloom-filter
UIX-R
Ultima IX: Redemption for Morrowind and OpenMW
Stars: ✭ 32 (+10.34%)
Mutual labels:  nif
fever
fast, extensible, versatile event router for Suricata's EVE-JSON format
Stars: ✭ 47 (+62.07%)
Mutual labels:  bloom-filter
Abyss
🔬 Assemble large genomes using short reads
Stars: ✭ 219 (+655.17%)
Mutual labels:  bloom-filter
libsalty
Elixir bindings for libsodium (NIF)
Stars: ✭ 20 (-31.03%)
Mutual labels:  erlang-nif
Python Hashes
Interesting (non-cryptographic) hashes implemented in pure Python.
Stars: ✭ 213 (+634.48%)
Mutual labels:  bloom-filter
Rustler
Safe Rust bridge for creating Erlang NIF functions
Stars: ✭ 3,052 (+10424.14%)
Mutual labels:  nif
bloom
Probabilistic set data structure
Stars: ✭ 77 (+165.52%)
Mutual labels:  bloom-filter

exor_filter

Nif wrapper for the xor_filter: https://github.com/FastFilter/xor_singleheader

They're 'Faster and Smaller Than Bloom and Cuckoo Filters'.

This library uses dirty nifs for initializing filters over 10K elements! Make sure your environment is setup correctly. Filters of 10M elements can be initialized within 4 seconds. Within 2.5 seconds if the library is used unsafely.

Table of Contents

Benchmarks

The exor_benchmark repo was used to compare access times to popular bloom filter libraries.

Benchmark Graph

Installation

Available on hex.pm!.

For rebar3:

%% rebar.config

{deps, [
  {exor_filter, "0.8.2"}
]}.

For Mix:

## mix.exs

defp deps do
  [
    {:exor_filter, "~> 0.8.2"}
  ]
end

Note, if you're using Erlang below version 23, then use this version of this library: v0.5.2. Otherwise, use the latest version.

Example Usage

Basic Usage

Basic usage with default hashing is as follows:

Filter = xor8:new(["cat", "dog", "mouse"]),
true   = xor8:contain(Filter, "cat"),
false  = xor8:contain(Filter, "goose").

Filters are initialized independently:

Filter1 = xor8:new([1, 2, 3]),
Filter2 = xor8:new([4, 5, 6]),

false   = xor8:contain(Filter1, 6),
true    = xor8:contain(Filter1, 2),

false   = xor8:contain(Filter2, 2),
true    = xor8:contain(Filter2, 5).

Incremental Initialization

This is now the preferred method of usage. To create a filter incrementally, the following API should be used. It is more memory efficient than providing the entire list at initialization time. Only the default hashing method is supported. See the hashing section for more details. This method will automatically deduplicate the input safely. WARNING: Currently, the incremental API does not use dirty nifs for large input sizes. Be cautious of this, initialization can block.

Filter0 = xor8:new_empty(),            %% new_empty/0 defaults to 64 elements.  Either function
                                       %% will dynamically allocate more space as 
                                       %% needed while elements are added.
Filter1 = xor8:add(Filter0, [1, 2]),
Filter2 = xor8:add(Filter1, [3, 4]),   %% More space allocated here.
Filter3 = xor8:finalize(Filter3),      %% finalize/1 MUST be called to actually intialize the filter.
true    = xor8:contain(Filter3, 1),
false   = xor8:contain(Filter3, 6).

Do not modify the return value of any of the functions. The other APIs will not function correctly.

Hashing

  • The function xor8:new/1 uses the default hash algorithm.
  • To specify the hashing algorithm to use, use the xor8:new/2 function.
  • The filter initialization functions return values contain the context of hashing, so there is no need to specify it in the xor8:contain/2 function.
    • Do not pre-hash the value being passed to xor8:contain/2 or /3. Pass the raw value!
    • (Unless you've explicitly set that you're using pre-hashed data. See below).
  • The default hashing mechanisms remove duplicate keys. Pre-hashed data should be checked by the user. The libary will return an error on initialization if dupes are detected.

Hashing Example

Filter = xor8:new([1, 2, 3], none),
true   = xor8:contain(Filter, 1),
false  = xor8:contain(Filter, 6).

Hashing API

  • The default hash function used is erlang:phash2/1
    • It can be specified with the default_hash as the second argument to xor8:new/2.
    • It uses 60 bits on a 64-bit system and is consistent across nodes.
    • The default hashing function should be fine for most use cases, but if the filter has over 20K elements, create your own hashing function, as hashing collisions will become more frequent.
      • Errors won't happen if a collision occurs.

Pre-Hashing and Custom Hashing

  • There is an option to pass a hash function during intialization.
  • It must return a unsigned 64 bit number and have an airty of /1.
  • Due to the Erlang nif api lacking the functionality to pass and call a function in a nif, this method creates a second list of equal length. Be weary of that.
  • The custom hashing function must return unique keys.
    • An error will be returned otherwise.
    • Make your unit testing reflect reality, if possible. This will catch the issue early.
Fun    = fun(X) -> X + 1 end,
Filter = xor8:new([1, 2, 3], Fun),
true   = xor8:contain(Filter, 4),
false  = xor8:contain(Filter, 1).
  • To pass pre-hashed data, use the hash option none. The xor8:contain/2 and /3 functions must be passed pre-hashed data in this case.
    • This too will check for duplicate hashed values, and will return an error if it is detected.

Elixir Example

# ...
alias :xor8, as: Xor8
# ...
true =
   [1, 2, 3, 4]
   |> Xor8.new()
   |> Xor8.contain(1)

Custom Return Values

contain/3 can return a custom value instead of false if the value isn't present in the filter:

Filter1            = xor8:new(["Ricky Bobby", "Cal Naughton Jr."]),
true               = xor8:contain(Filter1, "Ricky Bobby", {error, not_found}),
{error, not_found} = xor8:contain(Filter1, "Reese Bobby", {error, not_found}).

Serialization

Functions are provided to the filter in binary form, instead of a nif reference. This can be useful to interop with other platforms / systems. The bin returned can be used with contain for ease of use. Example usage:

Filter                        = xor8:new(["test1", "test2", "test3"]),
BinFilter                     = xor8:to_bin(Filter),
{XorFilterBin, _HashFunction} = BinFilter,
true                          = xor8:contain(BinFilter, "test1").

xor16

The usage of the xor16 is the same. That structure is larger, but has a smaller false positive rate. Just sub xor8 for xor16 in all of the examples.

Buffered Initialization

The buffered versions of initialize are provided for larger data sets. This can be faster. See xor8:new_buffered/2 for more information.

You didn't hear it from me, though ;)

Build

$ rebar3 compile

Test

$ rebar3 eunit
$ rebar3 cover

Docs

$ rebar3 edoc

Implementations of xor filters in other languages

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].