All Projects → ayazhafiz → xorf

ayazhafiz / xorf

Licence: MIT license
Xor filters - efficient probabilistic hashsets. Faster and smaller than bloom and cuckoo filters.

Programming Languages

rust
11053 projects

Projects that are alternatives of or similar to xorf

exor filter
Erlang nif for xor_filter. 'Faster and Smaller Than Bloom and Cuckoo Filters'.
Stars: ✭ 29 (-54.69%)
Mutual labels:  bloom-filter, xor-filter
Ceramist
Verified hash-based AMQ structures in Coq
Stars: ✭ 107 (+67.19%)
Mutual labels:  probability, bloom-filter
hackernews-button
Privacy-preserving Firefox extension linking to Hacker News discussion; built with Bloom filters and WebAssembly
Stars: ✭ 73 (+14.06%)
Mutual labels:  bloom-filter
discrete-math-python-scripts
Python code snippets from Discrete Mathematics for Computer Science specialization at Coursera
Stars: ✭ 98 (+53.13%)
Mutual labels:  probability
Undergraduate-in-Statistics
Using Computer with your Statistics Major Course
Stars: ✭ 57 (-10.94%)
Mutual labels:  probability
bloom filter
Bloom filter implementation in Crystal lang
Stars: ✭ 33 (-48.44%)
Mutual labels:  bloom-filter
alea
Coq library for reasoning on randomized algorithms [maintainers=@anton-trunov,@volodeyka]
Stars: ✭ 20 (-68.75%)
Mutual labels:  probability
actuar
Actuarial functions and heavy tailed distributions for R
Stars: ✭ 19 (-70.31%)
Mutual labels:  probability
leaked-password
Leaked password check library with bloom filter
Stars: ✭ 41 (-35.94%)
Mutual labels:  bloom-filter
shoki
Purely functional data structures in Java
Stars: ✭ 30 (-53.12%)
Mutual labels:  hashset
penney
Penney's Game
Stars: ✭ 14 (-78.12%)
Mutual labels:  probability
appliedstats
📊 Methods of Applied Statistics Course Textbook Repository
Stars: ✭ 134 (+109.38%)
Mutual labels:  probability
PharoPDS
Probabilistic data structures in Pharo Smalltalk.
Stars: ✭ 28 (-56.25%)
Mutual labels:  bloom-filter
rust-bloomfilter
🦀 Bloom filter implementation in Rust 🦀
Stars: ✭ 18 (-71.87%)
Mutual labels:  bloom-filter
bloomfilter
Bloom filters for Java
Stars: ✭ 53 (-17.19%)
Mutual labels:  bloom-filter
bloom
An in-memory bloom filter with persistence and HTTP interface
Stars: ✭ 31 (-51.56%)
Mutual labels:  bloom-filter
libfilter
High-speed Bloom filters and taffy filters for C, C++, and Java
Stars: ✭ 23 (-64.06%)
Mutual labels:  bloom-filter
mchmm
Markov Chains and Hidden Markov Models in Python
Stars: ✭ 89 (+39.06%)
Mutual labels:  probability
ganon
ganon classifies short DNA sequences against large sets of genomic sequences efficiently, with download and update of references (RefSeq/Genbank), taxonomic (NCBI/GTDB) and hierarchical classification, customized reporting and more
Stars: ✭ 57 (-10.94%)
Mutual labels:  bloom-filter
ml-compiled
Quick definitions and intuitive explanations around machine learning.
Stars: ✭ 29 (-54.69%)
Mutual labels:  probability

xorf

Xorf docs Crates.io Build Status

This repository hosts a Rust library implementing xor filters and their derivates:

Xor filters are data structures for fast approximation of set membership using little memory. Probabilistic filters like xor filters are useful when it's okay to have false positives sometimes, but it's important to be space and time efficient. In other words, they trade off accuracy for efficiency as compared to general-purpose hashsets. Filters like xor filter are often used in conjunction with larger hash-based data structures, with the filter doing a "first pass" of the work to avoid using a more expensive resource unnecessarily. For example, filters like xor filters can be used to reduce disk writes in a cache or identify malicious URLs in a browser.

Xor filters are faster and smaller than Bloom and Cuckoo filters. Xor filters incur a relative time penalty in construction, but are very fast in lookups; the expectation is that construction of a filter is amortized after many queries. Daniel Lemire's go implementation provides a useful summary of xor filters' benefits and listing of other xor filter libraries.

This library is no_std and needs_allocator.

xorf also provides a HashProxy for using Xor filters with arbitrary key types.

Installation

Add a dependency to xorf in Cargo.toml:

[dependencies]
xorf = "M.m.p" # use a desired version

Available versions are listed on crates and the in repository's releases.

Usage

Please see the library documentation for usage information.

Features

Custom allocator

To use a custom global allocator, you must be using a nightly release of rustc and have enabled the nightly feature for xorf.

[dependencies]
xorf = { version = "M.m.p", features = ["nightly"] }

This will tag the crate as needs_allocator, which you will then have to provide. At this time, a custom allocator is used globally.

Serialization/Deserialization

Serialization and deserialization with serde cab be enabled with the serde feature.

[dependencies]
xorf = { version = "M.m.p", features = ["serde"] }

Default features

Uniform Random

By default, xorf uses the uniform-random feature, which uses random values for unused fingerprint entries rather than initializing them to zero. This provides a slightly lower false-positive rate, but incurs a higher initialization cost. The cost of lookups is not affected.

The distribution of zero-valued fingerprints for a 1,000,000-element BinaryFuse8 filter, both without and with the uniform-random feature, is visualized below. Notice that the scales of the y-axes differ.

BinaryFuse8 without uniform-random BinaryFuse8 with uniform-random
Ouch Nice

To disable the uniform-random feature, specify that default features should be disabled:

[dependencies]
xorf = { version = "M.m.p", default-features = false }
Binary Fuse

By default, xorf uses the binary-fuse feature, which adds support for and exposes Binary Fuse filter implementations. This feature pulls in a dependency of libm, but has no runtime cost. This feature is highly recommended, as Binary Fuse filters are the most powerful in the Xor filter family.

Development

Development of xorf targets the master branch of this repository.

Changes can be tested by running the check script:

scripts/check lf     # validates lint and format
scripts/check test   # tests source code

Contribution

Contributions are warmly welcomed. No contribution is too small, and all are appreciated.

License

MIT

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].