
alexandrnikitin / Bloom Filter Scala

License: MIT
Bloom filter for Scala, the fastest for the JVM

Programming Languages

scala

Projects that are alternatives to or similar to Bloom Filter Scala

bloomfilter
Simplistic (but fast) java implementation of a bloom filter.
Stars: ✭ 35 (-89.49%)
Mutual labels:  datastructures, bloom-filter
Atomic queue
C++ lockless queue.
Stars: ✭ 373 (+12.01%)
Mutual labels:  datastructures, high-performance
Clojurecl
ClojureCL is a Clojure library for parallel computations with OpenCL.
Stars: ✭ 266 (-20.12%)
Mutual labels:  high-performance
Saea
SAEA.Socket is a high-performance, IOCP-based TCP framework built on .NET Standard 2.0; Src contains its application test scenarios, such as WebSocket, RPC, a Redis driver, MVC WebAPI, a lightweight message server, and ultra-large file transmission.
Stars: ✭ 318 (-4.5%)
Mutual labels:  high-performance
Joyrpc
high-performance, high-extensibility Java rpc framework.
Stars: ✭ 290 (-12.91%)
Mutual labels:  high-performance
Redis Lua Scaling Bloom Filter
LUA Redis scripts for a scaling bloom filter
Stars: ✭ 268 (-19.52%)
Mutual labels:  bloom-filter
Ocbarrage
OCBarrage: a high-performance barrage (danmaku) render engine for iOS. It stays smooth even when rendering 5000 barrages at once, and it is lightweight, extensible, supports highly customizable animations, and is simple and easy to use.
Stars: ✭ 294 (-11.71%)
Mutual labels:  high-performance
Tarscpp
C++ language framework rpc source code implementation
Stars: ✭ 261 (-21.62%)
Mutual labels:  high-performance
Geojs
High-performance visualization and interactive data exploration of scientific and geospatial location aware datasets
Stars: ✭ 323 (-3%)
Mutual labels:  high-performance
Reveno
⚡ High performance and low latency Event Sourcing/CQRS framework
Stars: ✭ 283 (-15.02%)
Mutual labels:  high-performance
Awesome Graal
A curated list of awesome resources for Graal, GraalVM, Truffle and related topics
Stars: ✭ 302 (-9.31%)
Mutual labels:  high-performance
Object threadsafe
We make any object thread-safe and std::shared_mutex 10 times faster to achieve the speed of lock-free algorithms on >85% reads
Stars: ✭ 280 (-15.92%)
Mutual labels:  high-performance
Hash Table
Fast, reliable cuckoo hash table for Node.js.
Stars: ✭ 272 (-18.32%)
Mutual labels:  bloom-filter
Libbf
🎯 Bloom filters for C++11
Stars: ✭ 298 (-10.51%)
Mutual labels:  bloom-filter
Stormpot
A fast object pool for the JVM
Stars: ✭ 267 (-19.82%)
Mutual labels:  high-performance
Tarsjava
Java language framework rpc source code implementation
Stars: ✭ 321 (-3.6%)
Mutual labels:  high-performance
Emitter
High performance, distributed and low latency publish-subscribe platform.
Stars: ✭ 3,130 (+839.94%)
Mutual labels:  high-performance
Firefly
Firefly is an asynchronous web framework for rapid development of high-performance web application.
Stars: ✭ 277 (-16.82%)
Mutual labels:  high-performance
Jupiter
Jupiter is a high-performance 4-layer network load balance service based on DPDK.
Stars: ✭ 292 (-12.31%)
Mutual labels:  high-performance
Mikado
Mikado is the web's fastest template library for building user interfaces.
Stars: ✭ 323 (-3%)
Mutual labels:  high-performance

Bloom filter for Scala


Overview

According to Wikipedia, "A Bloom filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not. In other words, a query returns either 'possibly in set' or 'definitely not in set'. Elements can be added to the set, but not removed."

What a Bloom filter is, in a nutshell:

  • Optimized for memory. It comes into play when you cannot fit the whole set into memory.
  • Solves the membership problem. It answers one question: does an element belong to a set or not?
  • Probabilistic (lossy) data structure. It answers either that an element possibly belongs to the set (with a tunable false-positive probability) or that it definitely does not.
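The following minimal Scala sketch makes the three points above concrete. It is not this library's implementation.

- The class `SimpleBloomFilter` is hypothetical and keeps its bits in an ordinary on-heap `Array[Long]`.
- It derives its k bit positions from two MurmurHash3 hashes using double hashing (the Kirsch-Mitzenmacher technique). This is a common, swapped-in choice, not something this README documents.
- The method name `mightContain` only mirrors the library's API for readability.

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical, minimal Bloom filter for Strings (NOT this library's code).
// numBits: size of the bit array; numHashes: bit positions set per element.
final class SimpleBloomFilter(numBits: Int, numHashes: Int) {
  require(numBits > 0 && numHashes > 0, "numBits and numHashes must be positive")

  // The bit array, packed into 64-bit words.
  private val words = new Array[Long]((numBits + 63) / 64)

  // Two base hashes of the element, computed with different MurmurHash3 seeds.
  private def baseHashes(item: String): (Int, Int) =
    (MurmurHash3.stringHash(item), MurmurHash3.stringHash(item, 0x5bd1e995))

  // Double hashing: the i-th bit position is (h1 + i * h2) mod numBits.
  private def bitIndex(h1: Int, h2: Int, i: Int): Int =
    Math.floorMod(h1 + i * h2, numBits)

  // Adding an element sets all of its numHashes bits.
  def add(item: String): Unit = {
    val (h1, h2) = baseHashes(item)
    for (i <- 0 until numHashes) {
      val bit = bitIndex(h1, h2, i)
      words(bit >>> 6) |= 1L << (bit & 63)
    }
  }

  // True if every bit is set ("possibly in set");
  // false if any bit is clear ("definitely not in set").
  def mightContain(item: String): Boolean = {
    val (h1, h2) = baseHashes(item)
    (0 until numHashes).forall { i =>
      val bit = bitIndex(h1, h2, i)
      (words(bit >>> 6) & (1L << (bit & 63))) != 0L
    }
  }
}
```

How the sketch behaves:

- **Add:** adding an element sets k bits.
- **Lookup:** a lookup returns true only if all k bits are set. An added element is therefore always found (no false negatives).
- **False positives:** an absent element is reported present only when other elements happen to have set all of its bits.
- **No remove:** there is no `remove`, because clearing a bit could erase another element's membership.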

Getting Started

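// build.sbt
libraryDependencies += "com.github.alexandrnikitin" %% "bloom-filter" % "latest.release"

// Scala code
import bloomfilter.mutable.BloomFilter

// Create a Bloom filter sized for the expected number of elements and the target false-positive rate
val expectedElements = 1000000
val falsePositiveRate = 0.1
val bf = BloomFilter[String](expectedElements, falsePositiveRate)

// Put an element
val element = "some element"
bf.add(element)

// Check whether an element is in the set: false means definitely not, true means possibly
val maybePresent: Boolean = bf.mightContain(element)

// Dispose the instance to release the memory it allocated
bf.dispose()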
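The constructor above takes two sizing parameters, `expectedElements` and `falsePositiveRate`. Standard Bloom filter analysis turns these into a bit-array size m and a hash-function count k.

The sketch below implements those textbook formulas. It assumes the library does something similar internally, which this README does not confirm. The object `BloomSizing` and its methods are hypothetical names, not the library's API.

```scala
// Textbook Bloom filter sizing (an assumption: this README does not say
// which formulas the library uses internally). All names are hypothetical.
object BloomSizing {
  private val Ln2 = math.log(2)

  // m = ceil(-n * ln(p) / (ln 2)^2): bits needed for n elements at false-positive rate p.
  def numBits(n: Long, p: Double): Long = {
    require(n > 0 && p > 0.0 && p < 1.0, "need n > 0 and 0 < p < 1")
    math.ceil(-n * math.log(p) / (Ln2 * Ln2)).toLong
  }

  // k = round((m / n) * ln 2): the hash count that minimizes the false-positive rate.
  def numHashes(n: Long, m: Long): Int =
    math.max(1, math.round(m.toDouble / n * Ln2).toInt)

  // p ~ (1 - e^(-k * n / m))^k: approximate false-positive rate after n insertions.
  def falsePositiveRate(n: Long, m: Long, k: Int): Double =
    math.pow(1.0 - math.exp(-k.toDouble * n / m), k)
}
```

For the Getting Started parameters (one million elements, a 10% false-positive rate), the formulas give the following:

- about 4.8 million bits, roughly 585 KiB;
- 3 hash functions.

Lowering the target rate grows m in proportion to ln(1/p).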

Motivation

You can read about this Bloom filter and the motivation behind it in my blog post.

Benchmarks

Here's a benchmark for the String type; results for other types are very similar:

[info] Benchmark                                              (length)   Mode  Cnt          Score         Error  Units
[info] alternatives.algebird.StringItemBenchmark.algebirdGet      1024  thrpt   20    1181080.172 ±    9867.840  ops/s
[info] alternatives.algebird.StringItemBenchmark.algebirdPut      1024  thrpt   20     157158.453 ±     844.623  ops/s
[info] alternatives.breeze.StringItemBenchmark.breezeGet          1024  thrpt   20    5113222.168 ±   47005.466  ops/s
[info] alternatives.breeze.StringItemBenchmark.breezePut          1024  thrpt   20    4482377.337 ±   19971.209  ops/s
[info] alternatives.guava.StringItemBenchmark.guavaGet            1024  thrpt   20    5712237.339 ±  115453.495  ops/s
[info] alternatives.guava.StringItemBenchmark.guavaPut            1024  thrpt   20    5621712.282 ±  307133.297  ops/s

[info] bloomfilter.mutable.StringItemBenchmark.myGet              1024  thrpt   20   11483828.730 ±  342980.166  ops/s
[info] bloomfilter.mutable.StringItemBenchmark.myPut              1024  thrpt   20   11634399.272 ±   45645.105  ops/s
[info] bloomfilter.mutable._128bit.StringItemBenchmark.myGet      1024  thrpt   20   11119086.965 ±   43696.519  ops/s
[info] bloomfilter.mutable._128bit.StringItemBenchmark.myPut      1024  thrpt   20   11303765.075 ±   52581.059  ops/s

Basically, in this benchmark this implementation is about 2x faster than Google's Guava and 10-80x faster than Twitter's Algebird. You can find other benchmarks in the "benchmarks" module on GitHub.
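The speedup figures can be checked directly against the String benchmark table. The snippet below divides the `bloomfilter.mutable` `myGet`/`myPut` scores by each alternative's get/put scores. The scores are copied from the table; the object name `Speedups` is hypothetical.

```scala
// Throughput scores (ops/s) copied from the String benchmark table.
object Speedups {
  val myGet = 11483828.730
  val myPut = 11634399.272

  // (get, put) throughput of each alternative.
  val alternatives: Map[String, (Double, Double)] = Map(
    ("algebird", (1181080.172, 157158.453)),
    ("breeze", (5113222.168, 4482377.337)),
    ("guava", (5712237.339, 5621712.282))
  )

  // How many times higher the bloomfilter.mutable throughput is, as (get, put).
  val ratios: Map[String, (Double, Double)] = alternatives.map {
    case (name, (get, put)) => (name, (myGet / get, myPut / put))
  }

  def main(args: Array[String]): Unit =
    ratios.toSeq.sortBy(_._1).foreach { case (name, (g, p)) =>
      println(f"$name%-8s get x$g%.1f  put x$p%.1f")
    }
}
```

The ratios for each alternative (get / put) are:

- Guava: about 2.0x / 2.1x
- Breeze: about 2.2x / 2.6x
- Algebird: about 9.7x / 74x

These match the rounded "2x" and "10-80x" figures.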

Warning: these are synthetic benchmarks run in an isolated environment. In a production system the difference in throughput and latency is usually bigger. There, allocation-heavy code stresses the GC, hits slow allocation paths, triggers collections and raises latencies.
