
kalaidin / sketches

Licence: other
HyperLogLog and other probabilistic data structures for mining in data streams



sketches

aka probabilistic data structures for mining data streams, in pure Python.

Installation

python setup.py install

HyperLogLog

Original paper: http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf

More on: http://research.neustar.biz/tag/hyperloglog/

Usage:

from sketches import HyperLogLog

h = HyperLogLog(10)

for i in range(100000):
  h.add(i)

print(h.estimate())

> 99860.5333365
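To illustrate the idea behind the estimate above, here is a minimal sketch of the HyperLogLog algorithm as described in the original paper. It is not the code of this library: the class name `TinyHyperLogLog`, the use of a 64-bit slice of SHA-1 as the hash, and the reading of the constructor argument as the number of index bits `b` (so `2**b` registers) are all assumptions made for illustration. With a 64-bit hash the paper's large-range correction is left out.

```python
import hashlib
import math


class TinyHyperLogLog:
    """Illustrative HyperLogLog; hypothetical, not the `sketches` implementation."""

    def __init__(self, b):
        if not 4 <= b <= 16:
            raise ValueError("b must be between 4 and 16")
        self.b = b                      # number of bits used to pick a register
        self.m = 1 << b                 # number of registers
        self.registers = [0] * self.m
        # Bias-correction constant alpha_m from the original paper.
        if self.m == 16:
            self.alpha = 0.673
        elif self.m == 32:
            self.alpha = 0.697
        elif self.m == 64:
            self.alpha = 0.709
        else:
            self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        # Assumption: hash the string form of the item to 64 bits with SHA-1.
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        width = 64 - self.b
        idx = h >> width                        # first b bits select the register
        rest = h & ((1 << width) - 1)           # remaining bits
        rank = width - rest.bit_length() + 1    # leading zeros in `rest`, plus one
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self):
        # Normalised harmonic mean of 2**register over all registers.
        raw = self.alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        if raw <= 2.5 * self.m:
            # Small-range correction: fall back to linear counting.
            zeros = self.registers.count(0)
            if zeros:
                return self.m * math.log(self.m / zeros)
        return raw


h = TinyHyperLogLog(10)
for i in range(100000):
    h.add(i)
print(h.estimate())
```

With `b = 10` the sketch keeps 1024 small registers, and the paper gives a typical relative error of about `1.04 / sqrt(1024)`, roughly 3%. Adding the same item again never changes the registers, which is why the structure counts distinct items.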

Count-Min

Original paper: here

More on: https://sites.google.com/site/countminsketch/

Usage:

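import numpy as np
from sketches import CountMin

s = CountMin(10, 10)
# Zipf-distributed sample; the values (and the counts below) vary from run to run
data = np.random.zipf(2, 10000)
for v in data:
    s.add(v)

# estimated count of the value 1
print(s.estimate(1))
> 6130.0

# exact count of the value 1, for comparison
print(len([x for x in data if x == 1]))
> 6110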
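The gap between the estimated and exact counts above comes from hash collisions, which can only inflate a counter. The following minimal sketch of a Count-Min structure shows why. It is not this library's code: the class name `TinyCountMin`, the parameter names `width` and `depth`, and the per-row SHA-1 hashing are assumptions for illustration, and the README does not say which of the two `CountMin(10, 10)` arguments is which.

```python
import hashlib
import random
from collections import Counter


class TinyCountMin:
    """Illustrative Count-Min sketch; hypothetical, not the `sketches` implementation."""

    def __init__(self, width, depth):
        self.width = width      # counters per row
        self.depth = depth      # number of rows, one hash function each
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        # Assumption: derive one hash per row by salting SHA-1 with the row index.
        for row in range(self.depth):
            digest = hashlib.sha1(f"{row}:{item}".encode("utf-8")).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        # Increment one counter in every row.
        for row, col in self._cells(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Collisions only add to a counter, so the smallest one is the tightest bound.
        return min(self.table[row][col] for row, col in self._cells(item))


rng = random.Random(0)
data = [rng.choice([1, 1, 1, 1, 2, 2, 3, 4, 5, 6]) for _ in range(10000)]

s = TinyCountMin(width=10, depth=10)
for v in data:
    s.add(v)

exact = Counter(data)
print(s.estimate(1), exact[1])
```

The estimate is never below the true count; it is exact whenever at least one of the item's counters is free of collisions, and a wider table makes that more likely.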

TODO:

  • HLL improvements:
    • HLL++
    • Sliding window HLL
  • Count-Mean-Min
  • Stream-Summary
  • Min-Hash
  • Bloom filter
  • Frugal sketches