All Projects → oertl → hyperloglog-sketch-estimation-paper

oertl / hyperloglog-sketch-estimation-paper

Licence: other
Paper about the estimation of cardinalities from HyperLogLog sketches

Programming Languages

TeX
3793 projects
C++
36643 projects - #6 most used programming language
python
139335 projects - #7 most used programming language
CSS
56736 projects
HTML
75241 projects

Projects that are alternatives of or similar to hyperloglog-sketch-estimation-paper

set-sketch-paper
SetSketch: Filling the Gap between MinHash and HyperLogLog
Stars: ✭ 23 (-52.08%)
Mutual labels:  hyperloglog, sketch-data-structures, cardinality-estimation, hyperloglog-sketches
HyperMinHash-java
Union, intersection, and set cardinality in loglog space
Stars: ✭ 48 (+0%)
Mutual labels:  hyperloglog, cardinality-estimation, hyperloglog-sketches
Datasketch
MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble
Stars: ✭ 1,635 (+3306.25%)
Mutual labels:  hyperloglog, data-sketches
ntCard
Estimating k-mer coverage histogram of genomics data
Stars: ✭ 69 (+43.75%)
Mutual labels:  hyperloglog, cardinality-estimation
HyperLogLog
Fast HyperLogLog for Python.
Stars: ✭ 86 (+79.17%)
Mutual labels:  hyperloglog, cardinality-estimation
rust-hyperloglog
A HyperLogLog implementation in Rust.
Stars: ✭ 44 (-8.33%)
Mutual labels:  hyperloglog
deepdb-public
Implementation of DeepDB: Learn from Data, not from Queries!
Stars: ✭ 61 (+27.08%)
Mutual labels:  cardinality-estimation
naru
Neural Relation Understanding: neural cardinality estimators for tabular data
Stars: ✭ 76 (+58.33%)
Mutual labels:  cardinality-estimation
sketches
HyperLogLog and other probabilistic data structures for mining in data streams
Stars: ✭ 15 (-68.75%)
Mutual labels:  hyperloglog
isarn-sketches-spark
Routines and data structures for using isarn-sketches idiomatically in Apache Spark
Stars: ✭ 28 (-41.67%)
Mutual labels:  data-sketches
phphll
HyperLogLog for PHP implemented as a C extension
Stars: ✭ 19 (-60.42%)
Mutual labels:  hyperloglog

New cardinality estimation algorithms for HyperLogLog sketches

Paper about the estimation of cardinalities from HyperLogLog sketches.

Abstract

This paper presents new methods to estimate the cardinalities of data sets recorded by HyperLogLog sketches. A theoretically motivated extension to the original estimator is presented that eliminates the bias for small and large cardinalities. Based on the maximum likelihood principle a second unbiased method is derived together with a robust and efficient numerical algorithm to calculate the estimate. The maximum likelihood approach can also be applied to more than a single HyperLogLog sketch. In particular, it is shown that it gives more precise cardinality estimates for union, intersection, or relative complements of two sets that are both represented by HyperLogLog sketches compared to the conventional technique using the inclusion-exclusion principle. All the new methods are demonstrated and verified by extensive simulations.

See also

The paper SetSketch: Filling the Gap between MinHash and HyperLogLog and corresponding GitHub project present a new data structure, which can be seen as a generalization of HyperLogLog, and whose estimators can therefore also be applied to HyperLogLog sketches.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].