dgryski / Go Fastquantiles
Programming Languages
Approximate streaming quantiles
NOTE: THIS CODE IS CURRENTLY BROKEN
For a fast streaming quantile implementation, please see http://github.com/dgryski/go-gk
My goal is to implement:
"A Fast Algorithm for Approximate Quantiles in High Speed Data Streams" (Qi Zhang and Wei Wang, 2007).
http://www.cs.unc.edu/~zhangq/PUBLICATIONS/SSDBM07-Qi%20Zhang-FastQStream.pdf
Course notes: http://www.mathcs.emory.edu/~cheung/Courses/584-StreamDB/Syllabus/08-Quantile/Zhang.html http://www.mathcs.emory.edu/~cheung/Courses/584-StreamDB/Syllabus/08-Quantile/Zhang2.html
Zhang/Wang is based on the algorithm in "Space-Efficient Online Computation of Quantile Summaries" (Greenwald, Khanna, 2001) http://infolab.stanford.edu/~datar/courses/cs361a/papers/quantiles.pdf
So I decided to start with GK. There's a good overview at http://papercruncher.com/2010/03/02/stream-algorithms-order-statistics/
and more detailed notes at http://www.mathcs.emory.edu/~cheung/Courses/584-StreamDB/Syllabus/08-Quantile/Greenwald.html http://www.mathcs.emory.edu/~cheung/Courses/584-StreamDB/Syllabus/08-Quantile/Greenwald2.html
The MERGE algorithm is taken from "Continuously Maintaining Quantile Summaries of the Most Recent N Elements over a Data Stream" (Lin, H. Lu, J. Xu, and J.X. Yu, 2004) http://www.cs.ubc.ca/~xujian/paper/quant.pdf