kei6u / gorilla

Licence: MIT license
An effective time-series data compression/decompression method based on Facebook's Gorilla.

Programming Languages

go

Projects that are alternatives of or similar to gorilla

Rust Brotli
Brotli compressor and decompressor written in rust that optionally avoids the stdlib
Stars: ✭ 504 (+888.24%)
Mutual labels:  compression, decompression
Compressonator
Tool suite for Texture and 3D Model Compression, Optimization and Analysis using CPUs, GPUs and APUs
Stars: ✭ 785 (+1439.22%)
Mutual labels:  compression, decompression
Turbopfor Integer Compression
Fastest Integer Compression
Stars: ✭ 520 (+919.61%)
Mutual labels:  compression, time-series
decompress
Pure OCaml implementation of Zlib.
Stars: ✭ 103 (+101.96%)
Mutual labels:  compression, decompression
Gorilla Tsc
Implementation of time series compression method from the Facebook's Gorilla paper
Stars: ✭ 147 (+188.24%)
Mutual labels:  compression, time-series
Compress
Optimized Go Compression Packages
Stars: ✭ 2,478 (+4758.82%)
Mutual labels:  compression, decompression
Fflate
High performance (de)compression in an 8kB package
Stars: ✭ 547 (+972.55%)
Mutual labels:  compression, decompression
BrotliSharpLib
Full C# port of Brotli compression algorithm
Stars: ✭ 77 (+50.98%)
Mutual labels:  compression, decompression
Sdefl
Small inflate/deflate implementation in ~300 LoC of ANSI C
Stars: ✭ 120 (+135.29%)
Mutual labels:  compression, decompression
Libmspack
A library for some loosely related Microsoft compression formats, CAB, CHM, HLP, LIT, KWAJ and SZDD.
Stars: ✭ 104 (+103.92%)
Mutual labels:  compression, decompression
huffman-coding
A C++ compression and decompression program based on Huffman Coding.
Stars: ✭ 31 (-39.22%)
Mutual labels:  compression, decompression
Minlzma
The Minimal LZMA (minlzma) project aims to provide a minimalistic, cross-platform, highly commented, standards-compliant C library (minlzlib) for decompressing LZMA2-encapsulated compressed data in LZMA format within an XZ container, as can be generated with Python 3.6, 7-zip, and xzutils
Stars: ✭ 236 (+362.75%)
Mutual labels:  compression, decompression
EasyCompressor
⚡ A compression library that implements many compression algorithms such as LZ4, Zstd, LZMA, Snappy, Brotli, GZip, and Deflate. It helps you to improve performance by reducing Memory Usage and Network Traffic for caching.
Stars: ✭ 167 (+227.45%)
Mutual labels:  compression, decompression
Xz
Pure golang package for reading and writing xz-compressed files
Stars: ✭ 330 (+547.06%)
Mutual labels:  compression, decompression
pyrus-cramjam
Thin Python wrapper to de/compression algorithms in Rust - lightweight & no dependencies
Stars: ✭ 40 (-21.57%)
Mutual labels:  compression, decompression
Lepton
Lepton is a tool and file format for losslessly compressing JPEGs by an average of 22%.
Stars: ✭ 4,918 (+9543.14%)
Mutual labels:  compression, decompression
rbzip2
bzip2 for Ruby
Stars: ✭ 39 (-23.53%)
Mutual labels:  compression, decompression
raisin
A simple lightweight set of implementations and bindings for compression algorithms written in Go.
Stars: ✭ 17 (-66.67%)
Mutual labels:  compression, decompression
Numcompress
Python package to compress numerical series & numpy arrays into strings
Stars: ✭ 68 (+33.33%)
Mutual labels:  compression, decompression
Util
A collection of useful utility functions
Stars: ✭ 201 (+294.12%)
Mutual labels:  compression, decompression

Gorilla

An effective time-series data compression method based on Facebook's Gorilla. It uses delta-of-delta timestamps and XOR'd floating-point values to reduce the data size. This works because most data points arrive at a fixed interval and the values in most time series do not change significantly from one point to the next.

Pelkonen, T., Franklin, S., Teller, J., Cavallaro, P., Huang, Q., Meza, J., & Veeraraghavan, K. (2015). Gorilla. Proceedings of the VLDB Endowment, 8(12), 1816–1827. https://doi.org/10.14778/2824032.2824078

Notes

Use second-resolution timestamps to keep a high compression ratio

Using milliseconds as the timestamp unit increases the number of bits, because the timestamps no longer arrive at even intervals. For second-resolution timestamps that arrive at a regular interval, the delta-of-delta is almost always 0, 1, or -1, so the encoded data stays small.
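
The effect can be sketched in a few lines of Go. This is an illustration rather than this library's code: `deltaOfDeltas` and `paperBits` are hypothetical helpers, the timestamps are made up, and the bit buckets are the ones described in the Gorilla paper, which this implementation may or may not follow exactly.

```go
package main

import "fmt"

// deltaOfDeltas returns, for every timestamp after the second one, the
// difference between the current delta and the previous delta.
func deltaOfDeltas(ts []int64) []int64 {
	if len(ts) < 3 {
		return nil
	}
	out := make([]int64, 0, len(ts)-2)
	prevDelta := ts[1] - ts[0]
	for i := 2; i < len(ts); i++ {
		delta := ts[i] - ts[i-1]
		out = append(out, delta-prevDelta)
		prevDelta = delta
	}
	return out
}

// paperBits returns the number of bits the Gorilla paper spends on one
// delta-of-delta: a control prefix plus a fixed-width value.
// Assumption: buckets are taken from the paper, not from this library.
func paperBits(dod int64) int {
	switch {
	case dod == 0:
		return 1 // single '0' bit
	case dod >= -63 && dod <= 64:
		return 2 + 7 // '10' + 7-bit value
	case dod >= -255 && dod <= 256:
		return 3 + 9 // '110' + 9-bit value
	case dod >= -2047 && dod <= 2048:
		return 4 + 12 // '1110' + 12-bit value
	default:
		return 4 + 32 // '1111' + 32-bit value
	}
}

func main() {
	// Made-up samples collected roughly every 60 seconds.
	seconds := []int64{1600000000, 1600000060, 1600000120, 1600000181, 1600000241}
	// The same samples at millisecond resolution, with sub-second jitter.
	millis := []int64{1600000000012, 1600000060458, 1600000120003, 1600000181031, 1600000241377}

	for _, series := range [][]int64{seconds, millis} {
		dods := deltaOfDeltas(series)
		total := 0
		for _, d := range dods {
			total += paperBits(d)
		}
		fmt.Println(dods, total, "bits")
	}
}
```

With these inputs the second-resolution series produces delta-of-deltas of 0, 1, and -1, while the millisecond series produces values in the hundreds or thousands, each of which falls into a wider and more expensive bucket.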

As the value fraction increases, the compression ratio decreases

This method takes the 64-bit floating-point representation of each value and XORs it with the value of the previous data point. If the two values are the same, the result is zero, so encoding it requires only a single zero bit. Otherwise, instead of recording the full 64-bit XOR result each time, gorilla drops the run of zero bits at the beginning (LeadingZeros) and the run of zero bits at the end (TrailingZeros) and records only the bits in between. In addition, if the XOR result has at least as many leading and trailing zeros as the previous one, the previous LeadingZeros and TrailingZeros values are reused and only the remaining bits are recorded. Otherwise, the new LeadingZeros and TrailingZeros values are recorded along with the remaining bits.

gorilla requires fewer bits when the value is 63.0, 63.5, or another floating-point number whose mantissa has many zero bits from the middle to the end. For a number like 0.1, however, many mantissa bits are non-zero, so LeadingZeros and TrailingZeros become small and it takes many bits to record the rest of the bit sequence.

Compression window should be longer than 2 hours

Using the same compression scheme over longer time periods achieves better compression ratios. However, queries that read data over a short time range may then spend additional computational resources on decompression. The paper shows the average compression ratio as the block size changes: blocks longer than two hours provide diminishing returns in compressed size. A two-hour block achieves a compression ratio of 1.37 bytes per data point.
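
In the paper, each block's starting timestamp is aligned to a two-hour window. Whether this library expects an aligned header is not stated here, so the following is only a sketch of how a caller might derive one. `blockStart` is a hypothetical helper, not part of the gorilla package.

```go
package main

import "fmt"

// twoHours is the block length used in the paper, in seconds.
const twoHours uint32 = 2 * 60 * 60

// blockStart rounds a Unix timestamp (in seconds) down to the start of
// the window that contains it. A zero window leaves the timestamp as is.
func blockStart(ts, windowSeconds uint32) uint32 {
	if windowSeconds == 0 {
		return ts
	}
	return ts - ts%windowSeconds
}

func main() {
	// A made-up timestamp, 125 seconds into its two-hour window.
	ts := uint32(1600000000/7200)*7200 + 125
	header := blockStart(ts, twoHours)
	fmt.Println(ts, header, ts-header)
}
```

A value computed this way could be passed as the header when a new block is opened, so that all data points within the same two hours share one block.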

Usage

Installing

go get github.com/keisku/gorilla

Compressor

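// buf receives the compressed bytes.
buf := new(bytes.Buffer)
// The header is a Unix timestamp in seconds.
header := uint32(time.Now().Unix())

// finish must be called once all data points have been compressed.
c, finish, err := gorilla.NewCompressor(buf, header)
if err != nil {
    return err
}

// Each call compresses one data point: a timestamp in seconds and a float64 value.
if err := c.Compress(uint32(time.Now().Unix()), 10.0); err != nil {
    return err
}
if err := c.Compress(uint32(time.Now().Unix()), 10.5); err != nil {
    return err
}

return finish()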

Decompressor

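buf := new(bytes.Buffer)

// Compressing time-series data to buf ...

// h is the header read back from the compressed stream.
d, h, err := gorilla.NewDecompressor(buf)
if err != nil {
    return err
}

fmt.Printf("header: %v\n", h)

// Walk the decompressed data points in order.
iter := d.Iterator()
for iter.Next() {
    // t is the timestamp and v is the value of the current data point.
    t, v := iter.At()
    fmt.Println(t, v)
}

// Report any error encountered during iteration.
return iter.Err()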