amit1rrr / Numcompress

Licence: mit
Python package to compress numerical series & numpy arrays into strings

numcompress

Simple way to compress and decompress numerical series & numpy arrays.

  • Easily gets you above 80% compression ratio
  • You can specify the precision you need for floating-point numbers (up to 10 decimal places)
  • Useful to store or transmit stock prices, monitoring data & other time series data in compressed string format

The compression algorithm is based on Google's encoded polyline format. I modified it to preserve arbitrary precision and to work on any numerical series. The work was motivated by the usefulness of the time-aware polyline built by Arjun Attam at HyperTrack. After building this I came across arrays, which are much more efficient than lists in terms of memory footprint. You might consider using them instead of numcompress if you don't need conversion to a string for transmission or storage.
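To make the idea concrete, here is a minimal sketch of polyline-style encoding applied to a plain numerical series. It is an illustration of the technique, not numcompress's own source: the names encode_series and decode_series are hypothetical, and scaling each value before taking the difference is an assumption of this sketch.

```python
def encode_series(series, precision=3):
    """Polyline-style encoding: scale, delta, fold the sign, emit 5-bit chunks."""
    scale = 10 ** precision
    out = [chr(precision + 63)]  # first character records the precision
    previous = 0
    for number in series:
        scaled = int(round(number * scale))
        delta = scaled - previous
        previous = scaled
        # Fold the sign into the lowest bit so small negative deltas stay small.
        value = ~(delta << 1) if delta < 0 else delta << 1
        # Emit 5 bits per character; 0x20 marks "more chunks follow".
        while value >= 0x20:
            out.append(chr((0x20 | (value & 0x1F)) + 63))
            value >>= 5
        out.append(chr(value + 63))
    return "".join(out)


def decode_series(text):
    """Inverse of encode_series (same assumptions)."""
    precision = ord(text[0]) - 63
    scale = 10 ** precision
    numbers, total, index = [], 0, 1
    while index < len(text):
        value, shift = 0, 0
        while True:
            chunk = ord(text[index]) - 63
            index += 1
            value |= (chunk & 0x1F) << shift
            shift += 5
            if chunk < 0x20:  # no continuation bit: last chunk of this delta
                break
        delta = ~(value >> 1) if value & 1 else value >> 1
        total += delta
        numbers.append(round(total / scale, precision))
    return numbers


encoded = encode_series([14578, 12759, 13525])
print(encoded)
print(decode_series(encoded))
```

Because only the differences between consecutive values are stored, a series that changes slowly needs fewer characters per number than one that jumps around.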

Installation

pip install numcompress

Usage

from numcompress import compress, decompress

# Integers
>>> compress([14578, 12759, 13525])
'B_twxZnv_nB_bwm@'

>>> decompress('B_twxZnv_nB_bwm@')
[14578.0, 12759.0, 13525.0]
# Floats - lossless compression
# precision argument specifies how many decimal points to preserve, defaults to 3
>>> compress([145.7834, 127.5989, 135.2569], precision=4)
'Csi~wAhdbJgqtC'

>>> decompress('Csi~wAhdbJgqtC')
[145.7834, 127.5989, 135.2569]
# Floats - lossy compression
>>> compress([145.7834, 127.5989, 135.2569], precision=2)
'Acn[rpB{n@'

>>> decompress('Acn[rpB{n@')
[145.78, 127.6, 135.26]
# compressing and decompressing numpy arrays
>>> from numcompress import compress_ndarray, decompress_ndarray
>>> import numpy as np

>>> series = np.random.randint(1, 100, 25).reshape(5, 5)
>>> compressed_series = compress_ndarray(series)
>>> decompressed_series = decompress_ndarray(compressed_series)

>>> series
array([[29, 95, 10, 48, 20],
       [60, 98, 73, 96, 71],
       [95, 59,  8,  6, 17],
       [ 5, 12, 69, 65, 52],
       [84,  6, 83, 20, 50]])

>>> compressed_series
'5*5,Bosw@_|_Cn_eD_fiA~tu@_cmA_fiAnyo@o|k@nyo@_{m@~heAnrbB~{BonT~lVotLoinB~xFnkX_o}@~iwCokuCn`zB_ry@'

>>> decompressed_series
array([[29., 95., 10., 48., 20.],
       [60., 98., 73., 96., 71.],
       [95., 59.,  8.,  6., 17.],
       [ 5., 12., 69., 65., 52.],
       [84.,  6., 83., 20., 50.]])

>>> (series == decompressed_series).all()
True
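The lossless and lossy float examples above differ only in the precision argument. As a rough model (an assumption that matches the decompressed outputs shown, where 127.5989 comes back as 127.6), keeping precision decimal places behaves like rounding each value. The helper max_roundoff below is hypothetical and only estimates how much a given precision would change a series before you compress it.

```python
def max_roundoff(series, precision):
    """Largest absolute change caused by keeping `precision` decimal places."""
    return max(abs(x - round(x, precision)) for x in series)


prices = [145.7834, 127.5989, 135.2569]

# Four decimal places already represent these values exactly: nothing is lost.
print(max_roundoff(prices, 4))

# Two decimal places drop digits; the error stays below half a unit
# in the last kept place (0.005 here).
print(max_roundoff(prices, 2))
print([round(x, 2) for x in prices])
```

If the first call reports zero for your data, the chosen precision should round-trip it unchanged; otherwise the second figure gives the worst-case deviation to expect.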

Compression Ratio

Test       # of Numbers   Compression ratio
Integers   10k            91.14%
Floats     10k            81.35%

You can run the test suite with the -s switch to see the compression ratio. You can also modify the tests to see what compression ratio you would get for your own input.

pytest -s

Here's a quick example showing compression ratio:

>>> import random, sys
>>> series = random.sample(range(1, 100000), 50000)  # generate 50k random numbers between 1 and 100k
>>> text = compress(series)  # apply compression

>>> original_size = sum(sys.getsizeof(i) for i in series)
>>> original_size
1200000

>>> compressed_size = sys.getsizeof(text)
>>> compressed_size
284092

>>> compression_ratio = ((original_size - compressed_size) * 100.0) / original_size
>>> compression_ratio
76.32566666666666

We get ~76% compression for 50k random numbers between 1 & 100k. This ratio increases for real world numerical series as the difference between consecutive numbers tends to be lower. Think of stock prices, monitoring & other time series data.
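The effect of smaller consecutive differences can be checked without installing anything. The sketch below counts how many characters a polyline-style delta encoder would emit, then compares scattered random integers with a slowly drifting series. The function encoded_length and the random-walk "price-like" series are assumptions made for illustration, not part of numcompress.

```python
import random


def encoded_length(series, precision=3):
    """Count the characters a polyline-style delta encoder would emit."""
    scale = 10 ** precision
    length = 1  # one character for the precision marker
    previous = 0
    for number in series:
        scaled = int(round(number * scale))
        delta = scaled - previous
        previous = scaled
        value = ~(delta << 1) if delta < 0 else delta << 1
        # Each output character carries 5 bits of the folded delta.
        length += max(1, (value.bit_length() + 4) // 5)
    return length


random.seed(0)  # fixed seed so the comparison is repeatable
scattered = random.sample(range(1, 100000), 5000)

# Hypothetical "price-like" series: a random walk with small steps.
walk = [50000]
for _ in range(4999):
    walk.append(walk[-1] + random.randint(-5, 5))

scattered_chars = encoded_length(scattered)
walk_chars = encoded_length(walk)

# Average characters per number for each series.
print(scattered_chars / len(scattered), walk_chars / len(walk))
```

The random walk needs at most three characters per number after the first value, while the scattered sample needs noticeably more, which is why time series tend to compress better than random data.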

Contribute

If you see any problem, open an issue or send a pull request. You can write to [email protected]
