lz4lite

lz4lite provides access to the extremely fast lz4 compression library for in-memory compression.

As of v0.2.0, lz4lite can now serialize and compress any R object understood by base::serialize().

If the input is known to be an atomic, numeric vector, and you do not care about any attributes or names on this vector, then lz4_compress()/lz4_uncompress() can be used. These are bespoke serialization routines for atomic numeric vectors that run faster since they avoid R’s internals.

For a more general solution to fast serialization of R objects, see the fst or qs packages.

The lz4 code bundled with this package is currently v1.9.3.

What’s in the box

For arbitrary R objects
- lz4_serialize/lz4_unserialize serialize and compress any R object.
For atomic vectors with numeric values
- lz4_compress()/lz4_uncompress()
  - compress the data within a vector of raw, integer, real, complex or logical values
  - faster than lz4_serialize()/lz4_unserialize(), but discards all attributes (names, dims, etc.)
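
The difference between the two families can be seen on a small named vector. This is a minimal sketch that assumes lz4lite is installed and uses only the functions listed above; the variable names are illustrative.

```r
library(lz4lite)

# A numeric vector carrying a 'names' attribute
x <- c(a = 1.5, b = 2.5, c = 3.5)

# General route: the whole R object is serialized, so attributes survive
restored_full <- lz4_unserialize(lz4_serialize(x))

# Fast route: only the numeric payload is compressed, so attributes are dropped
restored_fast <- lz4_uncompress(lz4_compress(x))

names(restored_full)  # names are kept
names(restored_fast)  # names are gone
```

If the attributes matter, they have to be saved separately and reattached after lz4_uncompress(), or the lz4_serialize() route should be used instead.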

Installation

You can install from GitHub with:

# install.packages('remotes')
remotes::install_github('coolbutuseless/lz4lite')

Basic usage of lz4lite

library(lz4lite)

dat <- mtcars
buf <- lz4_serialize(dat)
length(buf) # Number of bytes

#> [1] 1862

# compression ratio
length(buf)/length(serialize(dat, NULL))

#> [1] 0.489099

head(lz4_unserialize(buf))

#>                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#> Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Compressing 5 million integers

library(lz4lite)

max_hc <- 12

set.seed(1)
N                <- 5e6
input_ints       <- sample(1:3, N, prob = (1:3)^3, replace = TRUE)
serialize_base   <- serialize(input_ints, NULL, xdr = FALSE)
serialize_lo     <- lz4_serialize(input_ints, acceleration = 1)
serialize_hi_3   <- lz4hc_serialize(input_ints, level =  3)
serialize_hi_9   <- lz4hc_serialize(input_ints, level =  9)
serialize_hi_12  <- lz4hc_serialize(input_ints, level = max_hc)
compress_lo      <- lz4_compress(input_ints, acceleration = 1)
compress_hi_3    <- lz4hc_compress(input_ints, level = 3)
compress_hi_9    <- lz4hc_compress(input_ints, level = 9)
compress_hi_12   <- lz4hc_compress(input_ints, level = max_hc)

Benchmark code:

library(lz4lite)

res <- bench::mark(
  serialize(input_ints, NULL, xdr = FALSE),
  lz4_serialize(input_ints, acceleration = 1),
  lz4hc_serialize(input_ints, level =  3),
  lz4hc_serialize(input_ints, level =  9),
  lz4hc_serialize(input_ints, level = max_hc),
  lz4_compress (input_ints, acceleration = 1),
  lz4hc_compress (input_ints, level =  3),
  lz4hc_compress (input_ints, level =  9),
  lz4hc_compress (input_ints, level = max_hc),
  check = FALSE
)

expression	median	itr/sec	MB/s	compression_ratio
serialize(input_ints, NULL, xdr = FALSE)	18.99ms	50	1004.5	1.000
lz4_serialize(input_ints, acceleration = 1)	30.58ms	32	623.7	0.222
lz4hc_serialize(input_ints, level = 3)	215.84ms	5	88.4	0.155
lz4hc_serialize(input_ints, level = 9)	3.28s	0	5.8	0.088
lz4hc_serialize(input_ints, level = max_hc)	36.09s	0	0.5	0.063
lz4_compress(input_ints, acceleration = 1)	24.16ms	41	789.4	0.222
lz4hc_compress(input_ints, level = 3)	208.71ms	5	91.4	0.155
lz4hc_compress(input_ints, level = 9)	3.28s	0	5.8	0.088
lz4hc_compress(input_ints, level = max_hc)	36.36s	0	0.5	0.063
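
The MB/s column appears to be the uncompressed input size divided by the median time. The sketch below checks that reading against two rows of the table; the formula is an assumption inferred from the numbers, not something stated in the benchmark code.

```r
# Assumption: MB/s = uncompressed payload size (in MiB) / median time (in seconds)
n_ints      <- 5e6            # N from the setup code above
bytes_total <- n_ints * 4     # R integers occupy 4 bytes each
size_mib    <- bytes_total / 1024^2

# Median timings copied from the table, converted to seconds
median_sec <- c(serialize = 18.99e-3, lz4_serialize = 30.58e-3)

throughput <- size_mib / median_sec
round(throughput, 1)
```

The results land within rounding distance of the 1004.5 and 623.7 figures reported in the table, since the medians shown there are themselves rounded.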

Uncompressing 5 million integers

Uncompression speed varies slightly depending upon the compressed size.

Benchmark code:

res <- bench::mark(
  lz4_uncompress(compress_lo),
  lz4_uncompress(compress_hi_3),
  lz4_uncompress(compress_hi_9),
  lz4_uncompress(compress_hi_12)
)

expression	median	itr/sec	MB/s
lz4_uncompress(compress_lo)	12.26ms	79	1555.4
lz4_uncompress(compress_hi_3)	12.37ms	70	1542.4
lz4_uncompress(compress_hi_9)	12.97ms	94	1470.4
lz4_uncompress(compress_hi_12)	6.03ms	121	3161.8

Unserializing 5 million integers

Unserialization speed varies slightly depending upon the compressed size.

Benchmark code:

res <- bench::mark(
  unserialize(serialize_base),
  lz4_unserialize(serialize_lo),
  lz4_unserialize(serialize_hi_3),
  lz4_unserialize(serialize_hi_9),
  lz4_unserialize(serialize_hi_12)
)

expression	median	itr/sec	MB/s
unserialize(serialize_base)	6.64ms	120	2871.9
lz4_unserialize(serialize_lo)	29.8ms	38	640.0
lz4_unserialize(serialize_hi_3)	29.38ms	39	649.3
lz4_unserialize(serialize_hi_9)	24.97ms	48	763.8
lz4_unserialize(serialize_hi_12)	23.87ms	49	799.0

Technical bits

Framing of the compressed data

lz4lite does not use the standard LZ4 frame to store data.
The compressed representation is the compressed data prefixed with a custom 8-byte header consisting of
- 3 bytes = ‘LZ4’
- 1 byte = 0x00 if the data was produced with lz4_serialize(); otherwise a byte representing the SEXP type of the encoded object.
- 4 bytes = length value, i.e. the number of bytes in the original uncompressed data.
This data representation
- is not compatible with the standard LZ4 frame format.
- is likely to evolve, so do not plan on compressing data with one version of lz4lite and uncompressing it with another.
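
As an illustration of the layout above, the following base R sketch decodes an 8-byte header. parse_header() is a hypothetical helper, not part of lz4lite, and the little-endian byte order for the length field is an assumption, since the byte order is not stated here. The header is built by hand so the example runs without the package.

```r
# Hypothetical helper: split the 8-byte header into its three fields.
# Assumption: the 4-byte length is stored little-endian.
parse_header <- function(buf) {
  stopifnot(is.raw(buf), length(buf) >= 8)
  list(
    magic     = rawToChar(buf[1:3]),   # expected to be "LZ4"
    type_byte = as.integer(buf[4]),    # 0 for lz4_serialize(), else a SEXP type code
    n_bytes   = readBin(buf[5:8], what = "integer", size = 4, endian = "little")
  )
}

# Synthetic header: magic 'LZ4', type byte 0x00, uncompressed length 1024 bytes
hdr <- c(
  charToRaw("LZ4"),
  as.raw(0x00),
  writeBin(1024L, raw(), size = 4, endian = "little")
)

info <- parse_header(hdr)
str(info)
```

Because the format is expected to change between versions, a helper like this is only suitable for inspection and debugging, not for long-term storage tooling.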

Related Software

lz4 and zstd - both by Yann Collet
fst for serialisation of data.frames using lz4 and zstd
qs for fast serialization of arbitrary R objects with lz4 and zstd

Acknowledgements

Yann Collet for releasing, maintaining and advancing lz4 and zstd
R Core for developing and maintaining such a wonderful language.
CRAN maintainers, for patiently shepherding packages onto CRAN and maintaining the repository
