zstdlite
zstdlite provides access to the very fast (and highly configurable) zstd library for performing in-memory compression of R objects.
For rock solid general solutions to fast serialization of R objects, see the fst or qs packages.
The zstd code bundled with this package is v1.5.0.
What’s in the box
- zstd_serialize() and zstd_unserialize() for converting R objects to/from a compressed representation
Installation
You can install from GitHub with:
# install.packages('remotes')
remotes::install_github('coolbutuseless/zstdlite')
Basic usage of zstdlite
library(zstdlite)
arr <- array(1:27, c(3, 3, 3))
lobstr::obj_size(arr)
#> 352 B
buf <- zstd_serialize(arr)
length(buf) # Number of bytes
#> [1] 117
# compression ratio
length(buf)/as.numeric(lobstr::obj_size(arr))
#> [1] 0.3323864
zstd_unserialize(buf)
#> , , 1
#>
#> [,1] [,2] [,3]
#> [1,] 1 4 7
#> [2,] 2 5 8
#> [3,] 3 6 9
#>
#> , , 2
#>
#> [,1] [,2] [,3]
#> [1,] 10 13 16
#> [2,] 11 14 17
#> [3,] 12 15 18
#>
#> , , 3
#>
#> [,1] [,2] [,3]
#> [1,] 19 22 25
#> [2,] 20 23 26
#> [3,] 21 24 27
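The printed array matches the input by eye, but a programmatic check is often more convenient. The following is a minimal sketch of a round-trip test using only the two functions documented here. The `requireNamespace()` guard and the `roundtrip_ok` variable are illustrative additions, not part of the package.

```r
# Round-trip sketch: compress an object, restore it, compare with the original.
arr <- array(1:27, c(3, 3, 3))

if (requireNamespace("zstdlite", quietly = TRUE)) {
  buf      <- zstdlite::zstd_serialize(arr)    # compressed representation
  restored <- zstdlite::zstd_unserialize(buf)  # back to an R object
  # identical() compares values and attributes (here: the dim attribute)
  roundtrip_ok <- identical(arr, restored)
} else {
  roundtrip_ok <- NA  # zstdlite is not installed, so there is nothing to check
}

print(roundtrip_ok)
```

Using `identical()` rather than `==` means a dropped attribute would also count as a failure.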
Compressing 1 million integers
library(zstdlite)
library(lz4lite)
set.seed(1)
N <- 1e6
input_ints <- sample(1:5, N, prob = (1:5)^4, replace = TRUE)
uncompressed <- serialize(input_ints, NULL, xdr = FALSE)
compressed_lo <- zstd_serialize(input_ints, level = -5)
compressed_mid <- zstd_serialize(input_ints, level = 3)
compressed_mid2 <- zstd_serialize(input_ints, level = 10)
compressed_hi <- zstd_serialize(input_ints, level = 22)
compressed_base <- memCompress(serialize(input_ints, NULL, xdr = FALSE))
library(zstdlite)
res <- bench::mark(
serialize(input_ints, NULL, xdr = FALSE),
memCompress(serialize(input_ints, NULL, xdr = FALSE)),
zstd_serialize(input_ints, level = -5),
zstd_serialize(input_ints, level = 3),
zstd_serialize(input_ints, level = 10),
zstd_serialize(input_ints, level = 22),
check = FALSE
)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
expression | median | itr/sec | MB/s | compression_ratio |
---|---|---|---|---|
serialize(input_ints, NULL, xdr = FALSE) | 3.62ms | 177 | 1053.9 | 1.000 |
memCompress(serialize(input_ints, NULL, xdr = FALSE)) | 184.41ms | 5 | 20.7 | 0.079 |
zstd_serialize(input_ints, level = -5) | 65.03ms | 15 | 58.7 | 0.229 |
zstd_serialize(input_ints, level = 3) | 85.24ms | 12 | 44.8 | 0.101 |
zstd_serialize(input_ints, level = 10) | 702.61ms | 1 | 5.4 | 0.080 |
zstd_serialize(input_ints, level = 22) | 13.02s | 0 | 0.3 | 0.058 |
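The benchmark code above does not show how the compression_ratio column was produced. The sketch below assumes it is the compressed size divided by the size of the uncompressed serialized object, which agrees with the 1.000 reported for plain serialize(). The helper `ratio()` and the `ratios` vector are hypothetical names introduced for illustration.

```r
# Sketch: compression ratio as compressed bytes / uncompressed serialized bytes.
# Assumption: this is how the compression_ratio column was derived.
set.seed(1)
N <- 1e6
input_ints <- sample(1:5, N, prob = (1:5)^4, replace = TRUE)

uncompressed    <- serialize(input_ints, NULL, xdr = FALSE)
compressed_base <- memCompress(uncompressed, type = "gzip")

# Hypothetical helper: size of `compressed` relative to `reference`
ratio <- function(compressed, reference) length(compressed) / length(reference)

ratios <- c(
  serialize        = ratio(uncompressed, uncompressed),
  memCompress_gzip = ratio(compressed_base, uncompressed)
)

# Add a zstd entry only when the package is available
if (requireNamespace("zstdlite", quietly = TRUE)) {
  ratios["zstd_level_3"] <- ratio(
    zstdlite::zstd_serialize(input_ints, level = 3),
    uncompressed
  )
}

print(round(ratios, 3))
```

Smaller values mean better compression. Exact figures will depend on the zstd version and the random sample.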
Decompressing 1 million integers
res <- bench::mark(
unserialize(uncompressed),
zstd_unserialize(compressed_lo),
zstd_unserialize(compressed_mid2),
zstd_unserialize(compressed_hi),
unserialize(memDecompress(compressed_base, type = 'gzip')),
check = TRUE
)
expression | median | itr/sec | MB/s |
---|---|---|---|
unserialize(uncompressed) | 604.4µs | 1417 | 6311.8 |
zstd_unserialize(compressed_lo) | 42.3ms | 23 | 90.1 |
zstd_unserialize(compressed_mid2) | 29.2ms | 34 | 130.6 |
zstd_unserialize(compressed_hi) | 18.1ms | 58 | 210.6 |
unserialize(memDecompress(compressed_base, type = 'gzip')) | 14.5ms | 68 | 263.2 |
Zstd “Single File” Library
- To simplify the code within this package, it uses the ‘single file library’ version of zstd
- To update this package when zstd is updated, create the single file library version
- cd zstd/build/single_file_libs
- sh create_single_file_library.sh
- Wait...
- copy zstd/build/single_file_libs/zstd.c into zstdlite/src
- copy zstd/lib/zstd.h into zstdlite/src
Related Software
For a more general solution to fast serialization of R objects, see the fst or qs packages.
- lz4 and zstd - both by Yann Collet
- fst for serialisation of data.frames using lz4 and zstd
- qs for fast serialization of arbitrary R objects with lz4 and zstd