ademakov / Oroch

Licence: MIT license
A C++ library for integer array compression

Oroch

A C++ library for integer array compression.

The focus of the library is uniform handling of different integer types. The same template-based interface works with short and long, signed and unsigned types. Below is a sample of the library in use:

    std::array<int, 6> ints = { 1, 100, 10000, -1, -100, -10000 };
    std::array<size_t, 6> sizes { 1, 100, 10000, 1, 100, 10000 };

    // Get the memory space required to encode the arrays.
    size_t ints_space = oroch::varint_codec<int>::space(ints.begin(), ints.end());
    size_t sizes_space = oroch::varint_codec<size_t>::space(sizes.begin(), sizes.end());
    std::cout << ints_space << "\n" << sizes_space << "\n";

    // Allocate the required memory.
    std::unique_ptr<uint8_t[]> store(new uint8_t[ints_space + sizes_space]);

    // Encode the arrays.
    uint8_t *ptr = store.get();
    oroch::varint_codec<int>::encode(ptr, ints.begin(), ints.end());
    assert(ptr == (store.get() + ints_space));
    oroch::varint_codec<size_t>::encode(ptr, sizes.begin(), sizes.end());
    assert(ptr == (store.get() + ints_space + sizes_space));

This sample prints:

    12
    8

The template mechanism provided by the library automatically applies zigzag encoding to the signed int type and only then uses the varint codec. For the unsigned size_t type the zigzag step is skipped.
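The sizes 12 and 8 printed above can be checked by hand. The following is a minimal standalone sketch, not Oroch code: it assumes a conventional varint that stores 7 payload bits per byte and the usual zigzag mapping, and all function names in it (`zigzag`, `varint_size`, `to_unsigned`, `varint_space`, `check_zigzag`) are hypothetical helpers introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <type_traits>

// Hypothetical helpers for illustration only; these are not part of Oroch.

// Zigzag maps signed values to unsigned ones so that small magnitudes
// stay small: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
inline std::uint64_t zigzag(std::int64_t v)
{
    const std::uint64_t sign_mask = v < 0 ? ~std::uint64_t{0} : std::uint64_t{0};
    return (static_cast<std::uint64_t>(v) << 1) ^ sign_mask;
}

// Bytes taken by a varint that carries 7 payload bits per byte.
inline std::size_t varint_size(std::uint64_t v)
{
    std::size_t n = 1;
    while (v >= 0x80) {
        v >>= 7;
        ++n;
    }
    return n;
}

// Signed types go through zigzag first; unsigned types are used as is.
template <typename T>
std::uint64_t to_unsigned(T v, std::true_type /* signed */) { return zigzag(v); }
template <typename T>
std::uint64_t to_unsigned(T v, std::false_type /* unsigned */) { return v; }

// Total encoded size of a range of integers.
template <typename Iter>
std::size_t varint_space(Iter first, Iter last)
{
    using T = typename std::iterator_traits<Iter>::value_type;
    std::size_t total = 0;
    for (; first != last; ++first)
        total += varint_size(to_unsigned(*first, std::is_signed<T>{}));
    return total;
}

// A few spot checks of the zigzag mapping.
inline void check_zigzag()
{
    assert(zigzag(0) == 0u);
    assert(zigzag(-1) == 1u);
    assert(zigzag(1) == 2u);
    assert(zigzag(-100) == 199u);
}
```

Under these assumptions the signed values 1, 100, 10000, -1, -100, -10000 become 2, 200, 20000, 1, 199, 19999 after zigzag and take 1 + 2 + 3 + 1 + 2 + 3 = 12 bytes. The unsigned values 1, 100, 10000 take 1 + 1 + 2 bytes, so the six-element array takes 8 bytes.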

In addition to the varint codec the library also provides bit-packing codecs:

  • basic bit-packing codec (in "oroch/bitpck.h"),
  • bit-packing with a frame-of-reference technique (in "oroch/bitfor.h"),
  • bit-packing with a frame-of-reference and patching (in "oroch/bitpfr.h").
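To make the difference between the three techniques concrete, here is a rough standalone sketch of how the per-value bit width changes from one to the next. It does not use the Oroch headers and does not describe their actual layout; `bits_needed`, `packed_width`, `for_width` and `patch_count` are hypothetical names chosen for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helpers for illustration only; these are not part of Oroch.

// Number of bits needed to represent v (zero needs no bits).
inline unsigned bits_needed(std::uint64_t v)
{
    unsigned n = 0;
    while (v != 0) {
        v >>= 1;
        ++n;
    }
    return n;
}

// Basic bit-packing: every value is stored with the width of the largest one.
inline unsigned packed_width(const std::vector<std::uint64_t> &values)
{
    assert(!values.empty());
    return bits_needed(*std::max_element(values.begin(), values.end()));
}

// Frame of reference: keep the minimum once and pack only the offsets from it.
inline unsigned for_width(const std::vector<std::uint64_t> &values)
{
    assert(!values.empty());
    const auto mm = std::minmax_element(values.begin(), values.end());
    return bits_needed(*mm.second - *mm.first);
}

// Patching: pack offsets with a narrower width and count the values that
// do not fit; those exceptions would have to be stored separately.
inline std::size_t patch_count(const std::vector<std::uint64_t> &values, unsigned width)
{
    assert(!values.empty());
    const std::uint64_t base = *std::min_element(values.begin(), values.end());
    std::size_t exceptions = 0;
    for (std::uint64_t v : values)
        if (bits_needed(v - base) > width)
            ++exceptions;
    return exceptions;
}
```

For the values 1000, 1003, 1001, 1007, 1002, 5000 this gives 13 bits per value with basic packing and 12 bits with a frame of reference, because the single outlier dominates both. With patching, a 3-bit width covers everything except the one outlier, which becomes the only exception.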

The best choice among these codecs depends on the input data. The library provides a utility class that compares different codecs against a given input and selects the best one. The class is defined in the "oroch/integer_codec.h" header. This utility has a somewhat complicated interface, though. An example of how to use it properly is provided in the "oroch/integer_group.h" header.
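The general idea of comparing codecs against an input can be sketched as follows. This is only an illustration of the principle and not the interface or the selection logic of "oroch/integer_codec.h": the `Codec` enum, the `Choice` struct, `choose_codec` and the size estimates (including the 8-byte cost assumed for storing the frame-of-reference base) are all assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types and helpers for illustration only; not part of Oroch.
enum class Codec { Varint, BitPacked, FrameOfReference };

struct Choice {
    Codec codec;
    std::size_t bytes;  // estimated encoded size
};

inline unsigned bits_needed(std::uint64_t v)
{
    unsigned n = 0;
    while (v != 0) {
        v >>= 1;
        ++n;
    }
    return n;
}

inline std::size_t varint_size(std::uint64_t v)
{
    std::size_t n = 1;
    while (v >= 0x80) {
        v >>= 7;
        ++n;
    }
    return n;
}

// Estimate the encoded size under each candidate and keep the smallest.
// Ties are resolved in favor of the earlier candidate.
inline Choice choose_codec(const std::vector<std::uint64_t> &values)
{
    assert(!values.empty());
    const auto mm = std::minmax_element(values.begin(), values.end());
    const std::uint64_t lo = *mm.first;
    const std::uint64_t hi = *mm.second;
    const std::size_t n = values.size();

    std::size_t varint_bytes = 0;
    for (std::uint64_t v : values)
        varint_bytes += varint_size(v);

    // Packed payload rounded up to whole bytes.
    const std::size_t packed_bytes = (n * bits_needed(hi) + 7) / 8;
    // Assumption: the base value costs a fixed 8 bytes.
    const std::size_t for_bytes =
        (n * bits_needed(hi - lo) + 7) / 8 + sizeof(std::uint64_t);

    Choice best{Codec::Varint, varint_bytes};
    if (packed_bytes < best.bytes)
        best = Choice{Codec::BitPacked, packed_bytes};
    if (for_bytes < best.bytes)
        best = Choice{Codec::FrameOfReference, for_bytes};
    return best;
}
```

With these estimates, sixteen consecutive values starting at 1000 favor the frame of reference, a handful of tiny values favor plain bit-packing, and a mostly small input with one very large value favors varint. A real selector would also have to account for headers, alignment and patching.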

A more useful example is provided in the "oroch/integer_array.h" header. As its name suggests, it contains an implementation of an array of integers that are stored in compressed form.

The implementation supports just a few methods:

    #include <oroch/integer_array.h>
    ...
    oroch::integer_array<int> array;
    array.insert(0, 100);
    array.insert(0, 200);
    std::cout << array.at(0) << '\n';
    std::cout << array.find(200) << '\n';

Comparison

There are already many integer compression libraries available.

These libraries appear to be extremely good at what they do. Most of them focus on speed, and to that end they limit other features and flexibility. For instance, some of them handle only 32-bit integers, some implement a narrow set of compression algorithms, and some are too big.

The focus of the Oroch library is flexibility and the ability to switch to another compression method by changing just a single line of code. It is also relatively small compared to other libraries.

If your project does not need to decode billions of integers per second and can trade that speed for a smaller and more manageable code base, then the Oroch library might be for you.
