All Projects → vteromero → VTEnc

vteromero / VTEnc

Licence: MIT License
VTEnc C library

Programming Languages

c
50402 projects - #5 most used programming language
Makefile
30231 projects
shell
77523 projects

Projects that are alternatives of or similar to VTEnc

FrameOfReference
C++ library to pack and unpack vectors of integers having a small range of values using a technique called Frame of Reference
Stars: ✭ 36 (+16.13%)
Mutual labels:  compression, integer-compression
FastIntegerCompression.js
Fast integer compression library in JavaScript
Stars: ✭ 46 (+48.39%)
Mutual labels:  compression, integer-compression
Oroch
A C++ library for integer array compression
Stars: ✭ 22 (-29.03%)
Mutual labels:  compression, integer-compression
huffman
Using huffman coding to compress-decompress real-world files
Stars: ✭ 27 (-12.9%)
Mutual labels:  compression
BrotliSharpLib
Full C# port of Brotli compression algorithm
Stars: ✭ 77 (+148.39%)
Mutual labels:  compression
zopflipy
A Python bindings for Zopfli
Stars: ✭ 17 (-45.16%)
Mutual labels:  compression
bzip2-rs
Pure Rust bzip2 decoder
Stars: ✭ 28 (-9.68%)
Mutual labels:  compression
naf
Nucleotide Archival Format - Compressed file format for DNA/RNA/protein sequences
Stars: ✭ 35 (+12.9%)
Mutual labels:  compression
rangecoder
a fast range coder in C++, using SSE
Stars: ✭ 50 (+61.29%)
Mutual labels:  compression
TinyJPG
images jpg or jpeg compressed and watcher fsnotify
Stars: ✭ 73 (+135.48%)
Mutual labels:  compression
DAR
DAR - Disk ARchive
Stars: ✭ 58 (+87.1%)
Mutual labels:  compression
SevenZipSharp
Fork of SevenZipSharp on CodePlex
Stars: ✭ 171 (+451.61%)
Mutual labels:  compression
pakkero
Pakkero is a binary packer written in Go made for fun and educational purpose. Its main goal is to take in input a program file (elf binary, script, even appimage) and compress it, protect it from tampering and intrusion.
Stars: ✭ 143 (+361.29%)
Mutual labels:  compression
lepcc
Point Cloud Compression used in i3s Scene Layer Format
Stars: ✭ 22 (-29.03%)
Mutual labels:  compression
image-optimizer
Smart image optimization
Stars: ✭ 15 (-51.61%)
Mutual labels:  compression
exhal
Compression and decompression tools for NES, SNES, and Game Boy games by HAL Laboratory
Stars: ✭ 54 (+74.19%)
Mutual labels:  compression
lz4-napi
Fastest lz4 compression library in Node.js, powered by napi-rs and lz4-flex.
Stars: ✭ 29 (-6.45%)
Mutual labels:  compression
node-janus
https://wiki.mozilla.org/Mobile/Janus
Stars: ✭ 58 (+87.1%)
Mutual labels:  compression
rc-zip
Pure rust zip & zip64 reading and writing
Stars: ✭ 93 (+200%)
Mutual labels:  compression
pyrus-cramjam
Thin Python wrapper to de/compression algorithms in Rust - lightweight & no dependencies
Stars: ✭ 40 (+29.03%)
Mutual labels:  compression

VTEnc

VTEnc is a compression algorithm for sorted lists of unsigned integers. This C library provides an interface to use VTEnc with sequences of various data types (8, 16, 32 and 64 bits).

WARNING: This project is in its initial stages of development and might dramatically change on each version. Have a look at the Development philosophy section to understand the rationale behind it.

Results

This is a work-in-progress section. Last updated on: 2022-03-01

For reference, VTEnc has been tested on a laptop Ubuntu Desktop 21.10 with a Core i7-6700HQ CPU @ 2.60GHz x 8, using different data sets and comparing it with other integer compression algorithms. See Tests and Benchmarks below for further information.

  • gov2.sorted:
Encoding speed vs ratio Decoding speed vs ratio
Encoding speed Decoding speed

(*) VTEnc's results on "speed vs ratio" charts are for the following values of the encoding parameter min_cluster_length: 1, 2, 4, 8, 16, 32, 64, 128 and 256.
(**) VTEnc's results on "speed" charts are for min_cluster_length = 256.

  • Timestamps (ts.txt):
Encoding speed vs ratio Decoding speed vs ratio
Encoding speed Decoding speed

API

The documented interface is found in the header file vtenc.h.

Example

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include "vtenc.h"

int encode(
  vtenc *handler,
  const uint8_t *values,
  size_t values_len,
  uint8_t **out,
  size_t *out_len)
{
  const size_t out_cap = vtenc_max_encoded_size8(values_len); /* output capacity */
  int rc;

  *out = (uint8_t *) malloc(out_cap * sizeof(uint8_t));
  if (*out == NULL) {
    fprintf(stderr, "allocation error\n");
    return 0;
  }

  rc = vtenc_encode8(handler, values, values_len, *out, out_cap);
  if (rc != VTENC_OK) {
    fprintf(stderr, "encode failed with code: %d\n", rc);
    return 0;
  }

  *out_len = vtenc_encoded_size(handler);

  return 1;
}

int decode(
  vtenc *handler,
  const uint8_t *in,
  size_t in_len,
  uint8_t **out,
  size_t out_len)
{
  int rc;

  *out = (uint8_t *) malloc(out_len * sizeof(uint8_t));
  if (*out == NULL) {
    fprintf(stderr, "allocation error\n");
    return 0;
  }

  rc = vtenc_decode8(handler, in, in_len, *out, out_len);
  if (rc != VTENC_OK) {
    fprintf(stderr, "decode failed with code: %d\n", rc);
    return 0;
  }

  return 1;
}

int main()
{
  const uint8_t values[] = {13, 14, 29, 39, 65, 80, 88, 106, 152, 154, 155, 177};
  const size_t values_len = sizeof(values) / sizeof(values[0]);
  uint8_t *enc_out=NULL, *dec_out=NULL;
  size_t enc_out_len;
  int rc = 0;
  vtenc *handler = vtenc_create();

  if (handler == NULL) {
    fprintf(stderr, "It failed to create the handler\n");
    return 1;
  }

  if (!encode(handler, values, values_len, &enc_out, &enc_out_len)) {
    rc = 1;
    goto free_and_return;
  }

  if (!decode(handler, enc_out, enc_out_len, &dec_out, values_len)) {
    rc = 1;
    goto free_and_return;
  }

  printf("Original list: %lu bytes\n", values_len);
  printf("Encoded stream: %lu bytes\n", enc_out_len);

  if (memcmp(values, dec_out, values_len) == 0) {
    printf("Decoded output matches original list\n");
  } else {
    printf("Decoded output different from original list\n");
    rc = 1;
  }

free_and_return:
  if (enc_out != NULL)
    free(enc_out);

  if (dec_out != NULL)
    free(dec_out);

  vtenc_destroy(handler);

  return rc;
}

Limitations

For now, this library has only been tested on Linux and MacOS systems. It shouldn't be too difficult to support Windows as well and I'm planing to do so in future releases.

Building

To compile the library, run make in root directory. That will generate the static and the shared libraries (.a and .so files), ready to be included in your own projects.

Tests

This library is well covered with unit tests and with a few little programs to test different data sets. To build both the programs and the unit tests, run make test in the root directory. The following executable files will be created:

  • tests/unit/unit_tests: invoke it to run all the unit tests.

  • tests/testrand.sh: script to test all the random sequences located on tests/data directory. For each file, it calls tests/testbinseq program, which tests the serialised list or set "end-to-end", by encoding and decoding it and then comparing the result with the original sequence.

  • tests/timestamps: program to test ts.txt, a text file with a big list of timestamps.

  • tests/gov2: program to test gov2.sorted file, which is part of the "Document identifier data set" created by D. Lemire.

Benchmarks

The tables shown in the Results section are the result of running the tests included in the integer-compression-benchmarks repository.

Development philosophy

This section describes the rules and guidelines that will be followed in the different phases of development. Specifically, it explains the versioning philosophy in the two well-differentiated stages: before and after version 1.0.0 (first stable version).

Versions 0.X.Y

The main focus at this stage is to find the right encoded format that maximises compression ratio, encoding speed and decoding speed. Ideally, the format should also be open to support new features in future versions.

Given a version 0.X.Y, changes on:

  • X means either breaking changes or new functionality.
  • Y means bug fixes or performance improvements.

Backward compatibility is not a priority here.

From version 1.0.0

Versions will follow SemVer scheme, i.e. given a version X.Y.Z, incrementing:

  • X means that the library introduces breaking changes.
  • Y means that the library adds new functionality with backward compatibility.
  • Z means that the library adds bug fixes or performance improvements.

Stability, security and backward compatibility will be a priority starting at version v1.0.0.

License

This code is licensed under MIT license.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].