All Projects → lemire → FrameOfReference

lemire / FrameOfReference

Licence: Apache-2.0 license
C++ library to pack and unpack vectors of integers having a small range of values using a technique called Frame of Reference

Programming Languages

c
50402 projects - #5 most used programming language
C++
36643 projects - #6 most used programming language
python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to FrameOfReference

Croaring
Roaring bitmaps in C (and C++)
Stars: ✭ 735 (+1941.67%)
Mutual labels:  visual-studio, gcc, clang
Pfr
std::tuple like methods for user defined types without any macro or boilerplate code
Stars: ✭ 896 (+2388.89%)
Mutual labels:  visual-studio, gcc, clang
ci playground
Playground for Cloud CI development for C++
Stars: ✭ 23 (-36.11%)
Mutual labels:  visual-studio, gcc, clang
Moderncppci
This is an example of doing a Modern C++ project with CI
Stars: ✭ 109 (+202.78%)
Mutual labels:  visual-studio, gcc, clang
Vector
➿ A supercharged std::vector implementation (minus Allocator)
Stars: ✭ 118 (+227.78%)
Mutual labels:  visual-studio, gcc, clang
Boomerang
Boomerang Decompiler - Fighting the code-rot :)
Stars: ✭ 265 (+636.11%)
Mutual labels:  visual-studio, gcc, clang
Sol2
Sol3 (sol2 v3.0) - a C++ <-> Lua API wrapper with advanced features and top notch performance - is here, and it's great! Documentation:
Stars: ✭ 2,791 (+7652.78%)
Mutual labels:  visual-studio, gcc, clang
Ci helloworld
A simple example of how to setup a complete CI environment for C and C++
Stars: ✭ 357 (+891.67%)
Mutual labels:  visual-studio, gcc, clang
Oroch
A C++ library for integer array compression
Stars: ✭ 22 (-38.89%)
Mutual labels:  compression, integer-compression
clangbuilder
Building Clang ♡ Utility and Environment
Stars: ✭ 101 (+180.56%)
Mutual labels:  visual-studio, clang
Cmake Scripts
A selection of useful scripts for use in CMake projects, include code coverage, sanitizers, and dependency graph generation.
Stars: ✭ 202 (+461.11%)
Mutual labels:  gcc, clang
Efifs
EFI FileSystem drivers
Stars: ✭ 272 (+655.56%)
Mutual labels:  visual-studio, gcc
Uefi Ntfs
UEFI:NTFS - Boot NTFS partitions from UEFI
Stars: ✭ 386 (+972.22%)
Mutual labels:  visual-studio, gcc
VTEnc
VTEnc C library
Stars: ✭ 31 (-13.89%)
Mutual labels:  compression, integer-compression
FastIntegerCompression.js
Fast integer compression library in JavaScript
Stars: ✭ 46 (+27.78%)
Mutual labels:  compression, integer-compression
Llvm Vs2017 Integration
MSBuild 15.0 Toolset integration for multiple LLVM (From v5 to v8)
Stars: ✭ 84 (+133.33%)
Mutual labels:  visual-studio, clang
Polymcu
An open framework for micro-controller software
Stars: ✭ 173 (+380.56%)
Mutual labels:  gcc, clang
Fixed point
C++ Binary Fixed-Point Arithmetic
Stars: ✭ 199 (+452.78%)
Mutual labels:  gcc, clang
Stdex
std C++ 11 library impementation with extra features using only C++ 98 and POSIX threads
Stars: ✭ 43 (+19.44%)
Mutual labels:  visual-studio, gcc
Compilescore
Visual Studio extension and standalone app for build times and compilation data visualization.
Stars: ✭ 135 (+275%)
Mutual labels:  visual-studio, clang

Frame of Reference (FOR) C++ library

Build Status

What is this?

C++ library to pack and unpack vectors of integers having a small range of values using a technique called Frame of Reference (Goldstein et al. 1998). It should run fast even though it is written in simple C++.

Code from this library is part Apache Arrow and Apache Impala.

Code usage :

Given an array of 32-bit integers, you can compress it as follows:

#include "compression.h"

...

uint32_t * inputdata = ... // length values
uint32_t * compresseddata = ... // enough data
uint32_t *out = compress(inputdata, length, compresseddata);
// compressed data lies between compresseddata and out
uint32_t nvalue = 0;
uint32_t * recoverydata = ... // available buffer with at least length elements
uncompress(compresseddata, recoverydata, nvalue);
// nvalue will be equal to length

There is a similar API with turbocompress and turbouncompress with the difference that compresseddata uses an uint8_t pointer type.

#include "turbocompression.h"

...

uint32_t * inputdata = ... // length values
uint8_t * compresseddata = ... // enough data
uint8_t *out = turbocompress(inputdata, length, compresseddata);
// compressed data lies between compresseddata and out
uint32_t nvalue = 0;
uint32_t * recoverydata = ... // available buffer with at least length elements
turbouncompress(compresseddata, recoverydata, nvalue);
// nvalue will be equal to length

We can also compress 64-bit arrays:

#include "turbocompression.h"

...

uint64_t * inputdata = ... // length values
uint8_t * compresseddata = ... // enough data
uint8_t *out = turbocompress64(inputdata, length, compresseddata);
// compressed data lies between compresseddata and out
uint32_t nvalue = 0;
uint64_t * recoverydata = ... // available buffer with at least length elements
turbouncompress64(compresseddata, recoverydata, nvalue);
// nvalue will be equal to length

Usage (with Makefile)

To run a simple benchmark, do

 make
 ./test sampledata.txt

where sampledata.txt is a text data file with one integer per line.

For a parallelized version, type

 make testmp
 ./testmp sampledata.txt

This requires OpenMP support however.

Building (with CMake under macOS and Linux)

You need to have cmake installed and available as a command.

 make release
 cd release
 cmake ..
 make
 make test

Building (Visual Studio under Windows)

We are assuming that you have a common Windows PC with at least Visual Studio 2015, and an x64 processor.

To build with at least Visual Studio 2015 from the command line:

  • Grab the FrameOfReference code from GitHub, e.g., by cloning it using GitHub Desktop.
  • Install CMake. When you install it, make sure to ask that cmake be made available from the command line.
  • Create a subdirectory within FrameOfReference, such as VisualStudio.
  • Using a shell, go to this newly created directory. For example, within GitHub Desktop, you can right-click on  FrameOfReference in your GitHub repository list, and select Open in Git Shell, then type cd VisualStudio in the newly created shell.
  • Type cmake -DCMAKE_GENERATOR_PLATFORM=x64 .. in the shell while in the VisualStudio repository.
  • This last command created a Visual Studio solution file in the newly created directory (e.g., FrameOfReference.sln). Open this file in Visual Studio. You should now be able to build the project and run the tests. For example, in the Solution Explorer window (available from the View menu), right-click ALL_BUILD and select Build. To test the code, still in the Solution Explorer window, select RUN_TESTS and select Build.

To build with at least Visual Studio 2017 directly in the IDE:

  • Grab the FrameOfReference code from GitHub, e.g., by cloning it using GitHub Desktop.
  • Select the Visual C++ tools for CMake optional component when installing the C++ Development Workload within Visual Studio.
  • Within Visual Studio use File > Open > Folder... to open the FrameOfReference folder.
  • Right click on CMakeLists.txt in the parent directory within Solution Explorer and select Build to build the project.
  • For testing, in the Standard toolbar, drop the Select Startup Item... menu and choose one of the tests. Run the test by pressing the button to the left of the dropdown.

Requirements:

This was tested with GNU G++ and clang++ After suitable adjustments, it should build under most C++ compilers.

Other relevant libraries

References

  • Daniel Lemire, Nathan Kurz, Christoph Rupp, Stream VByte: Faster Byte-Oriented Integer Compression, Information Processing Letters (to appear) https://arxiv.org/abs/1709.08990
  • Goldstein J, Ramakrishnan R, Shaft U. Compressing relations and indexes. Proceedings of the Fourteenth International Conference on Data Engineering, ICDE ’98, IEEE Computer Society: Washington, DC, USA, 1998; 370–379.
  • Daniel Lemire and Leonid Boytsov, Decoding billions of integers per second through vectorization, Software Practice & Experience 45 (1), 2015. http://arxiv.org/abs/1209.2137 http://onlinelibrary.wiley.com/doi/10.1002/spe.2203/abstract
  • Daniel Lemire, Leonid Boytsov, Nathan Kurz, SIMD Compression and the Intersection of Sorted Integers, Software Practice & Experience 46 (6), 2016. http://arxiv.org/abs/1401.6399
  • Jeff Plaisance, Nathan Kurz, Daniel Lemire, Vectorized VByte Decoding, International Symposium on Web Algorithms 2015, 2015. http://arxiv.org/abs/1503.07387
  • Wayne Xin Zhao, Xudong Zhang, Daniel Lemire, Dongdong Shan, Jian-Yun Nie, Hongfei Yan, Ji-Rong Wen, A General SIMD-based Approach to Accelerating Compression Algorithms, ACM Transactions on Information Systems 33 (3), 2015. http://arxiv.org/abs/1502.01916
  • Jianguo Wang, Chunbin Lin, Yannis Papakonstantinou, Steven Swanson, An Experimental Study of Bitmap Compression vs. Inverted List Compression, SIGMOD 2017 http://db.ucsd.edu/wp-content/uploads/2017/03/sidm338-wangA.pdf
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].