velvia / Compressed Vec

License: Apache-2.0
SIMD Floating point and integer compressed vector library

Programming Languages

rust

Projects that are alternatives to or similar to Compressed Vec

Turbo-Transpose
Transpose: SIMD Integer+Floating Point Compression Filter
Stars: ✭ 50 (+100%)
Mutual labels:  compression, simd
Maskedvbyte
Fast decoder for VByte-compressed integers
Stars: ✭ 91 (+264%)
Mutual labels:  simd, compression
Streamvbyte
Fast integer compression in C using the StreamVByte codec
Stars: ✭ 195 (+680%)
Mutual labels:  simd, compression
ndzip
A High-Throughput Parallel Lossless Compressor for Scientific Data
Stars: ✭ 19 (-24%)
Mutual labels:  compression, simd
Simdcompressionandintersection
A C++ library to compress and intersect sorted lists of integers using SIMD instructions
Stars: ✭ 289 (+1056%)
Mutual labels:  simd, compression
Mango
mango fun framework
Stars: ✭ 343 (+1272%)
Mutual labels:  simd, compression
Simdcomp
A simple C library for compressing lists of integers using binary packing
Stars: ✭ 331 (+1224%)
Mutual labels:  simd, compression
Turbopfor Integer Compression
Fastest Integer Compression
Stars: ✭ 520 (+1980%)
Mutual labels:  simd, compression
Lz4
Extremely Fast Compression algorithm
Stars: ✭ 6,623 (+26392%)
Mutual labels:  compression
Gan Compression
[CVPR 2020] GAN Compression: Efficient Architectures for Interactive Conditional GANs
Stars: ✭ 800 (+3100%)
Mutual labels:  compression
Leanify
lightweight lossless file minifier/optimizer
Stars: ✭ 694 (+2676%)
Mutual labels:  compression
Acl
Animation Compression Library
Stars: ✭ 716 (+2764%)
Mutual labels:  compression
Xnnpack
High-efficiency floating-point neural network inference operators for mobile, server, and Web
Stars: ✭ 808 (+3132%)
Mutual labels:  simd
Wavideobox
In seconds! Three lines of code for iOS video compression, speed change, audio mixing, merging, GIF watermarking, rotation, audio replacement, and cropping! Supports different resolutions and any combination of operations you can think of! Highly extensible... more features continuously being added... iOS 8.0+. Feature requests and bug reports are welcome via issues; the author will update promptly.
Stars: ✭ 707 (+2728%)
Mutual labels:  compression
Edge
Extreme-scale Discontinuous Galerkin Environment (EDGE)
Stars: ✭ 18 (-28%)
Mutual labels:  simd
Bareos
Main repository with the code for the libraries and daemons
Stars: ✭ 651 (+2504%)
Mutual labels:  compression
Linqfaster
Linq-like extension functions for Arrays, Span<T>, and List<T> that are faster and allocate less.
Stars: ✭ 615 (+2360%)
Mutual labels:  simd
Simdsetoperations
testbed for different SIMD implementations for set intersection and set union
Stars: ✭ 24 (-4%)
Mutual labels:  simd
Cglm
📽 Highly Optimized Graphics Math (glm) for C
Stars: ✭ 887 (+3448%)
Mutual labels:  simd
Cgmath
A linear algebra and mathematics library for computer graphics.
Stars: ✭ 773 (+2992%)
Mutual labels:  simd

compressed_vec


Floating point and integer compressed vector library, SIMD-enabled for fast processing/iteration over compressed representations.

This is a compressed vec library, rather than a compression library. What does that mean? A compression library takes some uncompressed data and provides essentially compress() and decompress() functions. Typically you have to decompress data to be able to do anything with it, resulting in extra latency and allocations.

On the other hand, this compressed vec library allows you to iterate over and process the compressed representations directly. It is designed to balance fast iteration and SIMD processing/filtering against compression: vectors compress to within 2x of the best columnar compression technology, such as Apache Parquet, using techniques such as delta and XOR encoding (a sketch of XOR encoding follows the list below). Applications:

  • Database engines
  • Large in-memory data processing
  • Games and other apps needing fast access to large quantities of FP vectors/matrices
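
As a hint of why XOR encoding suits floating-point data, here is a minimal, self-contained sketch (the xor_encode helper is hypothetical, not the crate's internal code): each value's bit pattern is XORed with the previous one's, so slowly changing floats yield words with long zero runs that bit-packing can then compress.

    // Conceptual sketch only; `xor_encode` is a hypothetical helper,
    // not part of the compressed_vec API.
    fn xor_encode(values: &[f32]) -> Vec<u32> {
        let mut prev = 0u32;
        values
            .iter()
            .map(|&v| {
                let bits = v.to_bits();
                let xored = bits ^ prev;   // XOR against the previous bit pattern
                prev = bits;
                xored
            })
            .collect()
    }

    fn main() {
        // Neighboring values share most of their bits, so the XORed words
        // contain long runs of zeros.
        println!("{:x?}", xor_encode(&[1.0, 1.5, 2.0, 2.5]));
    }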

Performance Numbers

Numbers are from my laptop (2.9 GHz Core i9, 6 cores/12 threads, 12 MB L3, AVX2) and come from running cargo bench vector, which benchmarks a filter-and-count-matches operation directly on encoded/compressed vectors.

| Vector type(s)                     | Elements/sec              | Raw GB/s       |
|------------------------------------|---------------------------|----------------|
| u32 dense (no sparsity)            | 1.7 Gelems/s              | 6.8 GB/s       |
| u32 sparse (99% zeros)             | 13.9 Gelems/s             | 55.6 GB/s      |
| Two u32 vectors (sparse + dense)*  | 1.3-5.2 Gelems/s          | 5-20 GB/s      |
| u64 vector, dense                  | 955 Melems/s-1.1 Gelems/s | 7.6-9.1 GB/s   |
| f32, XOR encoded, 60% density      | 985 Melems/s              | 3.9 GB/s       |

* The filtering speed for two u32 vectors (using MultiVectorFilter) depends on the order of the vectors; it is faster to filter the sparse vector first.

Creation, Iteration

To create an f32 compressed vector:

    use compressed_vec::VectorF32XorAppender;
    let mut appender = VectorF32XorAppender::try_new(2048).unwrap();
    let encoded_bytes = appender.encode_all(vec![1.0, 1.5, 2.0, 2.5]).unwrap();

The simplest way to iterate on this compressed vector (note this does not allocate at all):

    use compressed_vec::VectorReader;
    let reader = VectorReader::<f32>::try_new(&encoded_bytes[..]).unwrap();
    let sum = reader.iterate().sum::<f32>();   // Yay, no allocations!
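
Because iterate() yields plain f32 values, ordinary iterator adapters can be stacked on top of it. A minimal sketch (reusing encoded_bytes from the appender example above) that counts elements above a threshold without materializing a decompressed Vec:

    use compressed_vec::VectorReader;
    // Reuses `encoded_bytes` from the appender example above.
    let reader = VectorReader::<f32>::try_new(&encoded_bytes[..]).unwrap();
    // Standard iterator adapters work on iterate(); this counts elements
    // greater than 1.5 directly over the compressed representation.
    let matches = reader.iterate().filter(|&x| x > 1.5).count();
    assert_eq!(matches, 2);   // 2.0 and 2.5 match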

Filtering and SIMD Processing

iterate() is the easiest API for stepping through the individual elements of the compressed vector, but it is not the fastest. Fast data processing, such as the filter-and-count benchmarks above, is done using Sinks, which are defined in the sink module. Sinks operate on one SIMD word at a time, and the sink API is designed for inlining.

For example, let's say that we want to add 2.5 to the f32 vector above and then write out the results to a Vec<f32>. Internally, XOR encoding and decoding are performed using sinks. Sinks can be stacked during decoding, forming an almost entirely SIMD pipeline:

  • XorSink (used automatically for f32 decoding)
  • AddConstSink (to add 2.5, again done using SIMD)
  • VecSink (writes output to a normal Vec)

    use compressed_vec::{VectorReader, AddConstSink, VecSink};
    let reader = VectorReader::<f32>::try_new(&encoded_bytes[..]).unwrap();
    let mut vecsink = VecSink::<f32>::new();
    let mut addsink = AddConstSink::new(2.5f32, &mut vecsink);
    reader.decode_to_sink(&mut addsink).unwrap();
    println!("And the transformed vector is: {:?}", vecsink.vec);

Vector Format

Details of the vector format can be found in the project's vector format documentation.

The vector format follows columnar compression techniques used throughout the big data and database world, and roughly follows the Google Procella paper with its custom Artus format:

  • Compression within 2x of ZSTD while operating directly on the data
    • Compression for this format is within 2x of Parquet, but is written to allow filtering and operating on the data directly without needing a separate decompression step for the entire vector
  • Multi-pass encoding
    • The VectorAppender collects min/max and other stats on the raw data and uses it to decide on the best encoding strategy (delta, etc.)
  • Exposing dictionary indices to the query engine and aggressive pushdown to the data format
    • The format is designed to filter over dictionary codes, which speeds up filtering
    • The use of sections allows for many optimizations for filtering. For example, null sections and constant sections allow for very fast filter short-circuiting (see the conceptual sketch below).
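
As a purely conceptual illustration of section-based short-circuiting (the Section enum and count_matches function below are hypothetical, not the crate's actual section format or API), a filter can answer null and constant sections without decoding any values and only scans the general packed case:

    // Hypothetical sketch of section-based filter short-circuiting;
    // these types are illustrative, not compressed_vec's real layout.
    enum Section {
        Null { len: usize },                  // every element is null/zero
        Constant { value: u32, len: usize },  // every element equals `value`
        Packed { values: Vec<u32> },          // general case: must be scanned
    }

    /// Count elements equal to `target`, decoding only when necessary.
    fn count_matches(sections: &[Section], target: u32) -> usize {
        sections
            .iter()
            .map(|s| match s {
                Section::Null { len } => if target == 0 { *len } else { 0 },
                Section::Constant { value, len } => if *value == target { *len } else { 0 },
                Section::Packed { values } => values.iter().filter(|&&v| v == target).count(),
            })
            .sum()
    }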

Collaboration

Please reach out to me to collaborate!
