lemire / Javafastpfor

Licence: apache-2.0
A simple integer compression library in Java

Projects that are alternatives of or similar to Javafastpfor

Boxing
Android multi-media selector based on MVP mode.
Stars: ✭ 3,216 (+654.93%)
Mutual labels:  compression
Mango
mango fun framework
Stars: ✭ 343 (-19.48%)
Mutual labels:  compression
Zfp
Compressed numerical arrays that support high-speed random access
Stars: ✭ 384 (-9.86%)
Mutual labels:  compression
Zoonavigator
Web-based ZooKeeper UI / editor / browser
Stars: ✭ 326 (-23.47%)
Mutual labels:  compression
Caffe
Caffe for Sparse and Low-rank Deep Neural Networks
Stars: ✭ 339 (-20.42%)
Mutual labels:  compression
Zipfly
Writing large ZIP archives without memory inflation
Stars: ✭ 363 (-14.79%)
Mutual labels:  compression
Simdcompressionandintersection
A C++ library to compress and intersect sorted lists of integers using SIMD instructions
Stars: ✭ 289 (-32.16%)
Mutual labels:  compression
Zstd Jni
JNI binding for Zstd
Stars: ✭ 424 (-0.47%)
Mutual labels:  compression
Divans
Building better compression together
Stars: ✭ 337 (-20.89%)
Mutual labels:  compression
Zson
ZSON is a PostgreSQL extension for transparent JSONB compression
Stars: ✭ 385 (-9.62%)
Mutual labels:  compression
Xz
Pure golang package for reading and writing xz-compressed files
Stars: ✭ 330 (-22.54%)
Mutual labels:  compression
Compress Images
Minify size your images. Image compression with extension: jpg/jpeg, svg, png, gif. NodeJs
Stars: ✭ 331 (-22.3%)
Mutual labels:  compression
Libzip
A C library for reading, creating, and modifying zip archives.
Stars: ✭ 379 (-11.03%)
Mutual labels:  compression
Compress
Collection of compression related Go packages.
Stars: ✭ 319 (-25.12%)
Mutual labels:  compression
Lizard
Lizard (formerly LZ5) is an efficient compressor with very fast decompression. It achieves compression ratio that is comparable to zip/zlib and zstd/brotli (at low and medium compression levels) at decompression speed of 1000 MB/s and faster.
Stars: ✭ 408 (-4.23%)
Mutual labels:  compression
Tinify Nodejs
Node.js client for the Tinify API.
Stars: ✭ 299 (-29.81%)
Mutual labels:  compression
Kanzi Go
Lossless data compression in Go
Stars: ✭ 361 (-15.26%)
Mutual labels:  compression
Draco
Draco is a library for compressing and decompressing 3D geometric meshes and point clouds. It is intended to improve the storage and transmission of 3D graphics.
Stars: ✭ 4,611 (+982.39%)
Mutual labels:  compression
Httpteleport
Transfer 10Gbps http traffic over 1Gbps networks :)
Stars: ✭ 422 (-0.94%)
Mutual labels:  compression
Ewahboolarray
A compressed bitmap class in C++.
Stars: ✭ 381 (-10.56%)
Mutual labels:  compression

JavaFastPFOR: A simple integer compression library in Java

License

This code is released under the Apache License Version 2.0 http://www.apache.org/licenses/.

What does this do?

It is a library to compress and uncompress arrays of integers very fast. The assumption is that most (but not all) values in your array use much less than 32 bits, or that the gaps between the integers use much less than 32 bits. Arrays of this sort often come up when using differential coding in databases and information retrieval (e.g., in inverted indexes or column stores).
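To make the "much less than 32 bits" assumption concrete, here is a minimal sketch in plain Java that measures how many bits the largest value needs and how many bits the largest gap needs. The class and method names (BitWidthSketch, maxBits, gaps) are illustrative only and are not part of the library.

```java
final class BitWidthSketch {
    /** Bits needed to store the largest value (0 for an empty or all-zero array). */
    static int maxBits(int[] values) {
        int or = 0;
        for (int v : values) {
            or |= v; // the OR has the same highest set bit as the largest value
        }
        return 32 - Integer.numberOfLeadingZeros(or);
    }

    /** Differences between consecutive values; the first value is not included. */
    static int[] gaps(int[] sorted) {
        int[] out = new int[Math.max(0, sorted.length - 1)];
        for (int i = 1; i < sorted.length; i++) {
            out[i - 1] = sorted[i] - sorted[i - 1];
        }
        return out;
    }

    public static void main(String[] args) {
        int[] ids = {1_000_000, 1_000_003, 1_000_010, 1_000_011, 1_000_040};
        System.out.println(maxBits(ids));       // 20
        System.out.println(maxBits(gaps(ids))); // 5
    }
}
```

In this made-up example the values need 20 bits each, while the gaps (3, 7, 1, 29) fit in 5 bits, which is the kind of input the library is designed for.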

Please note that random integers are not compressible, by this library or by any other means. If you ever had the means of systematically compressing random integers, you could compress any data source to nothing, by recursive application of your technique.

This library can decompress integers at a rate of over 1.2 billion integers per second (4.5 GB/s). It is significantly faster than generic codecs (such as Snappy, LZ4 and so on) when compressing arrays of integers.

The library is used in LinkedIn Pinot, a realtime distributed OLAP datastore. Part of this library has been integrated in Parquet (http://parquet.io/). A modified version of the library is included in the search engine Terrier (http://terrier.org/). This library is used by ClueWeb Tools (https://github.com/lintool/clueweb). It is also used by Apache NiFi.

This library inspired a compression scheme used by Apache Lucene and Apache Lucene.NET (e.g., see http://lucene.apache.org/core/4_6_1/core/org/apache/lucene/util/PForDeltaDocIdSet.html ).

It is a Java port of the FastPFor C++ library (https://github.com/lemire/FastPFor). There is also a Go port (https://github.com/reducedb/encoding). The C++ library is used by the zsearch engine (http://victorparmar.github.com/zsearch/) as well as in GMAP and GSNAP (http://research-pub.gene.com/gmap/).

Usage

Really simple usage:

        IntegratedIntCompressor iic = new IntegratedIntCompressor();
        int[] data = ... ; // to be compressed
        int[] compressed = iic.compress(data); // compressed array
        int[] recov = iic.uncompress(compressed); // equal to data

For more examples, see example.java or the examples folder.

JavaFastPFOR supports compressing and uncompressing data in chunks (e.g., see advancedExample in https://github.com/lemire/JavaFastPFOR/blob/master/example.java).

Some CODECs ("integrated codecs") assume that the integers are in sorted order and use differential coding (they compress the deltas between successive values). They can be found in the package me.lemire.integercompression.differential. Most other codecs make no such assumption.
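The differential coding step can be sketched independently of the library. The following is a minimal illustration, not the library's implementation: the class DeltaCodingSketch and its methods are hypothetical names. It also shows why sorted input matters: an unsorted array yields negative deltas, which have the sign bit set and therefore occupy all 32 bits.

```java
import java.util.Arrays;

final class DeltaCodingSketch {
    /** Replaces each value by its difference from the previous value, in place. */
    static void delta(int[] data) {
        int previous = 0;
        for (int i = 0; i < data.length; i++) {
            int current = data[i];
            data[i] = current - previous;
            previous = current;
        }
    }

    /** Inverse of delta: a running sum restores the original values, in place. */
    static void inverseDelta(int[] data) {
        int sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
            data[i] = sum;
        }
    }

    public static void main(String[] args) {
        int[] sorted = {10, 12, 15, 15, 40};
        delta(sorted);
        System.out.println(Arrays.toString(sorted)); // [10, 2, 3, 0, 25]
        inverseDelta(sorted);
        System.out.println(Arrays.toString(sorted)); // [10, 12, 15, 15, 40]

        int[] unsorted = {40, 10};
        delta(unsorted);
        System.out.println(Arrays.toString(unsorted)); // [40, -30]
    }
}
```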

Maven central repository

Using this code in your own project is easy with Maven. Just add the following to your pom.xml file:

<dependencies>
     <dependency>
     <groupId>me.lemire.integercompression</groupId>
     <artifactId>JavaFastPFOR</artifactId>
     <version>[0.1,)</version>
     </dependency>
 </dependencies>

Naturally, you should replace the contents of the "version" element with the version you desire.

You can also download JavaFastPFOR from the Maven central repository: http://repo1.maven.org/maven2/me/lemire/integercompression/JavaFastPFOR/

Why?

We found no library that implemented state-of-the-art integer coding techniques such as Binary Packing, NewPFD, OptPFD, Variable Byte, Simple 9 and so on in Java. We wrote one.
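As an illustration of one of the techniques named above, here is a minimal sketch of a generic textbook form of Variable Byte coding: each byte carries 7 payload bits, and the high bit marks that more bytes follow. The class VariableByteSketch is a hypothetical name, and the byte layout shown is an assumption for illustration; the library's own codecs may use a different convention.

```java
import java.io.ByteArrayOutputStream;

final class VariableByteSketch {
    /** Encodes each integer in 1 to 5 bytes, least significant 7 bits first. */
    static byte[] encode(int[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int v : values) {
            while ((v & ~0x7F) != 0) {
                out.write((v & 0x7F) | 0x80); // high bit set: more bytes follow
                v >>>= 7;
            }
            out.write(v); // last byte of this integer, high bit clear
        }
        return out.toByteArray();
    }

    /** Decodes exactly count integers from bytes produced by encode. */
    static int[] decode(byte[] bytes, int count) {
        int[] values = new int[count];
        int pos = 0;
        for (int i = 0; i < count; i++) {
            int v = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                v |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            values[i] = v;
        }
        return values;
    }

    public static void main(String[] args) {
        int[] data = {1, 127, 128, 300, 1 << 20};
        byte[] packed = encode(data);
        System.out.println(packed.length); // 9 (instead of 20 bytes uncompressed)
    }
}
```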

Thread safety

Some codecs are thread-safe while others are not. For this reason, it is best to use one codec per thread. The memory usage of a codec instance is small in any case.

Nevertheless, if you want to reuse codec instances, note that by convention, unless the documentation of a codec specifies that it is not thread-safe, it can be assumed to be thread-safe.
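One way to follow the "one codec per thread" advice is a ThreadLocal. The sketch below uses a hypothetical stand-in class, ScratchCodec, which does not compress anything; in real code you would substitute whichever codec class you use, assuming it can be constructed without arguments.

```java
import java.util.concurrent.CompletableFuture;

final class PerThreadCodecSketch {
    /** Hypothetical stand-in for a codec that keeps mutable state between calls. */
    static final class ScratchCodec {
        private int calls; // mutable state: unsafe to share across threads

        int[] copy(int[] data) {
            calls++;
            return data.clone(); // placeholder for real compression work
        }

        int calls() {
            return calls;
        }
    }

    // Each thread lazily creates, then reuses, its own instance.
    private static final ThreadLocal<ScratchCodec> CODEC =
            ThreadLocal.withInitial(ScratchCodec::new);

    static ScratchCodec codec() {
        return CODEC.get();
    }

    public static void main(String[] args) {
        ScratchCodec mine = codec();
        // Ask for the codec from a different thread.
        ScratchCodec other = CompletableFuture.supplyAsync(PerThreadCodecSketch::codec).join();
        System.out.println(mine == codec()); // true: same thread, same instance
        System.out.println(mine == other);   // false: each thread has its own
    }
}
```

Since codec instances are small, the extra memory per thread should be negligible.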

Authors

Main contributors

with contributions by

How does it compare to the Kamikaze PForDelta library?

In our tests, Kamikaze PForDelta is slower than our implementations. See the benchmarkresults directory for some results.

https://github.com/lemire/JavaFastPFOR/blob/master/benchmarkresults/benchmarkresults_icore7_10may2013.txt

Reference: http://sna-projects.com/kamikaze/

Requirements

A recent Java compiler. Java 7 or better is recommended.

Good instructions on installing Java 7 on Linux:

http://forums.linuxmint.com/viewtopic.php?f=42&t=93052

How fast is it?

Compile the code and execute me.lemire.integercompression.benchmarktools.Benchmark.

I recommend running all the benchmarks with the "-server" flag on a desktop machine.

Speed is always reported in millions of integers per second.
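For reference, the unit can be derived from a count of integers and an elapsed time. The sketch below is only the arithmetic, with hypothetical names (ThroughputSketch and its methods); it is not the benchmark code itself.

```java
final class ThroughputSketch {
    /** Millions of integers per second, from a count and an elapsed time in nanoseconds. */
    static double millionsOfIntegersPerSecond(long integers, long elapsedNanos) {
        // integers / (elapsedNanos * 1e-9 s) / 1e6  ==  integers * 1e3 / elapsedNanos
        return integers * 1e3 / elapsedNanos;
    }

    /** Bytes of uncompressed 32-bit integers processed per second. */
    static double bytesPerSecond(double millionsPerSecond) {
        return millionsPerSecond * 1e6 * Integer.BYTES;
    }

    public static void main(String[] args) {
        // 1.2 billion integers in one second
        double mis = millionsOfIntegersPerSecond(1_200_000_000L, 1_000_000_000L);
        System.out.println(mis);                 // 1200.0
        System.out.println(bytesPerSecond(mis)); // 4.8E9
    }
}
```

At 4 bytes per integer, 1200 million integers per second is 4.8 billion bytes per second, or about 4.47 GiB/s. That is close to the 4.5 GB/s quoted earlier if that figure is read in binary gigabytes, which is an assumption on our part.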

For Maven users

mvn compile

mvn exec:java

For ant users

If you use Apache ant, please try this:

$ ant Benchmark

or:

$ ant Benchmark -Dbenchmark.target=BenchmarkBitPacking

API Documentation

http://www.javadoc.io/doc/me.lemire.integercompression/JavaFastPFOR/

Want to read more?

This library was a key ingredient in the best paper at ECIR 2014:

Matteo Catena, Craig Macdonald, Iadh Ounis, On Inverted Index Compression for Search Engine Efficiency, Lecture Notes in Computer Science 8416 (ECIR 2014), 2014. http://dx.doi.org/10.1007/978-3-319-06028-6_30

We wrote several research papers documenting many of the CODECs implemented here:

Ikhtear Sharif wrote his M.Sc. thesis on this library:

Ikhtear Sharif, Performance Evaluation of Fast Integer Compression Techniques Over Tables, M.Sc. thesis, UNB 2013. http://lemire.me/fr/documents/thesis/IkhtearThesis.pdf

He also posted his slides online: http://www.slideshare.net/ikhtearSharif/ikhtear-defense

Other recommended libraries

Funding

This work was supported by NSERC grant number 26143.
