All Projects → rwanwork → Re-Pair

rwanwork / Re-Pair

Licence: GPL-3.0 license
Offline Dictionary-based Compression (Re-Pair, Recursive Pairing)

Programming Languages

c
50402 projects - #5 most used programming language
CMake
9771 projects

Projects that are alternatives of or similar to Re-Pair

apultra
Free open-source compressor for apLib with 5-7% better ratios
Stars: ✭ 84 (+300%)
Mutual labels:  compression, compression-algorithm
brieflz
Small fast Lempel-Ziv compression library
Stars: ✭ 84 (+300%)
Mutual labels:  compression, compression-algorithm
blz4
Example of LZ4 compression with optimal parsing using BriefLZ algorithms
Stars: ✭ 24 (+14.29%)
Mutual labels:  compression, compression-algorithm
x-compressor
x – minimalist data compressor
Stars: ✭ 42 (+100%)
Mutual labels:  compression, compression-algorithm
image-comp-lib-rust
Image Compression Algorithm
Stars: ✭ 30 (+42.86%)
Mutual labels:  compression, compression-algorithm
Tris Webpack Boilerplate
A Webpack boilerplate for static websites that has all the necessary modern tools and optimizations built-in. Score a perfect 10/10 on performance.
Stars: ✭ 1,016 (+4738.1%)
Mutual labels:  compression, offline
salvador
A free, open-source compressor for the ZX0 format
Stars: ✭ 35 (+66.67%)
Mutual labels:  compression, compression-algorithm
django-brotli
Django middleware that compresses response using brotli algorithm.
Stars: ✭ 16 (-23.81%)
Mutual labels:  compression, compression-algorithm
lzbase62
LZ77(LZSS) based compression algorithm in base62 for JavaScript.
Stars: ✭ 38 (+80.95%)
Mutual labels:  compression, compression-algorithm
Lepton
Lepton is a tool and file format for losslessly compressing JPEGs by an average of 22%.
Stars: ✭ 4,918 (+23319.05%)
Mutual labels:  compression, compression-algorithm
Huffman-Coding
A C++ compression program based on Huffman's lossless compression algorithm and decoder.
Stars: ✭ 81 (+285.71%)
Mutual labels:  compression, compression-algorithm
lz4ultra
Optimal LZ4 compressor, that produces files that decompress faster while keeping the best compression ratio
Stars: ✭ 49 (+133.33%)
Mutual labels:  compression
localForage-cn
localForage中文仓库,localForage改进了离线存储,提供简洁健壮的API,包括 IndexedDB, WebSQL, 和 localStorage。
Stars: ✭ 201 (+857.14%)
Mutual labels:  offline
tableaunoir
An online blackboard 🖉 with fridge magnets 🌈🧲 for teaching, and making animations 🏃 and presentations ⎚.
Stars: ✭ 149 (+609.52%)
Mutual labels:  offline
xcdat
Fast compressed trie dictionary library
Stars: ✭ 51 (+142.86%)
Mutual labels:  compression
meditation-timer
🧘 Progressive web application for timing your meditations
Stars: ✭ 23 (+9.52%)
Mutual labels:  offline
FastIntegerCompression.js
Fast integer compression library in JavaScript
Stars: ✭ 46 (+119.05%)
Mutual labels:  compression
Turbo-Transpose
Transpose: SIMD Integer+Floating Point Compression Filter
Stars: ✭ 50 (+138.1%)
Mutual labels:  compression
GainedVAE
A Pytorch Implementation of a continuously rate adjustable learned image compression framework.
Stars: ✭ 43 (+104.76%)
Mutual labels:  compression
use-navigator-online
⚛️ React Hooks to detect when your browser is online/offline.
Stars: ✭ 23 (+9.52%)
Mutual labels:  offline

Re-Pair and Des-Pair

Introduction

Re-Pair is the name of the algorithm and the software which implements the recursive pairing algorithm. Its corresponding decompressor is Des-Pair.

The Re-Pair algorithm reduces a message by recursively pairing adjacent symbols and replacing these symbols with a new symbol. This continues until no pair of adjacent symbols occur twice. A dictionary of phrases (the phrase hierarchy) and a sequence of symbols are produced as output. Re-Pair operates off-line as it commences after the entire message has been read. For larger messages, fixed-sized blocks can be created.

About the Source Code

The source code is written in C and was originally compiled using v4.1.2 of gcc for Debian v4.0 (etch). It has been compiled and successfully run on both an Intel Pentium 4 CPU (32-bit) and an Intel Core 2 Duo CPU (64-bit). However, it makes no use of the advantages offered by 64-bit architecture.

In 2022, it could be compiled on an Ubuntu 22.04 (64-bit) system using gcc version 11.2.0, with neglible compiler warnings .

This main purpose of this software was for research. Therefore, additional checks and extraneous information has been added into the source code which, if removed, may have a small improvement on the execution time.

Compiling

The archive includes a CMakeLists.txt. To create an out-of-source build, create a build/ directory and run cmake version 3.5 or higher. A Makefile will be created. Then type make. For example, if you're in the source directory,

    mkdir build
    cd build
    cmake ..
    make

To compress a file, run it as: repair -i <filename>. Two outputs are produced: filename.prel and filename.seq. The first file is the phrase hierarchy (or prelude). It has been encoded using interpolative coding (see the citations below for more information). The second file is the sequence and is simply a file of 4-byte integers which map to the hierarchy, with zeroes ("0") marking the end of a block. An entropy coder (such as a minimum redundancy (Huffman)) could be used, which is NOT included with this archive. For example, previous work made use of the minimum-redundancy coder on Prof. Alistair Moffat's homepage to compress the sequence.

In order to decompress a file, run it as: despair -i <filename>. The decompressed file will have the same filename as the original, except with a .u suffix added.

Run either executable without any arguments to see the list of options.

Citing

The algorithm is originally described in:

    N. J. Larsson and A. Moffat, "Offline Dictionary-Based Compression".
    Proc. IEEE, 88(11)1722-1732, November 2000

Extensions so that Re-Pair can be used for phrase browsing are covered by:

    R. Wan. "Browsing and Searching Compressed Documents". PhD thesis,
    University of Melbourne, Australia, December 2003.

When citing this software, if Re-Pair is being used on character (i.e., single-byte) data without any alignment to word boundaries, please cite the Proc. IEEE journal.

However, if it is being used in the context of phrase browsing with the other software systems such as Pre-Pair (word-based pre-processing), Re-Merge (block merging), and Re-Phine (phrase browsing), then please cite the thesis. Re-Phine is not yet publicly available, so e-mail me if you would not mind an intermediate version.

Of course, providing the web site where you got this software (see below) would be great!

Contact

This software was implemented by me (Raymond Wan) for my PhD thesis at the University of Melbourne (under the supervision of Prof. Alistair Moffat). My contact details:

My homepage is here.

The latest version of Re-Pair can be downloaded from GitHub.

If you have any information about bugs, suggestions for the documentation or just have some general comments, feel free to contact me via e-mail or GitHub.

Version

Changes to this software were recorded in the file ChangeLog up until April 2007. Since moving the source code to GitHub on November 13, 2014, any changes are recorded in the repository's history.

Copyright and License

Copyright (C) 2003-2022 by Raymond Wan ([email protected])

Re-Pair / Des-Pair is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. Please see the accompanying file, COPYING.v3 for further details.

About This Repository

This GitHub repository was created from the original tarball on my homepage many years ago.

Raymond Wan
September 10, 2019
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].