TheNilesh / huffman

Licence: other
Using huffman coding to compress-decompress real-world files

Programming Languages

c

Huffman Algorithm for File Compression

https://github.com/TheNilesh/huffman/
License: Public Domain, no warranty
Nilesh Akhade

About

Huffman coding is an efficient method for lossless file compression and decompression. This program follows the Huffman algorithm: it counts how often each character occurs in the input file and replaces frequent characters with shorter binary codewords. The original file can be reproduced without losing a single bit.

Usage

Compression:

	./encode <file to compress>

An output file with the .hzip extension will be produced.

Decompression:

	./decode <file to uncompress>

File Structure

N = total number of unique characters (1 byte)
Character [1 byte], binary codeword in string form [MAX bytes]
Character [1 byte], binary codeword in string form [MAX bytes]
... (repeated N times)
p (1 byte), followed by p zero bits of padding
DATA

p = padding added so that the file fits in a whole number of bytes. E.g., a file of 4 bytes + 3 bits must be padded by 5 bits to make it 5 bytes.
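
The padding arithmetic can be sketched in C as follows. The helper name `padding_bits` is hypothetical and is not taken from the project source; it only illustrates how p could be derived from the number of codeword bits.

```c
#include <stddef.h>

/* Hypothetical helper (an assumption, not the project's code):
 * number of zero bits needed to round total_bits up to a whole
 * number of bytes. Returns 0 when total_bits is already a
 * multiple of 8. */
static unsigned padding_bits(size_t total_bits)
{
    return (unsigned)((8 - total_bits % 8) % 8);
}
```

For the case mentioned above, 4 bytes + 3 bits is 35 bits, so `padding_bits(35)` gives 5.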

Example

Text: aabcbaab

Content                                   Comment
3                                         N=3 (a,b,c)
a "1"                                     character and corresponding code "1"
b "01"                                    character and corresponding code "01"
c "00"                                    character and corresponding code "00"
4                                         Padding count
[0000]                                    Padding of 4 zeroes
[1] [1] [01] [00] [01] [1] [1] [01]       Actual data, code in place of char
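
As a rough illustration of the last two rows, the following C sketch packs the padding bits and the codewords for this example into bytes. It is not the project's code: the function names are hypothetical, the codeword table is hard-coded from the example, and the most-significant-bit-first order within each byte is an assumption, since the layout above does not state a bit order. The header (N, the mapping table, and the padding count byte) is left out.

```c
#include <stddef.h>
#include <string.h>

/* Codeword table hard-coded from the example; NULL for other characters. */
static const char *codeword_for(char ch)
{
    switch (ch) {
    case 'a': return "1";
    case 'b': return "01";
    case 'c': return "00";
    default:  return NULL;
    }
}

/* Writes one bit at bit position *pos. Bits are stored MSB-first within
 * each byte (an assumption). The buffer must start zeroed. */
static void put_bit(unsigned char *buf, size_t *pos, int bit)
{
    if (bit)
        buf[*pos / 8] |= (unsigned char)(0x80u >> (*pos % 8));
    (*pos)++;
}

/* Packs `text` as p zero bits of padding followed by the codewords.
 * `out` must be zero-initialised and large enough for the result.
 * Stores p in *padding and returns the number of bytes written
 * (0 for empty input or if a character has no codeword). */
static size_t pack_bits(const char *text, unsigned char *out, unsigned *padding)
{
    size_t total = 0, pos = 0;

    /* First pass over the text: count codeword bits to derive the padding. */
    for (const char *s = text; *s; s++) {
        const char *code = codeword_for(*s);
        if (code == NULL)
            return 0;
        total += strlen(code);
    }
    *padding = (unsigned)((8 - total % 8) % 8);

    /* Padding comes first, then the data, as in the layout above. */
    for (unsigned i = 0; i < *padding; i++)
        put_bit(out, &pos, 0);
    for (const char *s = text; *s; s++)
        for (const char *c = codeword_for(*s); *c; c++)
            put_bit(out, &pos, *c == '1');

    return pos / 8;
}
```

For the text aabcbaab the codewords take 12 bits, so the padding is 4 bits and the packed result is 2 bytes. Under the MSB-first assumption these are 00001101 and 00011101.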

Algorithm

  1. (Pass 1) Read input file
  2. Create sorted linked list of characters from file, as per character frequency
    for each character ch from file
    
     if( ch available in linked list at node p) then 
     {
     	p.freq++;
     	sort Linked list as per node's freq;
     }
     else
     	add new node at beginning of linked list with frequency=1;
    
  3. Construct Huffman tree from linked list
    i. Create new node q, join two least freq nodes to its left and right
    ii. Insert created node q into ascending list
    iii. Repeat i & ii till only one node remains, i.e., ROOT of h-tree
    iv. Traverse tree in preorder and mark each node with its codeword. Simultaneously recreate linked list of leaf nodes.
  4. Write Mapping Table(character to codeword) to output file.
  5. (Pass 2) Read input file.
  6. For each character ch in the input file, write its codeword to the output file in place of the character (look it up in the mapping table OR the linked list).
  7. End
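
Steps 1 to 3 above can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual source: the names `Node`, `insert_sorted`, `build_tree`, `assign_codes` and the bound `MAX_CODE` are hypothetical, frequencies are counted in an array before the sorted list is built (instead of re-sorting the list after every character), and the input is taken from memory rather than from a file.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_CODE 256  /* assumed bound: longest codeword (255 bits) + NUL */

typedef struct Node {
    unsigned char ch;
    unsigned long freq;
    struct Node *left, *right;  /* tree links */
    struct Node *next;          /* list link, ascending by freq */
} Node;

static Node *new_node(unsigned char ch, unsigned long freq, Node *l, Node *r)
{
    Node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->ch = ch; n->freq = freq; n->left = l; n->right = r; n->next = NULL;
    return n;
}

/* Inserts n so that the list stays sorted by ascending frequency. */
static Node *insert_sorted(Node *head, Node *n)
{
    Node **p = &head;
    while (*p != NULL && (*p)->freq <= n->freq)
        p = &(*p)->next;
    n->next = *p;
    *p = n;
    return head;
}

/* Steps 1-3.iii: count frequencies, build the sorted list, then merge
 * the two least frequent nodes until only the root remains. */
static Node *build_tree(const unsigned char *data, size_t len)
{
    unsigned long freq[256] = {0};
    Node *head = NULL;

    for (size_t i = 0; i < len; i++)
        freq[data[i]]++;
    for (int c = 0; c < 256; c++)
        if (freq[c] > 0)
            head = insert_sorted(head,
                                 new_node((unsigned char)c, freq[c], NULL, NULL));

    while (head != NULL && head->next != NULL) {
        Node *a = head, *b = head->next;   /* two least frequent nodes */
        head = b->next;
        head = insert_sorted(head, new_node(0, a->freq + b->freq, a, b));
    }
    return head;  /* NULL for empty input */
}

/* Step 3.iv: preorder walk; '0' for left, '1' for right. Leaf codewords
 * are stored as strings in codes[ch]. prefix needs MAX_CODE bytes. */
static void assign_codes(const Node *n, char *prefix, size_t depth,
                         char codes[256][MAX_CODE])
{
    if (n == NULL)
        return;
    if (n->left == NULL && n->right == NULL) {
        if (depth == 0)
            prefix[depth++] = '0';  /* input with a single distinct character */
        prefix[depth] = '\0';
        strcpy(codes[n->ch], prefix);
        return;
    }
    prefix[depth] = '0';
    assign_codes(n->left, prefix, depth + 1, codes);
    prefix[depth] = '1';
    assign_codes(n->right, prefix, depth + 1, codes);
}
```

For the text aabcbaab this sketch yields a 1-bit codeword for a and 2-bit codewords for b and c. The exact bit values depend on how ties and left/right children are assigned, so they need not match the example table. Freeing the tree is omitted for brevity.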

Contributing

Please feel free to submit issues and pull requests. I appreciate bug reports. Testing on different platforms is especially appreciated. I only tested on Linux.

License

This software is in the Public Domain. That means you can do whatever you like with it. That includes being used in proprietary products without attribution or restrictions. There are no warranties and there may be bugs.

Formally we are using CC0 - a Creative Commons license to place this work in the public domain. A copy of CC0 is in the LICENSE file.

"CC0 is a public domain dedication from Creative Commons. A work released
under CC0 is dedicated to the public domain to the fullest extent permitted
by law. If that is not possible for any reason, CC0 also provides a lax,
permissive license as a fallback. Both public domain works and the lax
license provided by CC0 are compatible with the GNU GPL."
  - http://www.gnu.org/licenses/license-list.html#CC0

Development

To do:

  • Support for binary files such as JPEG and MP3
  • Run a scan to group repeating bit patterns, not single bits
  • Unicode support