YannDubs / lossyless

License: MIT
Generic image compressor for machine learning. PyTorch code for our paper "Lossy Compression for Lossless Prediction".

Programming Languages

Python: 139335 projects (#7 most used programming language)
Jupyter Notebook: 11667 projects
Shell: 77523 projects
Dockerfile: 14818 projects

Projects that are alternatives of or similar to lossyless

unishox js
JS Library for Guaranteed compression of Unicode short strings
Stars: ✭ 27 (-66.67%)
Mutual labels:  compression
ZstdKit
An Objective-C and Swift library for Zstd (Zstandard) compression and decompression.
Stars: ✭ 22 (-72.84%)
Mutual labels:  compression
MEGA Manager
Cloud syncing manager for multiple MEGA cloud storage accounts with syncing, data gathering, compression and optimization capabilities.
Stars: ✭ 29 (-64.2%)
Mutual labels:  compression
JOpenCTM
Java implementation of the openCTM Mesh compression file format
Stars: ✭ 13 (-83.95%)
Mutual labels:  compression
exponential-moving-average-normalization
PyTorch implementation of EMAN for self-supervised and semi-supervised learning: https://arxiv.org/abs/2101.08482
Stars: ✭ 76 (-6.17%)
Mutual labels:  self-supervised-learning
compbench
⌛ Benchmark and visualization of various compression algorithms
Stars: ✭ 21 (-74.07%)
Mutual labels:  compression
Oroch
A C++ library for integer array compression
Stars: ✭ 22 (-72.84%)
Mutual labels:  compression
zpaqfranz
Deduplicating archiver with encryption and paranoid-level tests. Swiss army knife for the serious backup and disaster recovery manager. Ransomware neutralizer. Win/Linux/Unix
Stars: ✭ 86 (+6.17%)
Mutual labels:  compression
gzipped
Replacement for golang http.FileServer which supports precompressed static assets.
Stars: ✭ 86 (+6.17%)
Mutual labels:  compression
ViCC
[WACV'22] Code repository for the paper "Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting", https://arxiv.org/abs/2106.10137.
Stars: ✭ 33 (-59.26%)
Mutual labels:  self-supervised-learning
dsp
DSP and filtering library
Stars: ✭ 36 (-55.56%)
Mutual labels:  compression
ZetaProducerHtmlCompressor
A .NET port of Google’s HtmlCompressor library to minify HTML source code.
Stars: ✭ 31 (-61.73%)
Mutual labels:  compression
Lib.AspNetCore.WebSocketsCompression
[Archived] Lib.AspNetCore.WebSocketsCompression is a library which provides a managed implementation of the WebSocket protocol, along with server integration components and support for permessage-deflate compression.
Stars: ✭ 23 (-71.6%)
Mutual labels:  compression
mtscomp
Multichannel time series lossless compression in pure Python based on NumPy and zlib
Stars: ✭ 20 (-75.31%)
Mutual labels:  compression
DisCont
Code for the paper "DisCont: Self-Supervised Visual Attribute Disentanglement using Context Vectors".
Stars: ✭ 13 (-83.95%)
Mutual labels:  self-supervised-learning
Self-Supervised-GANs
Tensorflow Implementation for paper "self-supervised generative adversarial networks"
Stars: ✭ 34 (-58.02%)
Mutual labels:  self-supervised-learning
barlowtwins
Implementation of Barlow Twins paper
Stars: ✭ 84 (+3.7%)
Mutual labels:  self-supervised-learning
NBT
A java implementation of the NBT protocol, including a way to implement custom tags.
Stars: ✭ 128 (+58.02%)
Mutual labels:  compression
CLMR
Official PyTorch implementation of Contrastive Learning of Musical Representations
Stars: ✭ 216 (+166.67%)
Mutual labels:  self-supervised-learning
sesemi
supervised and semi-supervised image classification with self-supervision (Keras)
Stars: ✭ 43 (-46.91%)
Mutual labels:  self-supervised-learning

Lossy Compression for Lossless Prediction [License: MIT] [Python 3.8+]

Using: [Open in Colab]

Training: [Open in Colab]

This repository contains our implementation of the paper Lossy Compression for Lossless Prediction, which formalizes and empirically investigates unsupervised training of task-specific compressors.

Using the compressor

Using: [Open in Colab]

If you want to use our compressor directly, the easiest way is to load the model from torch hub, as shown in the Google Colab notebook (or notebooks/Hub.ipynb) or in the example below.

Installation details
pip install torch torchvision tqdm numpy compressai sklearn git+https://github.com/openai/CLIP.git

Using pytorch>1.7.1: CLIP pins pytorch to version 1.7.1 because it needs that version to use JIT. If you don't need JIT (there is no JIT by default), you can actually use more recent versions of torch and torchvision: pip install -U torch torchvision. Make sure to upgrade after having installed CLIP.
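Optionally, here is a quick, generic sanity check that the dependencies above import correctly before running the example; nothing in it is lossyless-specific:

import torch
import torchvision
import sklearn
import compressai  # imported only to confirm it is installed
import clip

# Print versions and confirm that CLIP can list its pretrained models.
print("torch:", torch.__version__, "| torchvision:", torchvision.__version__)
print("scikit-learn:", sklearn.__version__)
print("CUDA available:", torch.cuda.is_available())
print("CLIP models:", clip.available_models())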


import time

import torch
from sklearn.svm import LinearSVC
from torchvision.datasets import STL10

DATA_DIR = "data/"

# list available compressors. b01 compresses the most (b01 > b005 > b001)
torch.hub.list('YannDubs/lossyless:main') 
# ['clip_compressor_b001', 'clip_compressor_b005', 'clip_compressor_b01']

# Load the desired compressor and transformation to apply to images (by default on GPU if available)
compressor, transform = torch.hub.load('YannDubs/lossyless:main','clip_compressor_b005')

# Load some data to compress and apply transformation
stl10_train = STL10(
    DATA_DIR, download=True, split="train", transform=transform
)
stl10_test = STL10(
    DATA_DIR, download=True, split="test", transform=transform
)

# Compress the datasets and save them to file (this requires a GPU)
# Rate: 1506.50 bits/img | Encoding: 347.82 img/sec
compressor.compress_dataset(
    stl10_train,
    f"{DATA_DIR}/stl10_train_Z.bin",
    label_file=f"{DATA_DIR}/stl10_train_Y.npy",
)
compressor.compress_dataset(
    stl10_test,
    f"{DATA_DIR}/stl10_test_Z.bin",
    label_file=f"{DATA_DIR}/stl10_test_Y.npy",
)

# Load and decompress the datasets from file (does not require GPU)
# Decoding: 1062.38 img/sec
Z_train, Y_train = compressor.decompress_dataset(
    f"{DATA_DIR}/stl10_train_Z.bin", label_file=f"{DATA_DIR}/stl10_train_Y.npy"
)
Z_test, Y_test = compressor.decompress_dataset(
    f"{DATA_DIR}/stl10_test_Z.bin", label_file=f"{DATA_DIR}/stl10_test_Y.npy"
)

# Downstream STL10 evaluation. Accuracy: 98.65% | Training time: 0.5 sec
clf = LinearSVC(C=7e-3)
start = time.time()
clf.fit(Z_train, Y_train)
delta_time = time.time() - start
acc = clf.score(Z_test, Y_test)
print(
    f"Downstream STL10 accuracy: {acc*100:.2f}%.  \t Training time: {delta_time:.1f} "
)

Minimal training code

Training: [Open in Colab]

If your goal is to look at a minimal version of the code simply to understand what is going on, I would highly recommend starting from notebooks/minimal_compressor.ipynb (or the Google Colab link above). This is a notebook version of the code provided in Appendix E.7 of the paper, which quickly trains and evaluates our compressor. A heavily simplified training sketch is also included after the installation notes below.

Installation details
  1. pip install git+https://github.com/openai/CLIP.git
  2. pip uninstall -y torchtext (probably not necessary, but torchtext can cause issues if it was installed against the wrong pytorch version)
  3. pip install scikit-learn==0.24.2 lightning-bolts==0.3.4 compressai==1.1.5 pytorch-lightning==1.3.8

Using pytorch>1.7.1: CLIP pins pytorch to version 1.7.1, but you should be able to use more recent versions, e.g.:

  1. pip install git+https://github.com/openai/CLIP.git
  2. pip install -U torch torchvision scikit-learn lightning-bolts compressai pytorch-lightning
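For intuition only, here is a heavily simplified sketch of what such a training loop can look like. It is not the repository's code or the notebook: the paper's distortion is a contrastive (InfoNCE-style) loss over augmented views, whereas this sketch uses a plain MSE reconstruction of the frozen CLIP embedding; the CLIP backbone (ViT-B/32), the small decoder, the beta value, and all hyper-parameters are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision.datasets import STL10
from compressai.entropy_models import EntropyBottleneck
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"

# Frozen CLIP featurizer (ViT-B/32 outputs 512-dim embeddings).
clip_model, preprocess = clip.load("ViT-B/32", device=device)
clip_model.eval()

dim = 512
entropy_bottleneck = EntropyBottleneck(dim).to(device)  # learned prior used for the rate term
decoder = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim)).to(device)

beta = 0.05  # rate weight; larger beta -> stronger compression (illustrative value)
params = list(entropy_bottleneck.parameters()) + list(decoder.parameters())
optimizer = torch.optim.Adam(params, lr=1e-4)

train_set = STL10("data/", download=True, split="train", transform=preprocess)
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)

for imgs, _ in train_loader:
    imgs = imgs.to(device)
    with torch.no_grad():
        z = clip_model.encode_image(imgs).float()  # (N, 512) frozen features

    # EntropyBottleneck expects (N, C, H, W): treat each embedding as a 1x1 "image".
    z_hat, likelihoods = entropy_bottleneck(z[:, :, None, None])
    rate = -torch.log2(likelihoods).sum() / imgs.size(0)  # expected bits per image
    # Stand-in for the paper's contrastive distortion.
    distortion = F.mse_loss(decoder(z_hat[:, :, 0, 0]), z)

    # entropy_bottleneck.loss() is compressai's auxiliary loss that fits the prior's quantiles
    # (compressai usually gives it its own optimizer; it is folded into one loss here for brevity).
    loss = distortion + beta * rate + entropy_bottleneck.loss()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

The real pipeline additionally handles augmentations, the contrastive distortion, and entropy coding of the discretized representation; see the notebook and the scripts in bin for the actual code.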

Results from the paper

We provide scripts to essentially replicate some results from the paper. The exact numbers will differ slightly because we simplified and cleaned some of the code to improve readability. All scripts can be found in bin and run using the command bin/*/<experiment>.sh.

Installation details
  1. Clone repository
  2. Install PyTorch >= 1.7
  3. pip install -r requirements.txt

Other installation

  • For the bare minimum packages: use pip install -r requirements_mini.txt instead.
  • For conda: use conda env update --file requirements/environment.yaml.
  • For docker: we provide a dockerfile at requirements/Dockerfile.

Notes

  • CLIP forces pytorch version 1.7.1 because it needs that version to use JIT. We don't use JIT, so you can actually use more recent versions of torch and torchvision: pip install -U torch torchvision.
  • For better logging: hydra and pytorch lightning logging don't work well together. To have a better logging experience, you should comment out the following lines in pytorch_lightning/__init__.py:
if not _root_logger.hasHandlers():
     _logger.addHandler(logging.StreamHandler())
     _logger.propagate = False

Test installation

To test your installation and check that everything works as desired, you can run bin/test.sh, which will run one epoch of BINCE and VIC on MNIST.


Scripts details

All scripts can be found in bin and run using the command bin/*/<experiment>.sh. This will save all results, checkpoints, logs, etc. The most important results (including summarized results and figures) will be saved at results/exp_<experiment>. In particular, see the summarized metrics results/exp_<experiment>*/summarized_metrics_merged.csv and any figures results/exp_<experiment>*/*.png.
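If you want to inspect those summaries programmatically rather than opening the CSVs by hand, a minimal sketch (assumes pandas is installed and the path layout described above; the columns depend on which metrics the experiment logged, so none are hard-coded here):

from pathlib import Path
import pandas as pd

# Print the summarized metrics of every experiment that has finished.
for csv_path in sorted(Path("results").glob("exp_*/summarized_metrics_merged.csv")):
    print(f"--- {csv_path} ---")
    print(pd.read_csv(csv_path).to_string(index=False))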

The key experiments that do not require very large compute are:

  • VIC/VAE on rotation invariant Banana distribution: bin/banana/banana_viz_VIC.sh
  • VIC/VAE on augmentation invariant MNIST: bin/mnist/augmnist_viz_VIC.sh
  • CLIP experiments: bin/clip/main_linear.sh

By default all scripts log results to Weights & Biases (wandb). If you have an account (or make one), you should set your username in conf/user.yaml after wandb_entity:; the password should be set directly in your environment variables. If you prefer not to log, you can use the command bin/*/<experiment>.sh -a logger=csv, which changes (-a is for append) the default wandb logger to a CSV logger.

Generally speaking you can change any of the parameters either directly in conf/**/<file>.yaml or by adding -a to the script. We are using Hydra to manage our configurations, refer to their documentation if something is unclear.

If you are using Slurm, you can submit the script directly to your servers by adding a configuration file under conf/slurm/<myserver>.yaml and then running the script as bin/*/<experiment>.sh -s <myserver>. For example Slurm configuration files, see conf/slurm/vector.yaml or conf/slurm/learnfair.yaml. For more information, check the documentation of the submitit plugin, which we use.


VIC/VAE on rotation invariant Banana

Command:

bin/banana/banana_viz_VIC.sh

The following figures are saved automatically at results/exp_banana_viz_VIC/**/quantization.png. On the left we see the quantization of the Banana distribution by a standard compressor (called VAE in code but VC in paper). On the right, by our (rotation) invariant compressor (VIC).

[Figures: standard compression of Banana (left) | invariant compression of Banana (right)]

VIC/VAE on augmented MNIST

Command:

bin/mnist/augmnist_viz_VIC.sh

The following figure is saved automatically at results/exp_augmnist_viz_VIC/**/rec_imgs.png. It shows source augmented MNIST images as well as the reconstructions using our invariant compressor.

[Figure: invariant compression of augmented MNIST]

CLIP compressor

Command:

bin/clip/main_small.sh

The following table comes directly from the results, which are automatically saved at results/exp_clip_bottleneck_linear_eval/**/datapred_*/**/results_predictor.csv. It shows the rate and downstream test accuracy of our CLIP compressor on many datasets.

                Cars196  STL10  Caltech101  Food101  PCam  Pets37  CIFAR10  CIFAR100
Rate [bits]        1471   1342        1340     1266  1491    1209     1407      1413
Test Acc. [%]      80.3   98.5        93.3     83.8  81.1    88.8     94.6      79.0

Note: ImageNet is too large to train an SVM with scikit-learn; you need to run MLP evaluation with bin/clip/clip_bottleneck_mlp_eval. You also have to download ImageNet manually.
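To rebuild such a table from your own runs, a minimal sketch that simply gathers the per-dataset result files mentioned above (assumes pandas; the columns inside results_predictor.csv are whatever the evaluation logged, so they are kept as-is rather than hard-coded):

from pathlib import Path
import pandas as pd

# Collect every per-dataset result file produced by the CLIP evaluation runs.
rows = []
root = Path("results/exp_clip_bottleneck_linear_eval")
for csv_path in sorted(root.glob("**/datapred_*/**/results_predictor.csv")):
    df = pd.read_csv(csv_path)
    # The datapred_* directory in the path identifies the downstream dataset.
    df["dataset"] = next(p.name for p in csv_path.parents if p.name.startswith("datapred_"))
    rows.append(df)

if rows:
    print(pd.concat(rows, ignore_index=True).to_string(index=False))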

Cite

You can read the full paper here. Please cite our paper if you use our model:

@inproceedings{
    dubois2021lossy,
    title={Lossy Compression for Lossless Prediction},
    author={Yann Dubois and Benjamin Bloem-Reddy and Karen Ullrich and Chris J. Maddison},
    booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
    year={2021}
}