
HolyBayes / pytorch_ard

License: MIT
Pytorch implementation of Variational Dropout Sparsifies Deep Neural Networks

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives to or similar to pytorch_ard

SafeAI
Reusable, Easy-to-use Uncertainty module package built with Tensorflow, Keras
Stars: ✭ 13 (-82.89%)
Mutual labels:  bayesian-neural-networks
DUN
Code for "Depth Uncertainty in Neural Networks" (https://arxiv.org/abs/2006.08437)
Stars: ✭ 65 (-14.47%)
Mutual labels:  bayesian-neural-networks
BGCN
A Tensorflow implementation of "Bayesian Graph Convolutional Neural Networks" (AAAI 2019).
Stars: ✭ 129 (+69.74%)
Mutual labels:  bayesian-neural-networks
repository.membrane
Repository for my Kodi add-ons
Stars: ✭ 56 (-26.32%)
Mutual labels:  ard
R3ditor
An open-source project created to reverse-engineer some Resident Evil 3 files
Stars: ✭ 19 (-75%)
Mutual labels:  ard
mtv dl
MediathekView Downloader
Stars: ✭ 49 (-35.53%)
Mutual labels:  ard
spatial-smoothing
(ICML 2022) Official PyTorch implementation of “Blurs Behave Like Ensembles: Spatial Smoothings to Improve Accuracy, Uncertainty, and Robustness”.
Stars: ✭ 68 (-10.53%)
Mutual labels:  bayesian-neural-networks
arduino-multi-relay
Arduino sketch to handle relays, primarily lights
Stars: ✭ 32 (-57.89%)
Mutual labels:  ard
SuperNNova
Open Source Photometric classification https://supernnova.readthedocs.io
Stars: ✭ 18 (-76.32%)
Mutual labels:  bayesian-neural-networks
ocbnn-public
General purpose library for BNNs, and implementation of OC-BNNs in our 2020 NeurIPS paper.
Stars: ✭ 31 (-59.21%)
Mutual labels:  bayesian-neural-networks
Dropouts
PyTorch Implementations of Dropout Variants
Stars: ✭ 72 (-5.26%)
Mutual labels:  bayesian-neural-networks
noisy-K-FAC
Natural Gradient, Variational Inference
Stars: ✭ 29 (-61.84%)
Mutual labels:  bayesian-neural-networks
BayesianNeuralNets
Bayesian neural networks in PyTorch
Stars: ✭ 17 (-77.63%)
Mutual labels:  bayesian-neural-networks
artificial neural networks
A collection of Methods and Models for various architectures of Artificial Neural Networks
Stars: ✭ 40 (-47.37%)
Mutual labels:  bayesian-neural-networks
UString
[ACM MM 2020] Uncertainty-based Traffic Accident Anticipation
Stars: ✭ 38 (-50%)
Mutual labels:  bayesian-neural-networks
BayesByHypernet
Code for the paper Implicit Weight Uncertainty in Neural Networks
Stars: ✭ 63 (-17.11%)
Mutual labels:  bayesian-neural-networks

Variational Dropout Sparsifies NN (Pytorch)


Make your neural network 300 times faster!

PyTorch implementation of "Variational Dropout Sparsifies Deep Neural Networks" (arXiv:1701.05369).

Description

The proposed approach makes it possible to train both convolutional and dense deep sparsified models without a significant loss of quality. Additive Noise Reparameterization and the Local Reparameterization Trick introduced in the paper remove the restrictions on the weight prior and achieve an Automatic Relevance Determination (ARD) effect on (typically most of) the network's parameters. According to the original paper, the authors reduced the number of parameters up to 280 times on LeNet architectures and up to 68 times on VGG-like networks with a negligible decrease in accuracy. The Boston housing experiments in this repository confirm this: about 99% of the weights of a simple dense model were dropped using the paper's ARD prior without any significant loss in MSE. This technique also substantially reduces overfitting and removes the need to worry about model complexity, since all redundant parameters are dropped automatically. Moreover, you can achieve any degree of regularization by varying the regularization factor (see the reg_factor variable in the boston_ard.py and cifar_ard.py scripts).
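
Conceptually, the regularization factor simply scales the sparsity-inducing (KL) part of the training objective relative to the data-fit part. The snippet below is only a schematic illustration of that trade-off, not the actual ELBOLoss internals; data_term and kl_term are placeholder names and values.

# Schematic illustration only: not the actual ELBOLoss internals.
data_term = 1.0      # task loss on a batch (e.g. MSE or cross-entropy), placeholder value
kl_term = 0.5        # KL penalty that pushes redundant weights towards being dropped
reg_factor = 0.001   # larger values: stronger sparsification, potentially worse data fit

loss = data_term + reg_factor * kl_term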

Usage

import torch
import torch_ard as nn_ard
from torch import nn
import torch.nn.functional as F

input_size, hidden_size, output_size = 60, 150, 1

# ARD layers are drop-in replacements for nn.Linear
model = nn.Sequential(
    nn_ard.LinearARD(input_size, hidden_size),
    nn.ReLU(),
    nn_ard.LinearARD(hidden_size, output_size)
)

# ELBO loss = task loss (here cross-entropy) + KL-based sparsity penalty
criterion = nn_ard.ELBOLoss(model, F.cross_entropy)
print('Sparsification ratio: %.3f%%' % (100.*nn_ard.get_dropped_params_ratio(model)))

# test stage
model.eval()  # needed for speed-up
x = torch.randn(1, input_size)  # dummy input for illustration
model(x)
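
Below is a minimal end-to-end training sketch building on the snippet above. Since the model has a single output, it is paired here with a regression loss; the random data, the choice of F.mse_loss, and the direct call criterion(output, target) are illustrative assumptions, so check the repository's examples for the exact ELBOLoss arguments (e.g. optional KL weighting).

import torch
from torch.optim import Adam

# Hypothetical random regression data, for illustration only
X = torch.randn(128, input_size)
y = torch.randn(128, output_size)

criterion = nn_ard.ELBOLoss(model, F.mse_loss)  # single-output model, so a regression loss
optimizer = Adam(model.parameters(), lr=1e-3)

model.train()
for _ in range(100):
    optimizer.zero_grad()
    loss = criterion(model(X), y)  # data term + KL sparsity penalty (assumed call form)
    loss.backward()
    optimizer.step()

model.eval()
print('Sparsification ratio: %.3f%%' % (100. * nn_ard.get_dropped_params_ratio(model)))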

Installation

pip install git+https://github.com/HolyBayes/pytorch_ard

Experiments

All experiments are located in the examples folder and compare baseline models with the implemented ARD models.

Boston dataset

Two scripts were used in this experiment: boston_baseline.py and boston_ard.py. Each model was trained for 100,000 epochs with Adam(lr=1e-3). The baseline model was a dense neural network with a single hidden layer of size 150.

Model                    MSE (train)   MSE (test)   Compression, %
Baseline (nn.Linear)     1.751         22.580       0
LinearARD, no reg        1.626         16.229       0.38
LinearARD, reg=0.0001    1.587         15.957       52.95
LinearARD, reg=0.001     1.962         8.416        64.19
LinearARD, reg=0.1       17.167        25.695       97.29
LinearARD, reg=1         33.682        30.231       99.29

The table above shows that any degree of compression can be achieved by varying the regularization factor (for example, ~99.29% of connections are dropped with reg_factor=1). Moreover, training with LinearARD layers under a suitable regularization setting (such as reg=0.001 above) not only significantly reduces the number of model parameters (more than 64% of parameters can be dropped after training), but also significantly improves test quality by reducing overfitting.
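
For reference, here is a rough sketch of the dense baseline setup described above (single hidden layer of size 150, Adam with lr=1e-3, MSE loss). The data loading and preprocessing choices are assumptions, and load_boston is only available in older scikit-learn releases such as the one listed under Requirements; see boston_baseline.py for the actual script.

import torch
from torch import nn
from sklearn.datasets import load_boston          # removed in recent scikit-learn releases
from sklearn.preprocessing import StandardScaler  # preprocessing choice is an assumption

X, y = load_boston(return_X_y=True)
X = torch.FloatTensor(StandardScaler().fit_transform(X))
y = torch.FloatTensor(y).unsqueeze(1)

baseline = nn.Sequential(          # dense baseline from the table above
    nn.Linear(X.shape[1], 150),
    nn.ReLU(),
    nn.Linear(150, 1),
)
optimizer = torch.optim.Adam(baseline.parameters(), lr=1e-3)
mse = nn.MSELoss()

for _ in range(100000):            # training budget quoted above
    optimizer.zero_grad()
    loss = mse(baseline(X), y)
    loss.backward()
    optimizer.step()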

Tips

  1. Despite the good performance of the implemented layers in "end-to-end" mode, the authors recommend using them to fine-tune models pretrained without an ARD prior, which gives the best results. Moreover, it will be faster: although these layers converge at a comparable rate, each training epoch takes roughly twice as long (the *ARD implementations have about twice as many parameters). This is easy to explain: applying an ARD prior at the earliest stages can drop useful connections with non-obvious dependencies. A sketch of such a fine-tuning setup follows this list.
  2. Sparsifying the model gives almost no speed-up until you convert it to an actual sparse representation! (TODO)
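
A hedged sketch of the fine-tuning workflow from tip 1. It assumes the ARD layers expose their weight and bias parameters under the same names and shapes as nn.Linear, so that load_state_dict(..., strict=False) can copy the pretrained dense weights; the layer sizes and the pretrained model below are placeholders, so check the implementation before relying on this.

import torch
from torch import nn
import torch_ard as nn_ard

in_features, hidden, out_features = 60, 150, 1   # placeholder sizes

# 1) A dense model pretrained without the ARD prior (placeholder weights here)
dense = nn.Sequential(
    nn.Linear(in_features, hidden),
    nn.ReLU(),
    nn.Linear(hidden, out_features),
)

# 2) An ARD model with the same topology
ard = nn.Sequential(
    nn_ard.LinearARD(in_features, hidden),
    nn.ReLU(),
    nn_ard.LinearARD(hidden, out_features),
)

# 3) Copy matching parameters; strict=False ignores state-dict keys without a
#    counterpart in the ARD model and leaves ARD-specific parameters at their
#    initial values.
ard.load_state_dict(dense.state_dict(), strict=False)

# 4) Fine-tune `ard` with ELBOLoss as in the Usage section.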

Requirements

  • PyTorch >= 0.4.0
  • scikit-learn >= 0.19.1
  • pandas >= 0.23.3
  • NumPy >= 1.14.5

Authors

@article{molchanov2017variational,
  title={Variational Dropout Sparsifies Deep Neural Networks},
  author={Molchanov, Dmitry and Ashukha, Arsenii and Vetrov, Dmitry},
  journal={arXiv preprint arXiv:1701.05369},
  year={2017}
}

Original implementation (Theano/Lasagne)

Citation

@misc{pytorch_ard,
  author = {Artem Ryzhikov},
  title = {HolyBayes/pytorch_ard},
  url = {https://github.com/HolyBayes/pytorch_ard},
  year = {2018}
}

Contacts

Artem Ryzhikov, LAMBDA laboratory, Higher School of Economics, Yandex School of Data Analysis

E-mail: [email protected]

Linkedin: https://www.linkedin.com/in/artem-ryzhikov-2b6308103/

Link: https://www.hse.ru/org/persons/190912317
