
google-research / Rigl

Licence: apache-2.0

End-to-end training of sparse deep neural networks with little-to-no performance loss.

Labels

machine-learning computer-vision neural-networks

Rigging the Lottery: Making All Tickets Winners

80% Sparse Resnet-50

Paper: https://arxiv.org/abs/1911.11134

15min Presentation [pml4dc] [icml]

ML Reproducibility Challenge 2020 report

Colabs for Calculating FLOPs of Sparse Models

Best Sparse Models

Parameters are stored as floats, so each parameter takes 4 bytes. The uniform sparsity distribution keeps the first layer dense, so uniform models are slightly larger and have slightly more parameters. ERK is applied to all layers, except in the 99% sparse model, where we keep the first layer dense because we observe much worse performance otherwise.
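The size column in the tables follows from the 4-bytes-per-parameter convention stated above. Here is a minimal sketch of that arithmetic, assuming the listed sizes are decimal megabytes (1 MB = 10^6 bytes) and that only stored parameters are counted. The helper names are illustrative and do not come from the repository.

```python
BYTES_PER_FLOAT = 4  # each parameter is a 4-byte float


def model_size_mb(num_stored_params: int) -> float:
    """Storage size in decimal megabytes (assumed unit) for 4-byte parameters."""
    return num_stored_params * BYTES_PER_FLOAT / 1e6


def stored_params_from_size(size_mb: float) -> int:
    """Invert the relation: number of 4-byte parameters that occupy size_mb."""
    return round(size_mb * 1e6 / BYTES_PER_FLOAT)


# The dense ResNet-50 row lists a size of 102.122, i.e. roughly 25.5M parameters.
dense_params = stored_params_from_size(102.122)
print(dense_params, model_size_mb(dense_params))
```

The same conversion applied to the sparse rows gives the number of stored parameters, which is slightly above (1 - sparsity) times the dense count because not every parameter is sparsified.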

Extended Training Results

The performance of RigL improves significantly with more training iterations. In this section we extend the training of sparse models by 5x. Note that sparse models require far fewer FLOPs per training iteration, so most of the extended training runs still cost fewer FLOPs than dense baseline training.

Having observed this improvement, we wanted to understand where the performance of sparse networks saturates. The longest run we tried was 100x the length of the original 100-epoch ImageNet training. It costs 5.8x the FLOPs of the original dense training, and the resulting 99% sparse ResNet-50 achieves an impressive 68.15% test accuracy (vs. 61.86% with 5x training).

S. Distribution	Sparsity	Training FLOPs	Inference FLOPs	Model Size (MB)	Top-1 Acc	Ckpt
- (DENSE)	0	3.2e18	8.2e9	102.122	76.8	-
ERK	0.8	2.09x	0.42x	23.683	77.17	link
Uniform	0.8	1.14x	0.23x	23.685	76.71	link
ERK	0.9	1.23x	0.24x	13.499	76.42	link
Uniform	0.9	0.66x	0.13x	13.532	75.73	link
ERK	0.95	0.63x	0.12x	8.399	74.63	link
Uniform	0.95	0.42x	0.08x	8.433	73.22	link
ERK	0.965	0.45x	0.09x	6.904	72.77	link
Uniform	0.965	0.34x	0.07x	6.904	71.31	link
ERK	0.99	0.29x	0.05x	4.354	61.86	link
ERK	0.99	0.58x	0.05x	4.354	63.89	link
ERK	0.99	2.32x	0.05x	4.354	66.94	link
ERK	0.99	5.8x	0.05x	4.354	68.15	link
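The training-cost column can be approximated as the per-iteration cost relative to dense training, multiplied by how much longer the run is. The sketch below rests on that assumption. The helper name is hypothetical, and small mismatches against the table come from rounding in the published ratios.

```python
DENSE_RESNET50_TRAINING_FLOPS = 3.2e18  # dense baseline, taken from the table


def training_flops_multiplier(per_step_ratio: float, length_multiplier: float) -> float:
    """Training cost relative to the dense baseline (assumed to scale linearly)."""
    return per_step_ratio * length_multiplier


# 80% sparse ERK: 0.42x per step (its 1x training cost), trained 5x longer.
erk80_5x = training_flops_multiplier(0.42, 5)  # about 2.1; the table lists 2.09x

# 99% sparse ERK: the 5x run costs 0.29x, so a 100x run is 20 times that.
erk99_100x = training_flops_multiplier(0.29, 100 / 5)  # about 5.8

# Absolute cost of the longest run, in FLOPs.
erk99_100x_flops = erk99_100x * DENSE_RESNET50_TRAINING_FLOPS
print(erk80_5x, erk99_100x, erk99_100x_flops)
```

The four 99% sparse rows (0.29x, 0.58x, 2.32x, 5.8x) are consistent with this linear scaling: each cost is the 5x cost multiplied by 2, 8 and 20 respectively.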

We also ran extended training runs with MobileNet-v1. Again, even with 100x longer training, we were not able to saturate the performance: training longer consistently achieved better results.

S. Distribution	Sparsity	Training FLOPs	Inference FLOPs	Model Size (MB)	Top-1 Acc	Ckpt
- (DENSE)	0	4.5e17	1.14e9	16.864	72.1	-
ERK	0.89	1.39x	0.21x	2.392	69.31	link
ERK	0.89	2.79x	0.21x	2.392	70.63	link
Uniform	0.89	1.25x	0.09x	2.392	69.28	link
Uniform	0.89	6.25x	0.09x	2.392	70.25	link
Uniform	0.89	12.5x	0.09x	2.392	70.59	link

1x Training Results

S. Distribution	Sparsity	Training FLOPs	Inference FLOPs	Model Size (MB)	Top-1 Acc	Ckpt
ERK	0.8	0.42x	0.42x	23.683	75.12	link
Uniform	0.8	0.23x	0.23x	23.685	74.60	link
ERK	0.9	0.24x	0.24x	13.499	73.07	link
Uniform	0.9	0.13x	0.13x	13.532	72.02	link

Evaluating checkpoints

Download the checkpoints and run the evaluation on ERK checkpoints with the following:

python imagenet_train_eval.py --mode=eval_once --output_dir=path/to/ckpt/folder \
    --eval_once_ckpt_prefix=model.ckpt-3200000 --use_folder_stub=False \
    --training_method=rigl --mask_init_method=erdos_renyi_kernel \
    --first_layer_sparsity=-1

When evaluating checkpoints with a uniform sparsity distribution, use --mask_init_method=random and --first_layer_sparsity=0. Set --model_architecture=mobilenet_v1 when evaluating MobileNet checkpoints.

Sparse Training Algorithms

In this repository we implement the following dynamic sparsity strategies:

SET: Implements Sparse Evolutionary Training (SET), which replaces low-magnitude connections with new, randomly chosen ones.
SNFS: Implements momentum-based training without sparsity redistribution.
RigL: Our method, RigL, removes a fraction of connections based on weight magnitudes and activates new ones using instantaneous gradient information.

And the following one-shot pruning algorithm:

SNIP: Single-shot Network Pruning based on connection sensitivity prunes the least salient connections before training.
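The drop-and-grow step shared by SET and RigL in the list above can be sketched on a single flattened layer. This is a toy illustration in pure Python, not the repository's TensorFlow implementation. The function name, the choice to grow only among connections that were inactive before the drop, and the zero initialization of new connections are all assumptions made for the example.

```python
import random


def drop_and_grow(weights, grads, mask, drop_fraction, method="rigl", rng=None):
    """One connectivity update for a single flattened layer.

    weights, grads: lists of floats of equal length; grads are dense, i.e.
    also defined for inactive connections. mask: list of bools, True = active.
    Returns (new_weights, new_mask); the number of active connections is kept.
    """
    rng = rng or random.Random()
    active = [i for i, m in enumerate(mask) if m]
    inactive = [i for i, m in enumerate(mask) if not m]
    n_update = min(int(drop_fraction * len(active)), len(inactive))

    # Drop: the active connections with the smallest weight magnitude.
    dropped = sorted(active, key=lambda i: abs(weights[i]))[:n_update]

    # Grow: the same number of connections among the previously inactive ones.
    if method == "rigl":
        # RigL: largest instantaneous gradient magnitude.
        grown = sorted(inactive, key=lambda i: abs(grads[i]), reverse=True)[:n_update]
    elif method == "set":
        # SET: uniformly at random.
        grown = rng.sample(inactive, n_update)
    else:
        raise ValueError(f"unknown method: {method}")

    new_mask = list(mask)
    for i in dropped:
        new_mask[i] = False
    for i in grown:
        new_mask[i] = True

    # Inactive weights are zero; grown connections start at zero (assumption).
    new_weights = [w if m else 0.0 for w, m in zip(weights, new_mask)]
    for i in grown:
        new_weights[i] = 0.0
    return new_weights, new_mask


weights = [0.9, -0.05, 0.0, 0.4, 0.0, -0.7]
grads = [0.1, 0.2, 0.8, -0.3, -0.05, 0.0]
mask = [True, True, False, True, False, True]
new_weights, new_mask = drop_and_grow(weights, grads, mask, drop_fraction=0.25)
print(new_mask)
```

In this example one connection is updated: index 1 has the smallest active weight and is dropped, and index 2 has the largest gradient among the inactive connections and is grown. Passing method="set" keeps the drop rule but picks the new connection at random.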

We have code for the following settings:

Imagenet2012: TPU-compatible code with ResNet-50 and MobileNet-v1/v2.
CIFAR-10 with WideResNets.
MNIST with a 2-layer fully connected network.

Setup

First clone this repo.

git clone https://github.com/google-research/rigl.git
cd rigl

We use the NeurIPS 2019 MicroNet Challenge code for counting the operations and size of our networks. Clone the google-research repo, rename it to google_research, and add the current folder to the Python path.

git clone https://github.com/google-research/google-research.git
mv google-research/ google_research/
export PYTHONPATH=$PYTHONPATH:$PWD

Now we can run some tests. The following script creates a virtual environment, installs the necessary libraries, and then runs a few tests.

bash run.sh

We need to activate the virtual environment before running an experiment. With that, we are ready to run some trivial MNIST experiments.

source env/bin/activate

python rigl/mnist/mnist_train_eval.py

You can load and verify the performance of the ResNet-50 checkpoints as follows.

python rigl/imagenet_resnet/imagenet_train_eval.py --mode=eval_once --training_method=baseline --eval_batch_size=100 --output_dir=/path/to/folder --eval_once_ckpt_prefix=s80_model.ckpt-1280000 --use_folder_stub=False

We use the official TPU code for loading ImageNet data. First clone the tensorflow/tpu repo, then add the models/ folder to the Python path.

git clone https://github.com/tensorflow/tpu.git
export PYTHONPATH=$PYTHONPATH:$PWD/tpu/models/

Other Implementations

Graphcore-TF-MNIST: with sparse matrix ops!
PyTorch implementation by Dyllan McCreary.
Micrograd-Pure Python: a toy example with a pure Python sparse implementation. Caution: very slow, but fun.

Citation

@incollection{rigl,
 author = {Evci, Utku and Gale, Trevor and Menick, Jacob and Castro, Pablo Samuel and Elsen, Erich},
 booktitle = {Proceedings of Machine Learning and Systems 2020},
 pages = {471--481},
 title = {Rigging the Lottery: Making All Tickets Winners},
 year = {2020}
}

Disclaimer

This is not an official Google product.
