
sayakpaul / Training-BatchNorm-and-Only-BatchNorm

Licence: other
Experiments with the ideas presented in https://arxiv.org/abs/2003.00152 by Frankle et al.

Programming Languages

Jupyter Notebook

Projects that are alternatives of or similar to Training-BatchNorm-and-Only-BatchNorm

tensorflow-tabnet
Improved TabNet for TensorFlow
Stars: ✭ 49 (+113.04%)
Mutual labels:  tensorflow2
muzero
A clean implementation of MuZero and AlphaZero following the AlphaZero General framework. Train and pit both algorithms against each other, and investigate the reliability of learned MuZero MDP models.
Stars: ✭ 126 (+447.83%)
Mutual labels:  tensorflow2
deep autoviml
Build tensorflow keras model pipelines in a single line of code. Now with mlflow tracking. Created by Ram Seshadri. Collaborators welcome. Permission granted upon request.
Stars: ✭ 98 (+326.09%)
Mutual labels:  tensorflow2
QuantumSpeech-QCNN
IEEE ICASSP 21 - Quantum Convolution Neural Networks for Speech Processing and Automatic Speech Recognition
Stars: ✭ 71 (+208.7%)
Mutual labels:  tensorflow2
mae-scalable-vision-learners
A TensorFlow 2.x implementation of Masked Autoencoders Are Scalable Vision Learners
Stars: ✭ 54 (+134.78%)
Mutual labels:  tensorflow2
LoL-Match-Prediction
Win probability predictions for League of Legends matches using neural networks
Stars: ✭ 34 (+47.83%)
Mutual labels:  batch-normalization
Brain-Tumor-Segmentation
Attention-Guided Version of 2D UNet for Automatic Brain Tumor Segmentation
Stars: ✭ 125 (+443.48%)
Mutual labels:  tensorflow2
Autoregressive-models
Tensorflow 2.0 implementation of Deep Autoregressive Models
Stars: ✭ 18 (-21.74%)
Mutual labels:  tensorflow2
UnitBox
UnitBox: An Advanced Object Detection Network
Stars: ✭ 23 (+0%)
Mutual labels:  tensorflow2
caffe-mt
This is a fork of Caffe with some useful layers added; the original Caffe repository is https://github.com/BVLC/caffe.
Stars: ✭ 33 (+43.48%)
Mutual labels:  batch-normalization
Awesome-Tensorflow2
Excellent extension packages and projects built on TensorFlow 2
Stars: ✭ 45 (+95.65%)
Mutual labels:  tensorflow2
datascienv
datascienv is a package that sets up your environment, with all dependencies, in a single line of code; it also includes pyforest, which imports all required ML libraries in a single line.
Stars: ✭ 53 (+130.43%)
Mutual labels:  tensorflow2
Tensorflow2-ObjectDetectionAPI-Colab-Hands-On
Hands-on documentation for the Tensorflow2 Object Detection API
Stars: ✭ 33 (+43.48%)
Mutual labels:  tensorflow2
face-mask-detection-tf2
Face mask detection using SSD with a simplified MobileNet and RFB or Pelee backbone in TensorFlow 2.1. Supports training on your own dataset and can be converted to a kmodel to run on a K210 edge device.
Stars: ✭ 72 (+213.04%)
Mutual labels:  tensorflow2
spectral normalization-tf2
🌈 Spectral Normalization implemented in TensorFlow 2
Stars: ✭ 36 (+56.52%)
Mutual labels:  tensorflow2
labml
🔎 Monitor deep learning model training and hardware usage from your mobile phone 📱
Stars: ✭ 1,213 (+5173.91%)
Mutual labels:  tensorflow2
transformer-tensorflow2.0
Transformer in TensorFlow 2.0
Stars: ✭ 53 (+130.43%)
Mutual labels:  tensorflow2
farm-animal-tracking
Farm Animal Tracking (FAT)
Stars: ✭ 19 (-17.39%)
Mutual labels:  tensorflow2
GrouProx
FedGroup, a clustered federated learning framework based on TensorFlow
Stars: ✭ 20 (-13.04%)
Mutual labels:  tensorflow2
Tensorflow-YOLACT
Implementation of the paper "YOLACT Real-time Instance Segmentation" in Tensorflow 2
Stars: ✭ 97 (+321.74%)
Mutual labels:  tensorflow2

Training-BatchNorm-and-Only-BatchNorm

Experiments with the ideas presented in Training BatchNorm and Only BatchNorm: On the Expressive Power of Random Features in CNNs (https://arxiv.org/abs/2003.00152) by Frankle et al. In this paper, the authors explore the expressive power of random features in CNNs, starting with the following experimental setup:

  • They first set all the layers of a CNN to trainable=False.
  • Before training begins, they set only the Batch Norm layers back to trainable (see the sketch below).
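
In TensorFlow 2 / Keras terms, this setup looks roughly like the following. This is a minimal sketch, assuming a stock Keras ResNet50 rather than the repo's ResNet20:

```python
import tensorflow as tf

# Minimal sketch, assuming a stock Keras ResNet50 rather than the
# repo's ResNet20: first freeze every layer, then re-enable training
# for the Batch Norm layers only.
model = tf.keras.applications.ResNet50(weights=None, classes=10)

for layer in model.layers:
    layer.trainable = False  # freeze everything...
    if isinstance(layer, tf.keras.layers.BatchNormalization):
        layer.trainable = True  # ...except the Batch Norm layers

model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Only the BN gamma/beta parameters remain trainable.
n_trainable = sum(
    tf.keras.backend.count_params(w) for w in model.trainable_weights
)
print(f"Trainable parameters: {n_trainable:,}")
```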

This simple experimental setup led to some striking findings about the expressive power of the randomly initialized layers in a CNN. So the authors explore a further question: what happens if we train only the Batch Norm layers and push this setup toward its optimum? Their findings are quite intriguing.

Dataset used

CIFAR10

Architecture used

ResNet20 (Thanks to the Keras Idiomatic Programmer repo)

About the files

  • CIFAR10_Subset.ipynb: Runs experiments on a GPU with a subset of the CIFAR10 dataset.
  • CIFAR10_Full.ipynb: Runs experiments on a GPU with the full CIFAR10 dataset.
  • CIFAR10_Full_TPU.ipynb: Runs experiments on a TPU with the full CIFAR10 dataset.
  • CIFAR10_Full_TPU_Different_LR_Schedules.ipynb: Runs experiments on a TPU with the full CIFAR10 dataset, but with different learning rate schedules (see the schedule sketch after this list).
  • All_Layers_Frozen.ipynb: As the name suggests, this notebook shows what happens when all the layers of a CNN are made non-trainable.
  • Varying_Batch_Sizes.ipynb: Runs experiments with varying batch sizes (only the Batch Norm layers are trainable).
  • Visualization.ipynb: Visualizes the learned convolution filters of the networks (see the filter-visualization sketch after this list).
  • Visualization_II.ipynb: Almost the same as Visualization.ipynb, but with slightly different plots.
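
As a point of reference, a learning rate schedule of the kind such an experiment might compare can be set up as follows. This is a hypothetical sketch; the batch size and the actual schedules used in the notebook may differ:

```python
import tensorflow as tf

# Hypothetical example of a schedule to compare against a constant
# learning rate; the notebook's actual schedules may differ.
steps_per_epoch = 50000 // 128  # CIFAR10 train set, batch size 128 (assumed)
cosine = tf.keras.optimizers.schedules.CosineDecay(
    initial_learning_rate=1e-3,
    decay_steps=75 * steps_per_epoch,  # decay over the 75 training epochs
)
optimizer = tf.keras.optimizers.SGD(learning_rate=cosine, momentum=0.9)
```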

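And a rough sketch of what the filter visualization amounts to, again using a stock Keras model as a stand-in rather than the exact notebook code:

```python
import matplotlib.pyplot as plt
import tensorflow as tf

# Rough sketch of a first-layer filter visualization; the notebooks'
# exact plotting code may differ.
model = tf.keras.applications.ResNet50(weights=None)
first_conv = next(
    l for l in model.layers if isinstance(l, tf.keras.layers.Conv2D)
)
kernels = first_conv.get_weights()[0]  # shape: (h, w, in_channels, out_channels)
kernels = (kernels - kernels.min()) / (kernels.max() - kernels.min())

fig, axes = plt.subplots(4, 4, figsize=(6, 6))
for i, ax in enumerate(axes.flat):
    ax.imshow(kernels[..., :3, i])  # first three input channels as RGB
    ax.axis("off")
plt.show()
```
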
Some interesting findings (full credit to the authors, of course)

Below is the output of the first trained convolution layer (all the layers were trained from scratch in this case):

Below is the output of the first trained convolution layer (this time only the Batch Norm layers were trained):

More results, along with a more detailed report, can be found at https://app.wandb.ai/sayakpaul/training-bn-only.

Important note

I trained both variants of the network for 75 epochs. Naturally, the variant with only the BN layers trainable takes longer to converge because of its much smaller number of trainable parameters. On the flip side, that small parameter count can serve as a way to alleviate the problems that come with huge model sizes.
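
To get a feel for the gap, here is a quick way to compare the two parameter counts (again using a stock ResNet50 as a stand-in for the repo's ResNet20):

```python
import tensorflow as tf

# Compare the full parameter count with the BN-only trainable count.
# Stock ResNet50 is used here as a stand-in for the repo's ResNet20.
model = tf.keras.applications.ResNet50(weights=None)
total = model.count_params()
bn_only = sum(
    tf.keras.backend.count_params(w)
    for layer in model.layers
    if isinstance(layer, tf.keras.layers.BatchNormalization)
    for w in layer.trainable_weights  # gamma and beta only
)
print(f"All parameters:     {total:,}")
print(f"BN-only trainables: {bn_only:,}")  # a tiny fraction of the total
```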

Acknowledgements

Although the notebooks are Colab-ready, I trained all of them on a pre-configured AI Platform Notebook to make the experiments more reproducible. Thanks to the ML-GDE program for the GCP credits.
