
sayakpaul / Training-BatchNorm-and-Only-BatchNorm

Licence: other
Experiments with the ideas presented in https://arxiv.org/abs/2003.00152 by Frankle et al.

Programming Languages

Jupyter Notebook

Projects that are alternatives of or similar to Training-BatchNorm-and-Only-BatchNorm

tensorflow-tabnet
Improved TabNet for TensorFlow
Stars: ✭ 49 (+113.04%)
Mutual labels:  tensorflow2
muzero
A clean implementation of MuZero and AlphaZero following the AlphaZero General framework. Train and pit both algorithms against each other, and investigate the reliability of learned MuZero MDP models.
Stars: ✭ 126 (+447.83%)
Mutual labels:  tensorflow2
deep autoviml
Build tensorflow keras model pipelines in a single line of code. Now with mlflow tracking. Created by Ram Seshadri. Collaborators welcome. Permission granted upon request.
Stars: ✭ 98 (+326.09%)
Mutual labels:  tensorflow2
QuantumSpeech-QCNN
IEEE ICASSP 21 - Quantum Convolution Neural Networks for Speech Processing and Automatic Speech Recognition
Stars: ✭ 71 (+208.7%)
Mutual labels:  tensorflow2
mae-scalable-vision-learners
A TensorFlow 2.x implementation of Masked Autoencoders Are Scalable Vision Learners
Stars: ✭ 54 (+134.78%)
Mutual labels:  tensorflow2
LoL-Match-Prediction
Win probability predictions for League of Legends matches using neural networks
Stars: ✭ 34 (+47.83%)
Mutual labels:  batch-normalization
Brain-Tumor-Segmentation
Attention-Guided Version of 2D UNet for Automatic Brain Tumor Segmentation
Stars: ✭ 125 (+443.48%)
Mutual labels:  tensorflow2
Autoregressive-models
Tensorflow 2.0 implementation of Deep Autoregressive Models
Stars: ✭ 18 (-21.74%)
Mutual labels:  tensorflow2
UnitBox
UnitBox: An Advanced Object Detection Network
Stars: ✭ 23 (+0%)
Mutual labels:  tensorflow2
caffe-mt
This is a fork of Caffe with some useful layers added; the original Caffe repository is https://github.com/BVLC/caffe.
Stars: ✭ 33 (+43.48%)
Mutual labels:  batch-normalization
Awesome-Tensorflow2
Excellent extension packages and projects built on TensorFlow 2
Stars: ✭ 45 (+95.65%)
Mutual labels:  tensorflow2
datascienv
datascienv is a package that sets up your environment, with all dependencies, in a single line of code; it also includes pyforest, which imports all required ML libraries in a single line.
Stars: ✭ 53 (+130.43%)
Mutual labels:  tensorflow2
Tensorflow2-ObjectDetectionAPI-Colab-Hands-On
Hands-on documentation for the Tensorflow2 Object Detection API
Stars: ✭ 33 (+43.48%)
Mutual labels:  tensorflow2
face-mask-detection-tf2
Face mask detection using SSD with a simplified MobileNet and RFB or Pelee backbone in TensorFlow 2.1. Supports training on your own dataset and can be converted to a kmodel to run on a K210 edge device.
Stars: ✭ 72 (+213.04%)
Mutual labels:  tensorflow2
spectral normalization-tf2
🌈 Spectral Normalization implemented in TensorFlow 2
Stars: ✭ 36 (+56.52%)
Mutual labels:  tensorflow2
labml
🔎 Monitor deep learning model training and hardware usage from your mobile phone 📱
Stars: ✭ 1,213 (+5173.91%)
Mutual labels:  tensorflow2
transformer-tensorflow2.0
Transformer in TensorFlow 2.0
Stars: ✭ 53 (+130.43%)
Mutual labels:  tensorflow2
farm-animal-tracking
Farm Animal Tracking (FAT)
Stars: ✭ 19 (-17.39%)
Mutual labels:  tensorflow2
GrouProx
FedGroup, a clustered federated learning framework based on TensorFlow
Stars: ✭ 20 (-13.04%)
Mutual labels:  tensorflow2
Tensorflow-YOLACT
Implementation of the paper "YOLACT Real-time Instance Segmentation" in Tensorflow 2
Stars: ✭ 97 (+321.74%)
Mutual labels:  tensorflow2

Training-BatchNorm-and-Only-BatchNorm

Experiments with the ideas presented in Training BatchNorm and Only BatchNorm: On the Expressive Power of Random Features in CNNs (https://arxiv.org/abs/2003.00152) by Frankle et al. In this paper, the authors explore the expressive power of random features in CNNs, starting with the following experimental setup:

  • They first set all the layers of a CNN to trainable=False.
  • Before training begins, they set only the Batch Norm layers back to trainable (see the sketch below).
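
In TensorFlow 2 / Keras terms, this setup looks roughly like the following. This is a minimal sketch, assuming a stock Keras ResNet50 rather than the repo's ResNet20:

```python
import tensorflow as tf

# Minimal sketch, assuming a stock Keras ResNet50 rather than the
# repo's ResNet20: first freeze every layer, then re-enable training
# for the Batch Norm layers only.
model = tf.keras.applications.ResNet50(weights=None, classes=10)

for layer in model.layers:
    layer.trainable = False  # freeze everything...
    if isinstance(layer, tf.keras.layers.BatchNormalization):
        layer.trainable = True  # ...except the Batch Norm layers

model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Only the BN gamma/beta parameters remain trainable.
n_trainable = sum(
    tf.keras.backend.count_params(w) for w in model.trainable_weights
)
print(f"Trainable parameters: {n_trainable:,}")
```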

This simple experimental setup led to some striking findings about the expressive power of the randomly initialized layers in a CNN. So the authors explore a further question: what happens if we train only the Batch Norm layers and push this setup toward its optimum? Their findings are quite intriguing.

Dataset used

CIFAR10

Architecture used

ResNet20 (Thanks to the Keras Idiomatic Programmer repo)

About the files

  • CIFAR10_Subset.ipynb: Runs experiments on a GPU with a subset of the CIFAR10 dataset.
  • CIFAR10_Full.ipynb: Runs experiments on a GPU with the full CIFAR10 dataset.
  • CIFAR10_Full_TPU.ipynb: Runs experiments on a TPU with the full CIFAR10 dataset.
  • CIFAR10_Full_TPU_Different_LR_Schedules.ipynb: Runs experiments on a TPU with the full CIFAR10 dataset, but with different learning rate schedules (see the schedule sketch after this list).
  • All_Layers_Frozen.ipynb: As the name suggests, this notebook shows what happens when all the layers of a CNN are made non-trainable.
  • Varying_Batch_Sizes.ipynb: Runs experiments with varying batch sizes (only the Batch Norm layers are trainable).
  • Visualization.ipynb: Visualizes the learned convolution filters of the networks (see the filter-visualization sketch after this list).
  • Visualization_II.ipynb: Almost the same as Visualization.ipynb, but with slightly different plots.
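
As a point of reference, a learning rate schedule of the kind such an experiment might compare can be set up as follows. This is a hypothetical sketch; the batch size and the actual schedules used in the notebook may differ:

```python
import tensorflow as tf

# Hypothetical example of a schedule to compare against a constant
# learning rate; the notebook's actual schedules may differ.
steps_per_epoch = 50000 // 128  # CIFAR10 train set, batch size 128 (assumed)
cosine = tf.keras.optimizers.schedules.CosineDecay(
    initial_learning_rate=1e-3,
    decay_steps=75 * steps_per_epoch,  # decay over the 75 training epochs
)
optimizer = tf.keras.optimizers.SGD(learning_rate=cosine, momentum=0.9)
```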

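And a rough sketch of what the filter visualization amounts to, again using a stock Keras model as a stand-in rather than the exact notebook code:

```python
import matplotlib.pyplot as plt
import tensorflow as tf

# Rough sketch of a first-layer filter visualization; the notebooks'
# exact plotting code may differ.
model = tf.keras.applications.ResNet50(weights=None)
first_conv = next(
    l for l in model.layers if isinstance(l, tf.keras.layers.Conv2D)
)
kernels = first_conv.get_weights()[0]  # shape: (h, w, in_channels, out_channels)
kernels = (kernels - kernels.min()) / (kernels.max() - kernels.min())

fig, axes = plt.subplots(4, 4, figsize=(6, 6))
for i, ax in enumerate(axes.flat):
    ax.imshow(kernels[..., :3, i])  # first three input channels as RGB
    ax.axis("off")
plt.show()
```
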
Some interesting findings (full credit to the authors, of course)

Below is the output of the first trained convolution layer (all the layers were trained from scratch in this case):

Below is the output of the first trained convolution layer (this time only the Batch Norm layers were trained):

More results, along with a more detailed report, can be found at https://app.wandb.ai/sayakpaul/training-bn-only.

Important note

I trained both variants of the network for 75 epochs. Naturally, the variant with only the BN layers trainable takes longer to converge because of its much smaller number of trainable parameters. On the flip side, that small parameter count can serve as a way to alleviate the problems that come with huge model sizes.
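
To get a feel for the gap, here is a quick way to compare the two parameter counts (again using a stock ResNet50 as a stand-in for the repo's ResNet20):

```python
import tensorflow as tf

# Compare the full parameter count with the BN-only trainable count.
# Stock ResNet50 is used here as a stand-in for the repo's ResNet20.
model = tf.keras.applications.ResNet50(weights=None)
total = model.count_params()
bn_only = sum(
    tf.keras.backend.count_params(w)
    for layer in model.layers
    if isinstance(layer, tf.keras.layers.BatchNormalization)
    for w in layer.trainable_weights  # gamma and beta only
)
print(f"All parameters:     {total:,}")
print(f"BN-only trainables: {bn_only:,}")  # a tiny fraction of the total
```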

Acknowledgements

Although the notebooks are Colab-ready, I trained all of them on a pre-configured AI Platform Notebook to make the experiments more reproducible. Thanks to the ML-GDE program for the GCP credits.
