
sayakpaul / Adaptive-Gradient-Clipping

License: MIT
Minimal implementation of adaptive gradient clipping (https://arxiv.org/abs/2102.06171) in TensorFlow 2.

Programming Languages

Jupyter Notebook
Python

Projects that are alternatives to or similar to Adaptive-Gradient-Clipping

QuantumSpeech-QCNN
IEEE ICASSP 21 - Quantum Convolution Neural Networks for Speech Processing and Automatic Speech Recognition
Stars: ✭ 71 (-4.05%)
Mutual labels:  colab-notebook, tensorflow2
Tensorflow2-ObjectDetectionAPI-Colab-Hands-On
Hands-on materials for the TensorFlow 2 Object Detection API
Stars: ✭ 33 (-55.41%)
Mutual labels:  colab-notebook, tensorflow2
TFLite-ModelMaker-EfficientDet-Colab-Hands-On
Hands-on materials for object detection with TensorFlow Lite Model Maker
Stars: ✭ 15 (-79.73%)
Mutual labels:  colab-notebook, tensorflow2
transformer
Build English-Vietnamese machine translation with ProtonX Transformer. :D
Stars: ✭ 41 (-44.59%)
Mutual labels:  tensorflow2
tensorflow-maml
TensorFlow 2.0 implementation of MAML.
Stars: ✭ 79 (+6.76%)
Mutual labels:  tensorflow2
3D-GuidedGradCAM-for-Medical-Imaging
This repo contains an implementation of Guided Grad-CAM for 3D medical imaging using NIfTI files in TensorFlow 2.0. Different input files can be used; in that case, edit the input to the Guided Grad-CAM model.
Stars: ✭ 60 (-18.92%)
Mutual labels:  tensorflow2
potato-disease-classification
Potato Disease Classification - Training, REST APIs, and a frontend for testing.
Stars: ✭ 95 (+28.38%)
Mutual labels:  tensorflow2
AiSpace
AiSpace: Better practices for deep learning model development and deployment for TensorFlow 2.0
Stars: ✭ 28 (-62.16%)
Mutual labels:  tensorflow2
Torrent-To-Google-Drive-Downloader
Simple notebook to stream torrent files to Google Drive using Google Colab and python3.
Stars: ✭ 256 (+245.95%)
Mutual labels:  colab-notebook
TensorFlow2.0 SSD
A TensorFlow 2.0 implementation of SSD (Single Shot MultiBox Detector).
Stars: ✭ 83 (+12.16%)
Mutual labels:  tensorflow2
latent space adventures
Buckle up, adventure in the styleGAN2-ada-pytorch network latent space awaits
Stars: ✭ 59 (-20.27%)
Mutual labels:  colab-notebook
E2E-Object-Detection-in-TFLite
This repository shows how to train a custom detection model with the TFOD API, optimize it with TFLite, and perform inference with the optimized model.
Stars: ✭ 28 (-62.16%)
Mutual labels:  tensorflow2
Open-Source-Models
Address book for computer vision models.
Stars: ✭ 30 (-59.46%)
Mutual labels:  tensorflow2
checkmate
Training neural networks in TensorFlow 2.0 with 5x less memory
Stars: ✭ 116 (+56.76%)
Mutual labels:  tensorflow2
gans-2.0
Generative Adversarial Networks in TensorFlow 2.0
Stars: ✭ 76 (+2.7%)
Mutual labels:  tensorflow2
pyradox
State of the Art Neural Networks for Deep Learning
Stars: ✭ 61 (-17.57%)
Mutual labels:  tensorflow2
MineColab
Run Minecraft Server on Google Colab.
Stars: ✭ 135 (+82.43%)
Mutual labels:  colab-notebook
Reinforcement-Learning-on-google-colab
Reinforcement Learning algorithms using Google Colab
Stars: ✭ 33 (-55.41%)
Mutual labels:  colab-notebook
manning tf2 in action
The official code repository for "TensorFlow in Action" by Manning.
Stars: ✭ 61 (-17.57%)
Mutual labels:  tensorflow2
practicals-2019
Practical notebooks for Khipu 2019, held at Universidad de la República in Montevideo.
Stars: ✭ 241 (+225.68%)
Mutual labels:  colab-notebook

Adaptive-Gradient-Clipping

This repository provides a minimal implementation of adaptive gradient clipping (AGC), as proposed in High-Performance Large-Scale Image Recognition Without Normalization[1], in TensorFlow 2. The paper identifies AGC as a crucial component for training deep neural networks without batch normalization[2]. Readers are encouraged to consult the paper to understand why one might want to train networks without batch normalization, given its widespread success.
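For intuition, AGC rescales any gradient whose norm is large relative to the norm of the parameter it updates. Below is a minimal sketch of the idea; it uses full-tensor norms for brevity (the paper clips unit-wise, e.g., per row of a dense kernel), and `adaptive_clip_grad` is an illustrative name rather than an API exposed by this repository.

```python
import tensorflow as tf

def adaptive_clip_grad(params, grads, clip_factor=0.01, eps=1e-3):
    """Sketch of AGC: rescale g wherever ||g|| > clip_factor * ||p||."""
    clipped = []
    for p, g in zip(params, grads):
        p_norm = tf.maximum(tf.norm(p), eps)  # guard against zero-initialized params
        g_norm = tf.norm(g)
        max_norm = clip_factor * p_norm
        # Rescale only gradients that exceed the adaptive threshold.
        scale = tf.where(g_norm > max_norm, max_norm / tf.maximum(g_norm, 1e-6), 1.0)
        clipped.append(g * scale)
    return clipped
```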

My goal with this repository is to be able to quickly train shallow networks with and without AGC. To that end, I provide two Colab Notebooks, which I discuss below.

About the notebooks

  • AGC.ipynb: Demonstrates training of a shallow network (only 0.002117 million parameters) with AGC. Open In Colab
  • BatchNorm.ipynb: Demonstrates training of a shallow network (only 0.002309 million parameters) with batch normalization. Open In Colab

Both of these notebooks are end-to-end executable on Google Colab. Furthermore, they utilize the free TPUs (TPUv2-8) that Google Colab provides, allowing readers to experiment very quickly.
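For reference, the usual TF2 TPU initialization in a Colab runtime looks like the sketch below; the notebooks contain the exact setup.

```python
import tensorflow as tf

# Connect to the Colab-provided TPU and build a distribution strategy.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

# Model creation then happens under the strategy scope so that variables
# are replicated across the eight TPU cores:
# with strategy.scope():
#     model = build_model()
```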

Findings

Before moving to the findings, please be aware of the following:

  • The network I used to demonstrate the results is extremely shallow.
  • The network is a mini VGG[3]-style network, whereas the original paper focuses on ResNet[4]-style architectures.
  • The dataset I experimented with (the flowers dataset) consists of ~3,500 samples.
  • I clipped the gradients of all the layers, whereas in the original paper the final linear layer was not clipped (refer to Section 4.1 of the original paper); see the training-step sketch after this list.
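As a rough illustration, a custom Keras train step that applies AGC to every trainable variable might look like the following. `adaptive_clip_grad` refers to the hypothetical helper sketched earlier; the notebooks contain the actual implementation.

```python
import tensorflow as tf

class AGCModel(tf.keras.Model):
    """Keras model whose train step clips gradients with AGC."""

    def train_step(self, data):
        x, y = data
        with tf.GradientTape() as tape:
            preds = self(x, training=True)
            loss = self.compiled_loss(y, preds)
        params = self.trainable_variables
        grads = tape.gradient(loss, params)
        # Clip every gradient; to mirror the paper, one would exclude the
        # final linear layer's variables from this call (Section 4.1).
        grads = adaptive_clip_grad(params, grads, clip_factor=0.01)
        self.optimizer.apply_gradients(zip(grads, params))
        self.compiled_metrics.update_state(y, preds)
        return {m.name: m.result() for m in self.metrics}
```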

By comparing the training progress of the two networks (trained with and without AGC), we see that training with AGC is more stable.

[Figure: training progress with Batch Normalization (left) and AGC (right)]

In the table below, I summarize the results from the two aforementioned notebooks:

|                            | Number of Parameters (million) | Final Validation Accuracy (%) | Training Time (seconds) |
|----------------------------|--------------------------------|-------------------------------|-------------------------|
| Batch Normalization        | 0.002309                       | 54.67                         | 2.7209                  |
| Adaptive Gradient Clipping | 0.002117                       | 52                            | 2.6145                  |

For these experiments, I used a batch size of 512 (each batch having a shape of (512, 96, 96, 3)) and a clipping factor of 0.01 (applicable only to AGC).
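As an illustration, an input pipeline producing batches of this shape might look like the sketch below. It assumes the flowers dataset is `tf_flowers` from TensorFlow Datasets (consistent with the ~3,500 samples mentioned above); the exact split and preprocessing live in the notebooks.

```python
import tensorflow as tf
import tensorflow_datasets as tfds

def preprocess(image, label):
    # Resize to 96x96 and scale pixel values to [0, 1].
    image = tf.image.resize(image, (96, 96)) / 255.0
    return image, label

train_ds = (
    tfds.load("tf_flowers", split="train[:85%]", as_supervised=True)
    .map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
    .shuffle(1024)
    .batch(512, drop_remainder=True)  # fixed (512, 96, 96, 3) batches for the TPU
    .prefetch(tf.data.AUTOTUNE)
)
```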

These results SHOULD NOT be treated as conclusive. For details on the training configuration (e.g., network depth, learning rate), please refer to the notebooks.

Citations

[1] Brock, Andrew, et al. “High-Performance Large-Scale Image Recognition Without Normalization.” ArXiv:2102.06171 [Cs, Stat], Feb. 2021. arXiv.org, http://arxiv.org/abs/2102.06171.

[2] Ioffe, Sergey, and Christian Szegedy. “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.” ArXiv:1502.03167 [Cs], Mar. 2015. arXiv.org, http://arxiv.org/abs/1502.03167.

[3] Simonyan, Karen, and Andrew Zisserman. “Very Deep Convolutional Networks for Large-Scale Image Recognition.” ArXiv:1409.1556 [Cs], Apr. 2015. arXiv.org, http://arxiv.org/abs/1409.1556.

[4] He, Kaiming, et al. “Deep Residual Learning for Image Recognition.” ArXiv:1512.03385 [Cs], Dec. 2015. arXiv.org, http://arxiv.org/abs/1512.03385.

Acknowledgements

I referred to the following resources during experimentation:
