
majumderb / Rezero

License: MIT
Official PyTorch Repo for "ReZero is All You Need: Fast Convergence at Large Depth"

Programming Languages

Python

Projects that are alternatives to or similar to Rezero

Real Time Gesrec
Real-time Hand Gesture Recognition with PyTorch on EgoGesture, NvGesture, Jester, Kinetics and UCF101
Stars: ✭ 339 (+6.94%)
Mutual labels:  deep-neural-networks, resnet
Deep Ranking
Learning Fine-grained Image Similarity with Deep Ranking is a novel application of neural networks, in which the authors combine a new multi-scale architecture with a triplet loss to create a network that can perform image search. This repository is a simplified implementation of the same.
Stars: ✭ 64 (-79.81%)
Mutual labels:  deep-neural-networks, resnet
Flow Forecast
Deep learning PyTorch library for time series forecasting, classification, and anomaly detection (originally for flood forecasting).
Stars: ✭ 368 (+16.09%)
Mutual labels:  deep-neural-networks, transformer
Bmw Tensorflow Training Gui
This repository allows you to get started with GUI-based training of a state-of-the-art deep learning model with little to no configuration needed! No-code training with TensorFlow has never been so easy.
Stars: ✭ 736 (+132.18%)
Mutual labels:  deep-neural-networks, resnet
Paddlex
PaddlePaddle End-to-End Development Toolkit (a full-workflow development tool for PaddlePaddle deep learning)
Stars: ✭ 3,399 (+972.24%)
Mutual labels:  deep-neural-networks, resnet
Eeg Dl
A Deep Learning library for EEG Tasks (Signals) Classification, based on TensorFlow.
Stars: ✭ 165 (-47.95%)
Mutual labels:  resnet, transformer
Sockeye
Sequence-to-sequence framework with a focus on Neural Machine Translation, based on Apache MXNet
Stars: ✭ 990 (+212.3%)
Mutual labels:  deep-neural-networks, transformer
Deep Ctr Prediction
CTR prediction models based on deep learning (deep-learning-based CTR prediction models for ad recommendation)
Stars: ✭ 628 (+98.11%)
Mutual labels:  resnet, transformer
Voice activity detection
Voice Activity Detection based on Deep Learning & TensorFlow
Stars: ✭ 132 (-58.36%)
Mutual labels:  deep-neural-networks, resnet
Tensorflow2.0 Examples
🙄 Difficult algorithm, Simple code.
Stars: ✭ 1,397 (+340.69%)
Mutual labels:  deep-neural-networks, resnet
Gluon2pytorch
Gluon to PyTorch deep neural network model converter
Stars: ✭ 70 (-77.92%)
Mutual labels:  deep-neural-networks, resnet
Octconv.pytorch
PyTorch implementation of Octave Convolution with pre-trained Oct-ResNet and Oct-MobileNet models
Stars: ✭ 229 (-27.76%)
Mutual labels:  deep-neural-networks, resnet
Iresnet
Improved Residual Networks (https://arxiv.org/pdf/2004.04989.pdf)
Stars: ✭ 163 (-48.58%)
Mutual labels:  deep-neural-networks, resnet
Dab
Data Augmentation by Backtranslation (DAB) ヽ( •_-)ᕗ
Stars: ✭ 294 (-7.26%)
Mutual labels:  deep-neural-networks, transformer
Deep Learning Uncertainty
Literature survey, paper reviews, experimental setups and a collection of implementations of baseline methods for predictive uncertainty estimation in deep learning models.
Stars: ✭ 296 (-6.62%)
Mutual labels:  deep-neural-networks
Tensorflow Image Detection
A generic image detection program that uses Google's machine learning library TensorFlow and a pre-trained deep learning convolutional neural network model called Inception.
Stars: ✭ 306 (-3.47%)
Mutual labels:  deep-neural-networks
Cascaded Fcn
Source code for the MICCAI 2016 Paper "Automatic Liver and Lesion Segmentation in CT Using Cascaded Fully Convolutional Neural Networks and 3D Conditional Random Fields"
Stars: ✭ 296 (-6.62%)
Mutual labels:  deep-neural-networks
Model Compression Papers
Papers for deep neural network compression and acceleration
Stars: ✭ 296 (-6.62%)
Mutual labels:  deep-neural-networks
Pytorch Vdsr
VDSR (CVPR 2016) PyTorch implementation
Stars: ✭ 313 (-1.26%)
Mutual labels:  deep-neural-networks
Deepxi
Deep Xi: A deep learning approach to a priori SNR estimation implemented in TensorFlow 2/Keras. For speech enhancement and robust ASR.
Stars: ✭ 304 (-4.1%)
Mutual labels:  resnet

ReZero for Deep Neural Networks

ReZero is All You Need: Fast Convergence at Large Depth; arXiv, March 2020.

Thomas Bachlechner*, Bodhisattwa Prasad Majumder*, Huanru Henry Mao*, Garrison W. Cottrell, Julian McAuley (* denotes equal contributions)

This repository contains the ReZero-Transformer implementation from the paper. It matches the interface of PyTorch's Transformer layers and can be used as a drop-in replacement.

Quick Links: Abstract | Installation | Usage | Tutorials | Citation

Abstract

Deep networks have enabled significant performance gains across domains, but they often suffer from vanishing/exploding gradients. This is especially true for Transformer architectures where depth beyond 12 layers is difficult to train without large datasets and computational budgets. In general, we find that inefficient signal propagation impedes learning in deep networks. In Transformers, multi-head self-attention is the main cause of this poor signal propagation. To facilitate deep signal propagation, we propose ReZero, a simple change to the architecture that initializes an arbitrary layer as the identity map, using a single additional learned parameter per layer. We apply this technique to language modeling and find that we can easily train ReZero-Transformer networks over a hundred layers. When applied to 12 layer Transformers, ReZero converges 56% faster on enwiki8. ReZero applies beyond Transformers to other residual networks, enabling 1,500% faster convergence for deep fully connected networks and 32% faster convergence for a ResNet-56 trained on CIFAR 10.
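
Concretely, ReZero replaces the usual residual update x_{i+1} = x_i + F(x_i) (and its LayerNorm variants) with x_{i+1} = x_i + alpha_i * F(x_i), where the scalar alpha_i is learned and initialized to zero, so every layer starts as the identity. As a rough illustration only (the module name and layer sizes below are made up for this sketch and are not part of the rezero package), a generic ReZero residual block might look like:

import torch
import torch.nn as nn

class ReZeroBlock(nn.Module):
    # Residual block with a ReZero connection: x + alpha * F(x).
    # alpha is the single extra learned parameter per layer; initializing
    # it to zero makes the whole block the identity map at the start of training.
    def __init__(self, d_model=512, d_hidden=2048):
        super().__init__()
        self.f = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.ReLU(),
            nn.Linear(d_hidden, d_model),
        )
        self.alpha = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        return x + self.alpha * self.f(x)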

Installation

Simply install from pip:

pip install rezero

PyTorch 1.4 or greater is required.

Usage

We provide custom ReZero Transformer layers (RZTX).

For example, this will create a Transformer encoder:

import torch
import torch.nn as nn
from rezero.transformer import RZTXEncoderLayer

# RZTXEncoderLayer is a drop-in replacement for nn.TransformerEncoderLayer
encoder_layer = RZTXEncoderLayer(d_model=512, nhead=8)
transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)
src = torch.rand(10, 32, 512)  # (seq_len, batch, d_model)
out = transformer_encoder(src)

This will create a Transformer decoder:

import torch
import torch.nn as nn
from rezero.transformer import RZTXDecoderLayer

# RZTXDecoderLayer is a drop-in replacement for nn.TransformerDecoderLayer
decoder_layer = RZTXDecoderLayer(d_model=512, nhead=8)
transformer_decoder = nn.TransformerDecoder(decoder_layer, num_layers=6)
memory = torch.rand(10, 32, 512)  # encoder output: (src_len, batch, d_model)
tgt = torch.rand(20, 32, 512)     # target sequence: (tgt_len, batch, d_model)
out = transformer_decoder(tgt, memory)

Make sure the norm argument is left as None so that no LayerNorm is applied inside the Transformer.
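
For instance, PyTorch's nn.TransformerEncoder takes an optional norm module that already defaults to None; with ReZero layers, simply leave it at the default:

# norm defaults to None; do not pass a LayerNorm here when using ReZero layers
transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers=6, norm=None)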

See https://pytorch.org/docs/master/nn.html#torch.nn.Transformer for details on how to integrate custom Transformer layers into PyTorch.

Tutorials

  1. Training a 128-layer ReZero Transformer on WikiText-2 language modeling
  2. Training a 10,000-layer ReZero neural network on CIFAR-10 data

Watch for more tutorials in this space.

Citation

If you find ReZero useful for your research, please cite our paper:

@inproceedings{BacMajMaoCotMcA20,
    title = "ReZero is All You Need: Fast Convergence at Large Depth",
    author = "Bachlechner, Thomas  and
      Majumder, Bodhisattwa Prasad and
      Mao, Huanru Henry and
      Cottrell, Garrison W. and
      McAuley, Julian",
    booktitle = "arXiv",
    year = "2020",
    url = "https://arxiv.org/abs/2003.04887"
}