harshalmittal4 / Hypergradient_variants

License: MIT
Improved Hypergradient optimizers, providing better generalization and faster convergence.

Programming Languages

Jupyter Notebook
11667 projects
Python
139335 projects - #7 most used programming language
Shell
77523 projects

Projects that are alternatives to or similar to Hypergradient_variants

ML-Optimizers-JAX
Toy implementations of some popular ML optimizers using Python/JAX
Stars: ✭ 37 (+146.67%)
Mutual labels:  momentum, adam-optimizer, optimizers
AutoOpt
Automatic and Simultaneous Adjustment of Learning Rate and Momentum for Stochastic Gradient Descent
Stars: ✭ 44 (+193.33%)
Mutual labels:  momentum, learning-rate
flaxOptimizers
A collection of optimizers, some arcane others well known, for Flax.
Stars: ✭ 21 (+40%)
Mutual labels:  optimizers
HAR
Recognize one of six human activities such as standing, sitting, and walking using a Softmax Classifier trained on mobile phone sensor data.
Stars: ✭ 18 (+20%)
Mutual labels:  momentum
submodlib
Summarize Massive Datasets using Submodular Optimization
Stars: ✭ 36 (+140%)
Mutual labels:  optimizers
lookahead tensorflow
Lookahead optimizer ("Lookahead Optimizer: k steps forward, 1 step back") for tensorflow
Stars: ✭ 25 (+66.67%)
Mutual labels:  adam-optimizer
postcss-momentum-scrolling
PostCSS plugin that adds 'momentum'-style scrolling behavior (-webkit-overflow-scrolling: touch) to elements with overflow (scroll, auto) on iOS
Stars: ✭ 69 (+360%)
Mutual labels:  momentum
MachineLearning
An easy neural network for Java!
Stars: ✭ 125 (+733.33%)
Mutual labels:  learning-rate
Radam
On the Variance of the Adaptive Learning Rate and Beyond
Stars: ✭ 2,442 (+16180%)
Mutual labels:  adam-optimizer
Adam-optimizer
Implementation of the Adam optimizer in Python
Stars: ✭ 43 (+186.67%)
Mutual labels:  adam-optimizer
CS231n
PyTorch/Tensorflow solutions for Stanford's CS231n: "CNNs for Visual Recognition"
Stars: ✭ 47 (+213.33%)
Mutual labels:  adam-optimizer
haskell-vae
Learning about Haskell with Variational Autoencoders
Stars: ✭ 18 (+20%)
Mutual labels:  adam-optimizer
Ta
Technical Analysis Library using Pandas and Numpy
Stars: ✭ 2,649 (+17560%)
Mutual labels:  momentum
Nn
🧑‍🏫 50! Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, ...), gans(cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠
Stars: ✭ 5,720 (+38033.33%)
Mutual labels:  optimizers
keras-lookahead
Lookahead mechanism for optimizers in Keras.
Stars: ✭ 50 (+233.33%)
Mutual labels:  optimizers
transformer
Neutron: A pytorch based implementation of Transformer and its variants.
Stars: ✭ 60 (+300%)
Mutual labels:  optimizers
MMD-GAN
Improving MMD-GAN training with repulsive loss function
Stars: ✭ 82 (+446.67%)
Mutual labels:  learning-rate
scrollbox
A lightweight jQuery custom scrollbar plugin that triggers an event when a defined point is reached.
Stars: ✭ 15 (+0%)
Mutual labels:  momentum
axon
Nx-powered Neural Networks
Stars: ✭ 1,170 (+7700%)
Mutual labels:  optimizers
pytorch-lr-scheduler
PyTorch implementation of some learning rate schedulers for deep learning researcher.
Stars: ✭ 65 (+333.33%)
Mutual labels:  learning-rate

Hypergradient-based Optimization Methods

This work explores improvements to the existing hypergradient-based optimizers proposed in the paper Online Learning Rate Adaptation with Hypergradient Descent. The report summarising the work can be found here.

Introduction

The method proposed in the paper "Online Learning Rate Adaptation with Hypergradient Descent" automatically adjusts the learning rate to minimize an estimate of the expected loss, by introducing the "hypergradient": the gradient of the loss function w.r.t. the hyperparameter "eta" (the optimizer's learning rate). At each training iteration it updates the step size by gradient descent on the hypergradient, and applies this rule alongside the model optimizers SGD, SGD with Nesterov momentum (SGDN) and Adam, yielding their hypergradient counterparts SGD-HD, SGDN-HD and Adam-HD. These demonstrate faster convergence of the loss and better generalization than the plain optimizers alone.
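To make the update rule concrete, the following is a minimal PyTorch-style sketch of one SGD-HD step. It is illustrative only; the function and variable names are not taken from this repository, whose actual optimizers live in the hypergrad/ modules.

import torch

def sgd_hd_step(params, grads, prev_grads, alpha, beta):
    """One illustrative SGD-HD step: adapt the learning rate alpha by gradient
    descent on the hypergradient, then take a plain SGD step with it."""
    # Since the previous step was theta_t = theta_{t-1} - alpha * grad_{t-1},
    # the hypergradient is d(loss)/d(alpha) = grad_t . (-grad_{t-1}).
    h = sum(torch.sum(g * (-pg)) for g, pg in zip(grads, prev_grads))
    alpha = alpha - beta * h.item()        # hypergradient descent on alpha
    for p, g in zip(params, grads):        # SGD step with the adapted alpha
        p.data.add_(g, alpha=-alpha)
    return alpha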

We expect, however, that the hypergradient-based learning rate update can be made more accurate, and we aim to exploit these gains further by boosting the learning-rate updates with momentum and adaptive gradients. We experiment with

  1. Hypergradient descent with momentum, and
  2. Adam with Hypergradient,

alongside the model optimizers SGD, SGD with Nesterov momentum (SGDN) and Adam; both learning-rate updates are sketched below.
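The two variants replace the plain gradient-descent step on alpha with a momentum-based (SGDN-style) or an Adam-style update driven by the scalar hypergradient h. A minimal sketch, assuming h has already been computed as above; the function names, the state dict and the default coefficients are illustrative, not the repository's API.

import math

def lr_update_sgdn(alpha, h, state, beta, mu=0.9):
    """Learning-rate update by hypergradient descent with (Nesterov-style) momentum."""
    v = mu * state.get("v", 0.0) + h       # velocity accumulated over hypergradients
    state["v"] = v
    return alpha - beta * (h + mu * v)     # Nesterov-style lookahead step on alpha

def lr_update_adam(alpha, h, state, beta, b1=0.9, b2=0.999, eps=1e-8):
    """Learning-rate update by Adam applied to the scalar hypergradient."""
    t = state.get("t", 0) + 1
    m = b1 * state.get("m", 0.0) + (1 - b1) * h          # first-moment estimate
    v = b2 * state.get("v", 0.0) + (1 - b2) * h * h      # second-moment estimate
    state.update(t=t, m=m, v=v)
    m_hat = m / (1 - b1 ** t)                            # bias correction
    v_hat = v / (1 - b2 ** t)
    return alpha - beta * m_hat / (math.sqrt(v_hat) + eps)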

The naming convention used is {model optimizer}op-{learning rate optimizer}lop; following this, we have {model optimizer}op-SGDNlop (when the l.r. optimizer is hypergradient descent with momentum) and {model optimizer}op-Adamlop (when the l.r. optimizer is Adam with hypergradient).

The new optimizers and the respective hypergradient-descent baselines against which their performance is compared are:

  • SGDop-SGDNlop, with baseline SGD-HD (i.e. SGDop-SGDlop)
  • SGDNop-SGDNlop, with baseline SGDN-HD (i.e. SGDNop-SGDlop)
  • Adamop-Adamlop, with baseline Adam-HD (i.e. Adamop-SGDlop)

Evaluated against their hypergradient-descent baselines, the new optimizers offer better generalization, faster convergence, and better training stability (less sensitivity to the initial learning rate).

Motivation

The alpha_0 (initial learning rate) and beta (hypergradient learning rate) configurations for the new optimizers are kept the same as those of the respective baselines from the paper (see run.sh for details). The results show that the new optimizers perform better for all three models (VGGNet, LogReg, MLP). A more detailed description of the optimizers can be found in the project report here.

Behavior of the optimizers compared with their hypergradient-descent baselines.

Columns: left: logistic regression on MNIST; middle: multi-layer neural network on MNIST; right: VGGNet on CIFAR-10.

Project Structure

The project is organised as follows:

.
├── hypergrad/
│   ├── __init__.py
│   ├── sgd_Hd.py   # model op. SGD, l.r. optimizer hypergradient descent (original)
│   └── adam_Hd.py  # model op. Adam, l.r. optimizer hypergradient descent (original)
├── op_sgd_lop_sgdn.py   # model op. SGD, l.r. optimizer hypergradient descent with momentum
├── op_sgd_lop_adam.py   # model op. SGD, l.r. optimizer Adam with hypergradient
├── op_adam_lop_sgdn.py  # model op. Adam, l.r. optimizer hypergradient descent with momentum
├── op_adam_lop_adam.py  # model op. Adam, l.r. optimizer Adam with hypergradient
├── vgg.py
├── train.py
├── test/       # results of the experiments
├── plot_src/
├── plots/      # experiment plots
└── run.sh      # to run the experiments

Generated after running the experiments:

├── {model}_{optimizer}_{beta}_epochs{X}.pth           # model checkpoint
└── test/{model}/{alpha}_{beta}/{optimizer}.csv        # experiment results
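For orientation, the op_*.py optimizers are presumably used like standard torch.optim optimizers inside train.py. A minimal stand-alone sketch of such a loop, with torch.optim.SGD as a stand-in because the exact class names and constructor arguments of the repository's optimizers are not shown here:

import torch
import torch.nn as nn

# Dummy logistic-regression setup; torch.optim.SGD stands in for one of the
# repository's op_*.py optimizers, which are assumed to be drop-in replacements
# that additionally take the hypergradient learning rate beta.
model = nn.Linear(784, 10)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

x = torch.randn(32, 784)            # dummy batch of MNIST-sized inputs
y = torch.randint(0, 10, (32,))     # dummy labels
for _ in range(5):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()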

Experiments

The experiment configurations (hyperparameters alpha_0 and beta) are defined in run.sh for the optimizers and the three model classes. The experiments for the new optimizers are run with the same settings as their hypergradient-descent versions: LogReg (20 epochs on MNIST), MLP (100 epochs on MNIST) and VGGNet (200 epochs on CIFAR-10).

References

  1. Hypergradient Descent (GitHub repository)

Contributors

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].