
davda54 / ada-hessian

License: MIT
Easy-to-use AdaHessian optimizer (PyTorch)

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to ada-hessian

Pytorch A2c Ppo Acktr Gail
PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
Stars: ✭ 2,632 (+4361.02%)
Mutual labels:  hessian, second-order
Radam
On the Variance of the Adaptive Learning Rate and Beyond
Stars: ✭ 2,442 (+4038.98%)
Mutual labels:  optimizer, adam
Optimizers-for-Tensorflow
Adam, NAdam and AAdam optimizers
Stars: ✭ 20 (-66.1%)
Mutual labels:  optimizer, adam
Adahessian
ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning
Stars: ✭ 114 (+93.22%)
Mutual labels:  optimizer, hessian
AshBF
Over-engineered Brainfuck optimizing compiler and interpreter
Stars: ✭ 14 (-76.27%)
Mutual labels:  optimizer
artificial-neural-variability-for-deep-learning
The PyTorch implementation of the Variable Optimizers / Neural Variable Risk Minimization proposed in our Neural Computation paper: Artificial Neural Variability for Deep Learning: On Overfitting, Noise Memorization, and Catastrophic Forgetting.
Stars: ✭ 34 (-42.37%)
Mutual labels:  optimizer
neth-proxy
Stratum <-> Stratum Proxy and optimizer for ethminer
Stars: ✭ 35 (-40.68%)
Mutual labels:  optimizer
Draftfast
A tool to automate and optimize DraftKings and FanDuel lineup construction.
Stars: ✭ 192 (+225.42%)
Mutual labels:  optimizer
adamwr
Implements the AdamW optimizer (https://arxiv.org/abs/1711.05101), a cosine learning rate scheduler, and "Cyclical Learning Rates for Training Neural Networks" (https://arxiv.org/abs/1506.01186) for the PyTorch framework
Stars: ✭ 130 (+120.34%)
Mutual labels:  optimizer
Cleaner
The only storage saving app that actually works! :D
Stars: ✭ 27 (-54.24%)
Mutual labels:  optimizer
LAMB Optimizer TF
LAMB Optimizer for Large Batch Training (TensorFlow version)
Stars: ✭ 119 (+101.69%)
Mutual labels:  optimizer
horoscope
horoscope is an optimizer inspector for DBMS.
Stars: ✭ 34 (-42.37%)
Mutual labels:  optimizer
keras-gradient-accumulation
Gradient accumulation for Keras
Stars: ✭ 35 (-40.68%)
Mutual labels:  optimizer
EAGO.jl
A development environment for robust and global optimization
Stars: ✭ 106 (+79.66%)
Mutual labels:  optimizer
keras gradient noise
Add gradient noise to any Keras optimizer
Stars: ✭ 36 (-38.98%)
Mutual labels:  optimizer
soar-php
SQL optimizer and rewriter (assists with SQL tuning).
Stars: ✭ 140 (+137.29%)
Mutual labels:  optimizer
prediction gan
PyTorch Impl. of Prediction Optimizer (to stabilize GAN training)
Stars: ✭ 31 (-47.46%)
Mutual labels:  optimizer
XTR-Toolbox
🛠 Versatile tool to optimize Windows
Stars: ✭ 138 (+133.9%)
Mutual labels:  optimizer
hesaff-pytorch
PyTorch implementation of Hessian-Affine local feature detector
Stars: ✭ 21 (-64.41%)
Mutual labels:  hessian
Post-Tweaks
A post-installation batch script for Windows
Stars: ✭ 136 (+130.51%)
Mutual labels:  optimizer

AdaHessian 🚀

An unofficial implementation of the AdaHessian optimizer, designed as a drop-in replacement for any PyTorch optimizer – you only need to pass create_graph=True to the backward() call and everything else should work 🥳

Our version supports multiple param_groups, distributed training, delayed Hessian updates, and a more precise approximation of the Hessian trace.

Usage

from ada_hessian import AdaHessian
...
model = YourModel()
optimizer = AdaHessian(model.parameters())
...
for inputs, targets in data:
    optimizer.zero_grad()
    loss = loss_function(model(inputs), targets)
    loss.backward(create_graph=True)  # this is the important line! 🧐
    optimizer.step()
...
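
The Hessian trace approximation mentioned above (controlled by the n_samples argument documented below) is based on Hutchinson-style estimation: the diagonal of the Hessian is approximated by averaging z * (Hz) over random Rademacher vectors z. The sketch below illustrates only this idea with made-up names; it is not the library's internal code.

import torch

def hutchinson_diag_estimate(params, grads, n_samples=1):
    # Illustrative sketch: diag(H) ≈ average over n_samples of z * (H z),
    # with z drawn from a Rademacher distribution (+1/-1 entries).
    estimates = [torch.zeros_like(p) for p in params]
    for _ in range(n_samples):
        zs = [torch.randint(0, 2, p.shape, device=p.device, dtype=p.dtype) * 2 - 1
              for p in params]
        # Hessian-vector products via a second backward pass; this is why
        # loss.backward(create_graph=True) is required in the training loop.
        hzs = torch.autograd.grad(grads, params, grad_outputs=zs, retain_graph=True)
        for est, z, hz in zip(estimates, zs, hzs):
            est.add_(z * hz / n_samples)
    return estimates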

Documentation

AdaHessian.__init__

Argument – Description
params (iterable) – iterable of parameters to optimize or dicts defining parameter groups
lr (float, optional) – learning rate (default: 0.1)
betas ((float, float), optional) – coefficients used for computing running averages of the gradient and the squared Hessian trace (default: (0.9, 0.999))
eps (float, optional) – term added to the denominator to improve numerical stability (default: 1e-8)
weight_decay (float, optional) – weight decay (L2 penalty) (default: 0.0)
hessian_power (float, optional) – exponent of the Hessian trace (default: 1.0)
update_each (int, optional) – compute the Hessian trace approximation only after this number of steps, to save time (default: 1)
n_samples (int, optional) – how many times to sample z for the approximation of the Hessian trace (default: 1)
average_conv_kernel (bool, optional) – average out the Hessian traces of convolutional kernels as in the original paper (default: False)
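
For illustration, a constructor call using the arguments documented above could look like the following; the specific values are arbitrary.

optimizer = AdaHessian(
    model.parameters(),
    lr=0.1,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0.0,
    hessian_power=1.0,
    update_each=4,             # re-estimate the Hessian trace only every 4th step
    n_samples=2,               # average two z samples for a more precise estimate
    average_conv_kernel=True,  # average traces over convolutional kernels
)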

AdaHessian.step

Performs a single optimization step.

Argument – Description
closure (callable, optional) – a closure that reevaluates the model and returns the loss (default: None)
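
As a sketch, assuming AdaHessian follows the standard PyTorch closure convention, a step with a closure could look like the code below (reusing the names from the usage example above). Note that the closure itself must call backward(create_graph=True) so the Hessian trace can still be estimated.

def closure():
    optimizer.zero_grad()
    loss = loss_function(model(inputs), targets)
    loss.backward(create_graph=True)  # still required inside the closure
    return loss

loss = optimizer.step(closure)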