
amirgholami / Adahessian

License: MIT
ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Adahessian

ada-hessian
Easy-to-use AdaHessian optimizer (PyTorch)
Stars: ✭ 59 (-48.25%)
Mutual labels:  optimizer, hessian
Rxcache
A local reactive cache for Java and Android. It now supports heap memory, off-heap memory, and disk cache.
Stars: ✭ 102 (-10.53%)
Mutual labels:  hessian
Robotics Toolbox Python
Robotics Toolbox for Python
Stars: ✭ 369 (+223.68%)
Mutual labels:  hessian
Giflossy
Merged into Gifsicle!
Stars: ✭ 937 (+721.93%)
Mutual labels:  optimizer
Scour
Scour - An SVG Optimizer / Cleaner
Stars: ✭ 443 (+288.6%)
Mutual labels:  optimizer
Jhc Components
JHC Haskell compiler split into reusable components
Stars: ✭ 55 (-51.75%)
Mutual labels:  optimizer
Booster
🚀Optimizer for mobile applications
Stars: ✭ 3,741 (+3181.58%)
Mutual labels:  optimizer
Glsl Optimizer
GLSL optimizer based on Mesa's GLSL compiler. Formerly used in Unity for mobile shader optimization.
Stars: ✭ 1,506 (+1221.05%)
Mutual labels:  optimizer
Jupiter
Jupiter is a high-performance, lightweight distributed service framework.
Stars: ✭ 1,372 (+1103.51%)
Mutual labels:  hessian
Avmf
🔩 Framework and Java implementation of the Alternating Variable Method
Stars: ✭ 16 (-85.96%)
Mutual labels:  optimizer
Marsnake
System Optimizer and Monitoring, Security Auditing, Vulnerability scanner for Linux, macOS, and UNIX-based systems
Stars: ✭ 16 (-85.96%)
Mutual labels:  optimizer
Xxl Rpc
A high-performance, distributed RPC framework (XXL-RPC).
Stars: ✭ 493 (+332.46%)
Mutual labels:  hessian
Gpx Simplify Optimizer
Free Tracks Optimizer Online Service
Stars: ✭ 61 (-46.49%)
Mutual labels:  optimizer
Heimer
Heimer is a simple cross-platform mind map, diagram, and note-taking tool written in Qt.
Stars: ✭ 380 (+233.33%)
Mutual labels:  optimizer
Sofa Hessian
An internal improved version of Hessian powered by Ant Financial.
Stars: ✭ 105 (-7.89%)
Mutual labels:  hessian
Sam
SAM: Sharpness-Aware Minimization (PyTorch)
Stars: ✭ 322 (+182.46%)
Mutual labels:  optimizer
Vtil Core
Virtual-machine Translation Intermediate Language
Stars: ✭ 738 (+547.37%)
Mutual labels:  optimizer
Stacer
Linux System Optimizer and Monitoring - https://oguzhaninan.github.io/Stacer-Web
Stars: ✭ 7,405 (+6395.61%)
Mutual labels:  optimizer
Hawq
Quantization library for PyTorch. Supports low-precision and mixed-precision quantization, with hardware implementation through TVM.
Stars: ✭ 108 (-5.26%)
Mutual labels:  hessian
Adamw keras
AdamW optimizer for Keras
Stars: ✭ 106 (-7.02%)
Mutual labels:  optimizer

Introduction


AdaHessian is a second-order optimizer for neural network training, implemented in PyTorch. The library supports training convolutional neural networks (image_classification) and transformer-based models (transformer). Our TensorFlow implementation is in adahessian_tf.

Please see this paper for more details on the AdaHessian algorithm.


Performance on Rastrigin and Rosenbrock Functions:

Below is the convergence of AdaHessian on the Rastrigin and Rosenbrock functions, along with a comparison against SGD and Adam. Please see the pytorch-optimizer repo for comparisons with other optimizers.

[Animated convergence plots: columns show the loss function, AdaHessian, SGD, and Adam]
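As a rough illustration of how such a comparison can be set up, the sketch below minimizes the Rosenbrock function with AdaHessian in PyTorch. The import path follows the Usage section below; the starting point, step count, and learning rate are illustrative choices, not values from the paper.

import torch
from optim_adahessian import Adahessian  # see Usage below for where this module lives

# Rosenbrock function, with its minimum at (1, 1)
def rosenbrock(xy):
    x, y = xy
    return (1 - x) ** 2 + 100 * (y - x ** 2) ** 2

xy = torch.tensor([-1.5, 2.0], requires_grad=True)
optimizer = Adahessian([xy], lr=0.1)  # illustrative learning rate, not tuned

for step in range(500):
    optimizer.zero_grad()
    loss = rosenbrock(xy)
    loss.backward(create_graph=True)  # needed for the Hessian estimate
    optimizer.step()

print(xy.detach())  # should move toward (1, 1)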

Usage

Please first clone the AdaHessian library to your local system:

git clone https://github.com/amirgholami/adahessian.git

You can import the optimizer as follows:

from optim_adahessian import Adahessian
...
model = YourModel()
optimizer = Adahessian(model.parameters())
...
for input, output in data:
  optimizer.zero_grad()
  loss = loss_function(output, model(input))
  loss.backward(create_graph=True)  # You need this line for Hessian backprop
  optimizer.step()
...

Please note that optim_adahessian is located in the image_classification folder. We have also adapted the AdaHessian implementation to be compatible with the fairseq repo, which can be used for NLP tasks. This is the link to that version, which can be found in the transformer folder.
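Since optim_adahessian.py lives inside the image_classification folder rather than an installable package, one simple way to make the import above work is to put that folder on the Python path. A minimal sketch, assuming the repository was cloned into the current directory:

import sys

# Path assumes `git clone https://github.com/amirgholami/adahessian.git` was run here;
# adjust it to wherever your clone lives.
sys.path.append('./adahessian/image_classification')

from optim_adahessian import Adahessian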

For different kernel sizes (e.g., matrices, Conv1D, Conv2D kernels, etc.)

We found it would be helpful to add instructions on how to adapt AdaHessian to your own models and problems. Hence, we include a prototype version of AdaHessian, along with some useful comments, in the instruction folder.
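For background, the quantity AdaHessian adapts to is a Hutchinson-style estimate of the Hessian diagonal, with spatial averaging applied to convolutional kernels. The sketch below is our own illustration of that idea (the helper name hutchinson_diag_estimate is hypothetical, and the exact averaging in the repo may differ); see the instruction folder for the authoritative prototype.

import torch

def hutchinson_diag_estimate(loss, params):
    # First backward pass: gradients with a graph so we can differentiate again
    grads = torch.autograd.grad(loss, params, create_graph=True)
    # Rademacher probe vectors (entries are +1 or -1 with equal probability)
    zs = [torch.randint_like(p, high=2) * 2.0 - 1.0 for p in params]
    # Hessian-vector products H z via a second differentiation
    hzs = torch.autograd.grad(grads, params, grad_outputs=zs)
    diags = []
    for z, hz in zip(zs, hzs):
        d = (z * hz).abs()  # elementwise estimate of |Hessian diagonal|
        if d.dim() == 4:
            # Conv2D kernel [out, in, kH, kW]: average over the spatial dims,
            # so every element of a kernel shares one averaged curvature value
            d = d.mean(dim=[2, 3], keepdim=True).expand_as(d)
        diags.append(d)
    return diags

In AdaHessian itself, a moving average of this (spatially averaged) Hessian diagonal plays the role that the squared gradient plays in Adam's denominator.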

External implementations and discussions

We are thankful to all the researchers who have extended or analyzed AdaHessian for different purposes. We include the following links in case you are interested in learning more about AdaHessian.

Description | Link | New Features
Reddit Discussion | Link | --
Fast.ai Discussion | Link | --
Best-Deep-Learning-Optimizers Code | Link | --
ada-hessian Code | Link | Support delayed Hessian update
JAX Code | Link | --
AdaHessian Analysis | Link | Analyze AdaHessian on a 2D example

Citation

AdaHessian has been developed as part of the following paper. If you find the library useful for your work, please cite:

@article{yao2020adahessian,
  title={ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning},
  author={Yao, Zhewei and Gholami, Amir and Shen, Sheng and Keutzer, Kurt and Mahoney, Michael W},
  journal={AAAI (Accepted)},
  year={2021}
}