
nestordemeure / flaxOptimizers

License: Apache-2.0
A collection of optimizers, some arcane, others well known, for Flax.

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to flaxOptimizers

jax-resnet
Implementations and checkpoints for ResNet, Wide ResNet, ResNeXt, ResNet-D, and ResNeSt in JAX (Flax).
Stars: ✭ 61 (+190.48%)
Mutual labels:  flax
axon
Nx-powered Neural Networks
Stars: ✭ 1,170 (+5471.43%)
Mutual labels:  optimizers
koclip
KoCLIP: Korean port of OpenAI CLIP, in Flax
Stars: ✭ 80 (+280.95%)
Mutual labels:  flax
Flaxengine
Flax Engine – multi-platform 3D game engine
Stars: ✭ 3,127 (+14790.48%)
Mutual labels:  flax
jax-rl
JAX implementations of core Deep RL algorithms
Stars: ✭ 61 (+190.48%)
Mutual labels:  flax
Hypergradient variants
Improved Hypergradient optimizers, providing better generalization and faster convergence.
Stars: ✭ 15 (-28.57%)
Mutual labels:  optimizers
Nn
🧑‍🏫 50! Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, ...), gans(cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠
Stars: ✭ 5,720 (+27138.1%)
Mutual labels:  optimizers
get-started-with-JAX
The purpose of this repo is to make it easy to get started with JAX, Flax, and Haiku. It contains my "Machine Learning with JAX" series of tutorials (YouTube videos and Jupyter Notebooks) as well as the content I found useful while learning about the JAX ecosystem.
Stars: ✭ 229 (+990.48%)
Mutual labels:  flax
jax-models
Unofficial JAX implementations of deep learning research papers
Stars: ✭ 108 (+414.29%)
Mutual labels:  flax
chef-transformer
Chef Transformer 🍲.
Stars: ✭ 29 (+38.1%)
Mutual labels:  flax
Transformers
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Stars: ✭ 55,742 (+265338.1%)
Mutual labels:  flax
keras-lookahead
Lookahead mechanism for optimizers in Keras.
Stars: ✭ 50 (+138.1%)
Mutual labels:  optimizers
efficientnet-jax
EfficientNet, MobileNetV3, MobileNetV2, MixNet, etc in JAX w/ Flax Linen and Objax
Stars: ✭ 114 (+442.86%)
Mutual labels:  flax
Pyprobml
Python code for "Machine learning: a probabilistic perspective" (2nd edition)
Stars: ✭ 4,197 (+19885.71%)
Mutual labels:  flax
transformer
Neutron: A pytorch based implementation of Transformer and its variants.
Stars: ✭ 60 (+185.71%)
Mutual labels:  optimizers
score flow
Official code for "Maximum Likelihood Training of Score-Based Diffusion Models", NeurIPS 2021 (spotlight)
Stars: ✭ 49 (+133.33%)
Mutual labels:  flax
uvadlc notebooks
Repository of Jupyter notebook tutorials for teaching the Deep Learning Course at the University of Amsterdam (MSc AI), Fall 2022/Spring 2022
Stars: ✭ 901 (+4190.48%)
Mutual labels:  flax
FlaxSamples
Collection of example projects for Flax Engine
Stars: ✭ 50 (+138.1%)
Mutual labels:  flax
ML-Optimizers-JAX
Toy implementations of some popular ML optimizers using Python/JAX
Stars: ✭ 37 (+76.19%)
Mutual labels:  optimizers
submodlib
Summarize Massive Datasets using Submodular Optimization
Stars: ✭ 36 (+71.43%)
Mutual labels:  optimizers

Flax Optimizers

A collection of optimizers for Flax. The repository is open to pull requests.

Installation

You can install this library with:

pip install git+https://github.com/nestordemeure/flaxOptimizers.git

Optimizers

Classical optimizers, inherited from the official Flax implementation (a minimal usage sketch follows the list):

  • Adafactor A memory-efficient optimizer that has been used for large-scale training of attention-based models.
  • Adagrad Introduces a denominator to SGD so that each parameter has its own learning rate.
  • Adam The most common stochastic optimizer nowadays.
  • LAMB An improvement on LARS that makes it efficient across task types.
  • LARS An optimizer designed for large-batch training.
  • Momentum SGD with momentum, optionally Nesterov momentum.
  • RMSProp Developed to solve Adagrad's diminishing learning rate problem.
  • SGD The simplest stochastic gradient descent optimizer possible.
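
Here is a minimal usage sketch. It assumes the library mirrors the legacy flax.optim OptimizerDef interface (create / apply_gradient / target), as suggested by the "inherited from the official Flax implementation" note above; the flaxOptimizers.Adam name and the toy loss are illustrative assumptions, not taken from the repository.

import jax
import jax.numpy as jnp
import flaxOptimizers  # assumed import name, matching the pip install above

# toy parameter pytree and loss, purely for illustration
params = {"w": jnp.zeros((3,)), "b": jnp.zeros(())}

def loss_fn(params, x, y):
    pred = x @ params["w"] + params["b"]
    return jnp.mean((pred - y) ** 2)

optimizer_def = flaxOptimizers.Adam(learning_rate=1e-3)  # optimizer definition (name assumed)
optimizer = optimizer_def.create(params)                 # wraps params with optimizer state

x, y = jnp.ones((8, 3)), jnp.ones((8,))
grads = jax.grad(loss_fn)(optimizer.target, x, y)        # gradients w.r.t. the current parameters
optimizer = optimizer.apply_gradient(grads)              # one update step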

More arcane first-order optimizers (a drop-in sketch follows the list):

  • AdamHD Uses hypergradient descent to tune its own learning rate. Good at the beginning of training but tends to underperform at the end.
  • AdamP Corrects premature step-size decay for scale-invariant weights. Useful when a model uses some form of batch normalization.
  • LapProp Applies exponential smoothing to the update rather than the gradient.
  • MADGRAD Modernisation of the Adagrad family of optimizers, very competitive with Adam.
  • RAdam Uses a rectified variance estimation to compute the learning rate. Makes training smoother, especially in the first iterations.
  • RAdamSimplified A warmup strategy proposed to reproduce RAdam's results with much lower code complexity.
  • Ranger Combines look-ahead, RAdam, and gradient centralization to try to maximize performance. Designed with image classification problems in mind.
  • Ranger21 An upgrade of Ranger that combines adaptive gradient clipping, gradient centralization, positive-negative momentum, norm loss, stable weight decay, linear learning rate warm-up, explore-exploit scheduling, lookahead, and Adam. It has been designed with transformers in mind.
  • Sadam Introduces an alternative to the epsilon parameter.
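
These optimizers are assumed to expose the same OptimizerDef interface as the classical ones, so switching is a drop-in change; the RAdam name comes from the list above, and any hyperparameters beyond learning_rate are omitted rather than guessed.

optimizer_def = flaxOptimizers.RAdam(learning_rate=1e-3)  # drop-in replacement for Adam in the earlier sketch
optimizer = optimizer_def.create(params)                  # then apply_gradient exactly as before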

Optimizer wrappers (usage sketch below):

  • WeightNorm An alternative to batch normalization that performs weight normalization inside the optimizer, which makes it compatible with more models and faster (official Flax implementation).
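
A minimal wrapping sketch, assuming WeightNorm follows the flax.optim convention of taking an inner optimizer definition as its argument; the exact constructor signature is not taken from the repository.

inner_def = flaxOptimizers.Adam(learning_rate=1e-3)   # inner optimizer (name assumed)
optimizer_def = flaxOptimizers.WeightNorm(inner_def)  # weight normalization handled by the wrapper
optimizer = optimizer_def.create(params)              # same create / apply_gradient flow as above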

Other references

  • AdahessianJax contains my implementation of the Adahessian second-order optimizer in Flax.
  • Flax.optim contains a number of optimizers that currently do not appear in the official documentation. They are all accessible from this library.