
Janus-Shiau / lookahead_tensorflow

Licence: other
Lookahead optimizer ("Lookahead Optimizer: k steps forward, 1 step back") for tensorflow

Programming Languages

python

Projects that are alternatives to or similar to lookahead_tensorflow

Radam
On the Variance of the Adaptive Learning Rate and Beyond
Stars: ✭ 2,442 (+9668%)
Mutual labels:  optimizer, adam-optimizer
CS231n
PyTorch/Tensorflow solutions for Stanford's CS231n: "CNNs for Visual Recognition"
Stars: ✭ 47 (+88%)
Mutual labels:  adam-optimizer, sgd-optimizer
artificial-neural-variability-for-deep-learning
The PyTorch Implementation of Variable Optimizers/ Neural Variable Risk Minimization proposed in our Neural Computation paper: Artificial Neural Variability for Deep Learning: On overfitting, Noise Memorization, and Catastrophic Forgetting.
Stars: ✭ 34 (+36%)
Mutual labels:  optimizer
adamwr
Implements https://arxiv.org/abs/1711.05101 AdamW optimizer, cosine learning rate scheduler and "Cyclical Learning Rates for Training Neural Networks" https://arxiv.org/abs/1506.01186 for PyTorch framework
Stars: ✭ 130 (+420%)
Mutual labels:  optimizer
Hypergradient variants
Improved Hypergradient optimizers, providing better generalization and faster convergence.
Stars: ✭ 15 (-40%)
Mutual labels:  adam-optimizer
XTR-Toolbox
🛠 Versatile tool to optimize Windows
Stars: ✭ 138 (+452%)
Mutual labels:  optimizer
Cleaner
The only storage saving app that actually works! :D
Stars: ✭ 27 (+8%)
Mutual labels:  optimizer
neth-proxy
Stratum <-> Stratum Proxy and optimizer for ethminer
Stars: ✭ 35 (+40%)
Mutual labels:  optimizer
ToyDB
A ToyDB (for beginner) based on MIT 6.830 and CMU 15445
Stars: ✭ 25 (+0%)
Mutual labels:  optimizer
keras-gradient-accumulation
Gradient accumulation for Keras
Stars: ✭ 35 (+40%)
Mutual labels:  optimizer
portfolio-optimizer
A library for portfolio optimization algorithms with python interface.
Stars: ✭ 19 (-24%)
Mutual labels:  optimizer
Optimizers-for-Tensorflow
Adam, NAdam and AAdam optimizers
Stars: ✭ 20 (-20%)
Mutual labels:  optimizer
prediction gan
PyTorch Impl. of Prediction Optimizer (to stabilize GAN training)
Stars: ✭ 31 (+24%)
Mutual labels:  optimizer
ML-MCU
Code for IoT Journal paper title 'ML-MCU: A Framework to Train ML Classifiers on MCU-based IoT Edge Devices'
Stars: ✭ 28 (+12%)
Mutual labels:  sgd-optimizer
horoscope
horoscope is an optimizer inspector for DBMS.
Stars: ✭ 34 (+36%)
Mutual labels:  optimizer
Post-Tweaks
A post-installation batch script for Windows
Stars: ✭ 136 (+444%)
Mutual labels:  optimizer
AshBF
Over-engineered Brainfuck optimizing compiler and interpreter
Stars: ✭ 14 (-44%)
Mutual labels:  optimizer
LAMB Optimizer TF
LAMB Optimizer for Large Batch Training (TensorFlow version)
Stars: ✭ 119 (+376%)
Mutual labels:  optimizer
soar-php
SQL optimizer and rewriter. - SQL 优化、重写器(辅助 SQL 调优)。
Stars: ✭ 140 (+460%)
Mutual labels:  optimizer
AdaBound-tensorflow
An optimizer that trains as fast as Adam and as good as SGD in Tensorflow
Stars: ✭ 44 (+76%)
Mutual labels:  optimizer

lookahead_tensorflow

Lookahead optimizer ("Lookahead Optimizer: k steps forward, 1 step back") for tensorflow

Environment

This code was implemented and tested with TensorFlow 1.11.0 and 1.13.0.
No special operators are used, so it should also work with other versions of TensorFlow.
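As a quick sanity check before running anything, you can confirm that the installed TensorFlow version matches the 1.x line the code was tested with (this snippet is only illustrative, not part of the repository):

import tensorflow as tf

# The code was tested against the TensorFlow 1.x line (1.11.0 and 1.13.0).
print(tf.__version__)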

Usage

Instead of wrapping an optimizer directly, the lookahead strategy is kept independent.
This makes it more flexible to decide which variables should be optimized with lookahead.

  1. Create the class after all variables have been initialized, and pass BaseLookAhead all trainable variables.
import tensorflow as tf
from lookahead_opt import BaseLookAhead

"""
Build your model here
Please also include any optimizer you need.
"""

model_vars = [v for v in tf.trainable_variables()]
# Note: .run() requires a default session (e.g. tf.InteractiveSession()).
tf.global_variables_initializer().run()

lookahead = BaseLookAhead(model_vars, k=5, alpha=0.5)

Arguments are defined as follows:

model_vars: the variables to apply lookahead to. [list]
k: the number of steps the fast weights go forward before a slow-weight update. [int]
alpha: the learning rate for merging the fast weights back into the slow weights. [float]
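For reference, the update rule these arguments control (from the Lookahead paper) can be sketched in plain NumPy; lookahead_cycle and fast_step below are hypothetical names, with fast_step standing in for one update of whatever inner optimizer you use:

import numpy as np

def lookahead_cycle(fast, slow, k, alpha, fast_step):
    """One outer cycle: k fast-weight steps forward, then 1 slow-weight step back."""
    for _ in range(k):
        fast = fast_step(fast)               # inner optimizer (e.g. Adam) update
    slow = slow + alpha * (fast - slow)      # interpolate slow weights toward fast weights
    fast = np.copy(slow)                     # reset fast weights to the new slow weights
    return fast, slow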

  2. Add the assign operations to the training op, or run them directly in the session.
# Add to train_op
train_op += lookahead.get_ops()

# Or just run the Session
with tf.Session() as sess:
  _ = sess.run(lookahead.get_ops())
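Putting both steps together, a minimal end-to-end sketch could look like the following; the toy model, the Adam learning rate, and the feed values are placeholder choices of mine, while the BaseLookAhead call follows the usage above:

import tensorflow as tf
from lookahead_opt import BaseLookAhead

# Toy model: fit y = 2x with a single weight.
x = tf.placeholder(tf.float32, shape=[None])
y = tf.placeholder(tf.float32, shape=[None])
w = tf.get_variable("w", initializer=1.0)
loss = tf.reduce_mean(tf.square(w * x - y))

# Inner (fast) optimizer; lookahead is added separately below.
train_op = [tf.train.AdamOptimizer(1e-2).minimize(loss)]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    lookahead = BaseLookAhead(tf.trainable_variables(), k=5, alpha=0.5)
    train_op += lookahead.get_ops()
    for _ in range(100):
        sess.run(train_op, feed_dict={x: [1., 2., 3.], y: [2., 4., 6.]})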

Implementation Details

Inject Lookahead into the model and save specific variables

The lookahead logic is wrapped in the default variable_scope "lookahead". After BaseLookAhead is called with specific variables, those variables are injected into the lookahead mechanism.
Note that the lookahead class is completely separate from the optimizer, so remember to add an optimizer when building the training graph.
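Since the extra state lives under the "lookahead" variable scope, one way to see what was created is to list the global variables in that scope after constructing BaseLookAhead (the scope name comes from the paragraph above; the snippet itself is just an inspection sketch):

import tensorflow as tf

# List the slow-weight copies and counter created under the "lookahead" scope.
for v in tf.global_variables(scope="lookahead"):
    print(v.name, v.shape)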

Example template graph with lookahead

The BaseLookAhead creates duplicate tf.Variables to store the slow weights, and a counter is created automatically to implement "k steps forward, 1 step back".

[Figure: example template graph with lookahead]
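As a rough illustration of how such a mechanism can be built in TensorFlow 1.x (this is a sketch of the idea, not the repository's actual implementation; the function and variable names here are hypothetical):

import tensorflow as tf

def make_lookahead_ops(variables, k=5, alpha=0.5):
    """Illustrative lookahead ops: every k calls, pull the fast weights back to the slow weights."""
    with tf.variable_scope("lookahead"):
        counter = tf.get_variable("counter", initializer=0, trainable=False)
        step = tf.assign_add(counter, 1)      # counts fast-weight updates
        sync = tf.equal(step % k, 0)          # True on every k-th step
        ops = []
        for var in variables:
            # Slow copy, initialized from the current (fast) weight.
            slow = tf.get_variable(var.op.name.replace("/", "_") + "_slow",
                                   initializer=var.initialized_value(),
                                   trainable=False)
            def step_back(var=var, slow=slow):
                new_slow = tf.assign(slow, slow + alpha * (var - slow))  # slow += alpha * (fast - slow)
                return tf.assign(var, new_slow)                          # fast <- slow ("1 step back")
            ops.append(tf.cond(sync, step_back, lambda var=var: tf.identity(var)))
    return ops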

Experimental results

I conducted experiments on a many-to-many recursive task with a stacked weight-dropped LSTM, as proposed in "Regularizing and Optimizing LSTM Language Models".
With Adam plus lookahead, the training loss is higher than for the model without lookahead, but the validation loss is slightly better.

Contact & Copyright

Code by Jia-Yau Shiau ([email protected]).
