
nicklashansen / neural-net-optimization

License: MIT
PyTorch implementations of recent optimization algorithms for deep learning.

Programming Languages

Python
139,335 projects - #7 most used programming language

Projects that are alternatives to or similar to neural-net-optimization

Sporco
Sparse Optimisation Research Code
Stars: ✭ 164 (+177.97%)
Mutual labels:  optimization-algorithms
pallas-solver
Global optimization algorithms written in C++
Stars: ✭ 43 (-27.12%)
Mutual labels:  optimization-algorithms
Harris-Hawks-Optimization-Algorithm-and-Applications
Source codes for HHO paper: Harris hawks optimization: Algorithm and applications: https://www.sciencedirect.com/science/article/pii/S0167739X18313530. In this paper, a novel population-based, nature-inspired optimization paradigm is proposed, which is called Harris Hawks Optimizer (HHO).
Stars: ✭ 31 (-47.46%)
Mutual labels:  optimization-algorithms
Python Mip
Collection of Python tools for the modeling and solution of Mixed-Integer Linear programs
Stars: ✭ 202 (+242.37%)
Mutual labels:  optimization-algorithms
Argmin
Mathematical optimization in pure Rust
Stars: ✭ 234 (+296.61%)
Mutual labels:  optimization-algorithms
sdaopt
Simulated Dual Annealing for python and benchmarks
Stars: ✭ 15 (-74.58%)
Mutual labels:  optimization-algorithms
Nmflibrary
MATLAB library for non-negative matrix factorization (NMF): Version 1.8.1
Stars: ✭ 153 (+159.32%)
Mutual labels:  optimization-algorithms
AuxiLearn
Official implementation of Auxiliary Learning by Implicit Differentiation [ICLR 2021]
Stars: ✭ 71 (+20.34%)
Mutual labels:  optimization-algorithms
Aleph star
Reinforcement learning with A* and a deep heuristic
Stars: ✭ 235 (+298.31%)
Mutual labels:  optimization-algorithms
psopy
A SciPy compatible super fast Python implementation for Particle Swarm Optimization.
Stars: ✭ 33 (-44.07%)
Mutual labels:  optimization-algorithms
Relion
Image-processing software for cryo-electron microscopy
Stars: ✭ 219 (+271.19%)
Mutual labels:  optimization-algorithms
Abagail
The library contains a number of interconnected Java packages that implement machine learning and artificial intelligence algorithms. These are artificial intelligence algorithms implemented for the kind of people that like to implement algorithms themselves.
Stars: ✭ 225 (+281.36%)
Mutual labels:  optimization-algorithms
cspy
A collection of algorithms for the (Resource) Constrained Shortest Path problem in Python / C++ / C#
Stars: ✭ 64 (+8.47%)
Mutual labels:  optimization-algorithms
Optimizer Visualization
Visualize Tensorflow's optimizers.
Stars: ✭ 178 (+201.69%)
Mutual labels:  optimization-algorithms
MIRT.jl
MIRT: Michigan Image Reconstruction Toolbox (Julia version)
Stars: ✭ 80 (+35.59%)
Mutual labels:  optimization-algorithms
Bads
Bayesian Adaptive Direct Search (BADS) optimization algorithm for model fitting in MATLAB
Stars: ✭ 159 (+169.49%)
Mutual labels:  optimization-algorithms
pybnb
A parallel branch-and-bound engine for Python. (https://pybnb.readthedocs.io/)
Stars: ✭ 53 (-10.17%)
Mutual labels:  optimization-algorithms
Nature-Inspired-Algorithms
Sample Code Collection of Nature-Inspired Computational Methods
Stars: ✭ 22 (-62.71%)
Mutual labels:  optimization-algorithms
optaplanner-quickstarts
OptaPlanner quick starts for AI optimization: many use cases shown in many different technologies.
Stars: ✭ 226 (+283.05%)
Mutual labels:  optimization-algorithms
paradiseo
An evolutionary computation framework to (automatically) build fast parallel stochastic optimization solvers
Stars: ✭ 73 (+23.73%)
Mutual labels:  optimization-algorithms

Optimization for Deep Learning

This repository contains PyTorch implementations of popular/recent optimization algorithms for deep learning, including SGD, SGD w/ momentum, SGD w/ Nesterov momentum, SGDW, RMSprop, Adam, Nadam, Adam w/ L2 regularization, AdamW, RAdam, RAdamW, Gradient Noise, Gradient Dropout, Learning Rate Dropout and Lookahead.
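
As a point of reference, here is a minimal sketch of the simplest of these methods, SGD with momentum and an optional Nesterov correction, written directly against parameter tensors. This is an illustration following the standard PyTorch formulation of the update, not the repository's implementation:

import torch

def sgd_momentum_step(params, velocities, lr=0.01, momentum=0.9, nesterov=False):
    # One manual update step: v <- mu*v + g, then
    # p <- p - lr*(g + mu*v) for Nesterov, or p <- p - lr*v for classical momentum.
    with torch.no_grad():
        for p, v in zip(params, velocities):
            if p.grad is None:
                continue
            v.mul_(momentum).add_(p.grad)
            d_p = p.grad + momentum * v if nesterov else v
            p.add_(d_p, alpha=-lr)

# Tiny usage example on a quadratic loss.
w = torch.randn(3, requires_grad=True)
vel = [torch.zeros_like(w)]
loss = (w ** 2).sum()
loss.backward()
sgd_momentum_step([w], vel, lr=0.1, momentum=0.9, nesterov=True)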

All extensions have been implemented so that they allow for mix-and-match optimization; for example, you can train a neural net using RAdamW combined with Nesterov momentum, Gradient Noise, Learning Rate Dropout and Lookahead.
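
As a rough illustration of how such stacking can work (the repository's own classes and constructor arguments may differ, so treat the code below as a hypothetical sketch rather than this project's API), here is a minimal Lookahead wrapper placed on top of a standard torch.optim.AdamW base optimizer:

import torch
import torch.nn as nn

class Lookahead:
    # Minimal Lookahead wrapper around any torch.optim optimizer (illustrative only):
    # keep "slow" weights and pull them toward the fast weights every k steps.
    def __init__(self, base_optimizer, k=5, alpha=0.5):
        self.base = base_optimizer
        self.k, self.alpha = k, alpha
        self.steps = 0
        self.slow = [[p.detach().clone() for p in g["params"]]
                     for g in base_optimizer.param_groups]

    def zero_grad(self):
        self.base.zero_grad()

    def step(self):
        self.base.step()              # fast (inner) optimizer update
        self.steps += 1
        if self.steps % self.k == 0:  # every k steps: slow <- slow + alpha*(fast - slow)
            for group, slow_group in zip(self.base.param_groups, self.slow):
                for p, q in zip(group["params"], slow_group):
                    q.add_(p.detach() - q, alpha=self.alpha)
                    p.data.copy_(q)   # reset fast weights to the slow weights

# Example: Lookahead stacked on AdamW (decoupled weight decay).
model = nn.Linear(10, 2)
opt = Lookahead(torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2))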


Related papers

Material in this repository has been developed as part of a special course/study and a reading group. This is the list of papers that we have discussed and/or implemented:

An Overview of Gradient Descent Optimization Algorithms

Optimization Methods for Large-Scale Machine Learning

On the importance of initialization and momentum in deep learning

Aggregated Momentum: Stability Through Passive Damping

ADADELTA: An Adaptive Learning Rate Method

RMSprop

Adam: A Method for Stochastic Optimization

On the Convergence of Adam and Beyond

Decoupled Weight Decay Regularization

On the Variance of the Adaptive Learning Rate and Beyond

Incorporating Nesterov Momentum Into Adam

Adaptive Gradient Methods with Dynamic Bound of Learning Rate

On the Convergence of AdaBound and its Connection to SGD

Lookahead Optimizer: k steps forward, 1 step back

The Marginal Value of Adaptive Gradient Methods in Machine Learning

Why Learning of Large-Scale Neural Networks Behaves Like Convex Optimization

Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks

Curriculum Learning in Deep Neural Networks

HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent

Adding Gradient Noise Improves Learning for Very Deep Networks

Learning Rate Dropout


How to run

You can run the experiments and algorithms by calling e.g.

python main.py -num_epochs 30 -dataset cifar -num_train 50000 -num_val 2048 -lr_schedule True

with arguments as specified in the main.py file. The algorithms can be run on two datasets, MNIST and CIFAR-10. For MNIST, a small MLP is used as a proof of concept, whereas an 808,458-parameter CNN is used for CIFAR-10. You may optionally decrease the size of the dataset and/or the number of epochs to reduce the computational cost, but the arguments given above were used to produce the results shown here.
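
If you want to run the same configuration on both datasets back to back, a small launcher along the following lines should work (a sketch only; the "mnist" value for -dataset and the reuse of these exact arguments for MNIST are assumptions, so check the accepted values in main.py):

import subprocess

# Launch the same configuration on both supported datasets.
for dataset in ["mnist", "cifar"]:
    subprocess.run([
        "python", "main.py",
        "-num_epochs", "30",
        "-dataset", dataset,
        "-num_train", "50000",
        "-num_val", "2048",
        "-lr_schedule", "True",
    ], check=True)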


Results

Below you will find our main results. As with all optimization problems, the performance of a particular algorithm depends heavily on the details of the problem as well as on the hyper-parameters. While we have made no attempt at fine-tuning the hyper-parameters of individual optimization methods, we have kept as many hyper-parameters as possible constant to allow for a fairer comparison. Wherever possible, the default hyper-parameters proposed by the original authors have been used.

When faced with a real application, you should always try out a number of different algorithms and hyper-parameters to figure out what works best for your particular problem.
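
As a toy illustration of such a comparison, the sketch below trains the same small regression model with a few standard torch.optim optimizers (not this repository's implementations) and reports the final training loss; the hyper-parameters are arbitrary defaults, not tuned values:

import torch
import torch.nn as nn

def train(opt_name, make_opt, steps=200):
    # Train the same model/data with a given optimizer factory and report final loss.
    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))
    opt = make_opt(model.parameters())
    x, y = torch.randn(512, 20), torch.randn(512, 1)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
    print(f"{opt_name:10s} final loss: {loss.item():.4f}")

candidates = {
    "SGD": lambda p: torch.optim.SGD(p, lr=0.1, momentum=0.9, nesterov=True),
    "RMSprop": lambda p: torch.optim.RMSprop(p, lr=1e-3),
    "Adam": lambda p: torch.optim.Adam(p, lr=1e-3),
    "AdamW": lambda p: torch.optim.AdamW(p, lr=1e-3, weight_decay=1e-2),
}
for name, make_opt in candidates.items():
    train(name, make_opt)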

Figures: cifar_sgd, cifar_rmsprop_adam, cifar_adam_weight_decay, cifar_adam, cifar_lrd, cifar_gradnoise, cifar_lookahead (result plots on CIFAR-10 for the corresponding groups of optimizers).
