
JunjieYang97 / stocBiO

License: MIT License
Example code for the paper "Bilevel Optimization: Nonasymptotic Analysis and Faster Algorithms"



Efficient bilevel optimizers: stocBiO, ITD-BiO, and FO-ITD-BiO

Code for the ICML 2021 paper "Bilevel Optimization: Nonasymptotic Analysis and Faster Algorithms" by Kaiyi Ji, Junjie Yang, and Yingbin Liang (The Ohio State University).
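For reference, the bilevel problem considered in the paper takes the standard form, with $x$ the outer (e.g., hyperparameter) variables and $y$ the inner (e.g., model) variables:

$$\min_{x}\ \Phi(x) = f\big(x, y^*(x)\big) \quad \text{s.t.} \quad y^*(x) = \arg\min_{y}\ g(x, y),$$

where $f$ is the outer (validation) objective and $g$ is the inner (training) objective.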

stocBiO for hyperparameter optimization

Our hyperparameter optimization implementation is built on HyperTorch, where we propose the stoc-BiO algorithm, which achieves better performance than other bilevel algorithms. Our code is tested with Python 3 and PyTorch 1.8.

Note that the hypergrad package is built on HyperTorch.

The experiments on the 20 Newsgroup and MNIST datasets are in l2reg_on_twentynews.py and mnist_exp.py, respectively.

How to run our code

The meanings of the basic arguments are as follows.

Argument meanings

  • --alg: The algorithm to run (e.g., stocBiO, AID-FP).
  • --hessian_q: The number of Hessian-vector products used in the hypergradient estimate.
  • --training_size: The number of samples used for training.
  • --validation_size: The number of samples used for validation.
  • --batch_size: The batch size for the training data.
  • --epochs: The number of outer epochs for training.
  • --iterations or --T: The number of inner iterations for training.
  • --eta: The hyperparameter $\eta$ used in the Hessian-inverse approximation (see the sketch after this list).
  • --noise_rate: The corruption rate for the MNIST data.
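To make the roles of --eta and --hessian_q concrete, here is a minimal sketch (not the repository's actual code; the function and variable names are hypothetical) of the truncated Neumann-series approximation of a Hessian-inverse-vector product that stocBiO-style hypergradient estimators rely on:

import torch

def neumann_hessian_inverse_vector(inner_loss, w, v, eta, hessian_q):
    # Approximate H^{-1} v by eta * sum_{i=0}^{hessian_q-1} (I - eta*H)^i v,
    # where H is the Hessian of inner_loss w.r.t. w (a single parameter tensor).
    # Only Hessian-vector products are needed, obtained via double backprop.
    grad_w, = torch.autograd.grad(inner_loss, w, create_graph=True)
    p = v.detach().clone()
    acc = p.clone()
    for _ in range(hessian_q - 1):
        hvp, = torch.autograd.grad(grad_w, w, grad_outputs=p, retain_graph=True)
        p = p - eta * hvp      # p <- (I - eta*H) p
        acc = acc + p          # accumulate the Neumann series
    return eta * acc

Here eta corresponds to --eta and hessian_q to --hessian_q; in stocBiO-type estimators the resulting vector is then combined with a mixed second-order (Jacobian-vector) term to form the hypergradient with respect to the hyperparameters.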

To replicate the empirical results on the different datasets in our paper, run the following commands:

stocBiO on MNIST with p=0.1

python3 mnist_exp.py --alg stocBiO --batch_size 50 --noise_rate 0.1

stocBiO on MNIST with p=0.4

python3 mnist_exp.py --alg stocBiO --batch_size 50 --noise_rate 0.4

stocBiO on 20 Newsgroup

python3 l2reg_on_twentynews.py --alg stocBiO

AID-FP on MNIST with p=0.4

python3 mnist_exp.py --alg AID-FP --batch_size 50 --noise_rate 0.4

AID-FP on 20 Newsgroup

python3 l2reg_on_twentynews.py --alg AID-FP

ITD-BiO and FO-ITD-BiO for meta-learning

Our meta-learning part is built on learn2learn, where we implement the bilevel optimizer ITD-BiO and show that it converges faster than MAML and ANIL. We also implement a first-order variant, FO-ITD-BiO, which does not compute the derivative of the inner-loop output with respect to the feature parameters, i.e., it removes all Jacobian- and Hessian-vector product computations. It turns out that FO-ITD-BiO is even faster without sacrificing overall prediction accuracy.
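To illustrate the difference, here is a minimal sketch (not the actual learn2learn-based implementation; all names are hypothetical) of an unrolled inner loop that adapts the head parameters w, with a switch between ITD-BiO and its first-order variant:

import torch

def adapt_head(w_init, inner_loss_fn, steps, inner_lr, first_order):
    # inner_loss_fn(w) performs the full forward pass (feature extractor + head)
    # on the training split, so it depends on both the feature parameters and w.
    # ITD-BiO (first_order=False): create_graph=True keeps every inner update in
    # the autograd graph, so backpropagating the outer (validation) loss through
    # the returned w produces the Jacobian/Hessian-vector terms.
    # FO-ITD-BiO (first_order=True): the inner gradients are detached, so the
    # derivative of the inner-loop output w.r.t. the feature parameters is
    # dropped and no second-order terms are ever computed.
    w = w_init
    for _ in range(steps):
        loss = inner_loss_fn(w)
        g, = torch.autograd.grad(loss, w, create_graph=not first_order)
        w = w - inner_lr * (g.detach() if first_order else g)
    return w

In both cases the adapted w is then plugged into the validation loss; in the first-order case only the direct gradient with respect to the feature parameters survives, which is what makes FO-ITD-BiO cheaper per iteration.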

Environments for meta-learning experiments

For Windows OS,

  • PyTorch=1.7.1
  • l2l=0.1.5
  • python=3.8
  • cuda=11.3

For Linux OS,

  • PyTorch=1.7.0
  • l2l=0.1.5
  • python=3.6.9
  • cuda=10.2

For both operating systems, we strongly recommend these older versions of l2l (learn2learn); the latest versions require some adaptation of the code.
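For example, the Linux environment above can be set up roughly as follows (assuming the PyPI package names torch and learn2learn, and a CUDA 10.2 build of PyTorch for your system):

pip install torch==1.7.0 learn2learn==0.1.5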

Some experiment examples

In the following, we provide some experiments to demonstrate the better performance of the proposed stoc-BiO algorithm (the corresponding figures are in the original repository).

We compare our algorithm with various hyperparameter-optimization baselines on the 20 Newsgroup dataset, evaluate its performance under different batch sizes, and present comparison results on the MNIST dataset.

This repo is still under construction, and any comments are welcome!

Citation

If this repo is useful for your research, please cite our paper:

@inproceedings{ji2021bilevel,
  author    = {Ji, Kaiyi and Yang, Junjie and Liang, Yingbin},
  title     = {Bilevel Optimization: Nonasymptotic Analysis and Faster Algorithms},
  booktitle = {International Conference on Machine Learning (ICML)},
  year      = {2021}
}