
LanqingLi1993 / FOCAL-ICLR

License: MIT
Code for the FOCAL paper published at ICLR 2021


Projects that are alternatives of or similar to FOCAL-ICLR

tesp
Implementation of our paper "Meta Reinforcement Learning with Task Embedding and Shared Policy"
Stars: ✭ 28 (-20%)
Mutual labels:  meta-learning, meta-rl
maml-rl-tf2
Implementation of Model-Agnostic Meta-Learning (MAML) applied on Reinforcement Learning problems in TensorFlow 2.
Stars: ✭ 16 (-54.29%)
Mutual labels:  meta-learning, meta-rl
deep recommenders
Deep Recommenders
Stars: ✭ 214 (+511.43%)
Mutual labels:  multi-task-learning
NeuralMerger
Yi-Min Chou, Yi-Ming Chan, Jia-Hong Lee, Chih-Yi Chiu, Chu-Song Chen, "Unifying and Merging Well-trained Deep Neural Networks for Inference Stage," International Joint Conference on Artificial Intelligence (IJCAI), 2018
Stars: ✭ 20 (-42.86%)
Mutual labels:  multi-task-learning
mtlearn
Multi-Task Learning package built with tensorflow 2 (Multi-Gate Mixture of Experts, Cross-Stitch, Uncertainty Weighting)
Stars: ✭ 45 (+28.57%)
Mutual labels:  multi-task-learning
HyperFace-TensorFlow-implementation
HyperFace
Stars: ✭ 68 (+94.29%)
Mutual labels:  multi-task-learning
LibFewShot
LibFewShot: A Comprehensive Library for Few-shot Learning.
Stars: ✭ 629 (+1697.14%)
Mutual labels:  meta-learning
MetaD2A
Official PyTorch implementation of "Rapid Neural Architecture Search by Learning to Generate Graphs from Datasets" (ICLR 2021)
Stars: ✭ 49 (+40%)
Mutual labels:  meta-learning
Awesome-Few-shot
Awesome Few-shot learning
Stars: ✭ 50 (+42.86%)
Mutual labels:  meta-learning
dml
R package for Distance Metric Learning
Stars: ✭ 58 (+65.71%)
Mutual labels:  distance-metric-learning
lfda
Local Fisher Discriminant Analysis in R
Stars: ✭ 74 (+111.43%)
Mutual labels:  distance-metric-learning
emmental
A deep learning framework for building multimodal multi-task learning systems.
Stars: ✭ 93 (+165.71%)
Mutual labels:  multi-task-learning
tensorflow-maml
TensorFlow 2.0 implementation of MAML.
Stars: ✭ 79 (+125.71%)
Mutual labels:  meta-learning
CPG
Steven C. Y. Hung, Cheng-Hao Tu, Cheng-En Wu, Chien-Hung Chen, Yi-Ming Chan, and Chu-Song Chen, "Compacting, Picking and Growing for Unforgetting Continual Learning," Thirty-third Conference on Neural Information Processing Systems, NeurIPS 2019
Stars: ✭ 91 (+160%)
Mutual labels:  multi-task-learning
amta-net
Asymmetric Multi-Task Attention Network for Prostate Bed Segmentation in CT Images
Stars: ✭ 26 (-25.71%)
Mutual labels:  multi-task-learning
Fine-Grained-or-Not
Code release for Your “Flamingo” is My “Bird”: Fine-Grained, or Not (CVPR 2021 Oral)
Stars: ✭ 32 (-8.57%)
Mutual labels:  multi-task-learning
Mt Dnn
Multi-Task Deep Neural Networks for Natural Language Understanding
Stars: ✭ 1,871 (+5245.71%)
Mutual labels:  multi-task-learning
agegenderLMTCNN
Jia-Hong Lee, Yi-Ming Chan, Ting-Yen Chen, and Chu-Song Chen, "Joint Estimation of Age and Gender from Unconstrained Face Images using Lightweight Multi-task CNN for Mobile Applications," IEEE International Conference on Multimedia Information Processing and Retrieval, MIPR 2018
Stars: ✭ 39 (+11.43%)
Mutual labels:  multi-task-learning
Learning-To-Compare-For-Text
Learning To Compare For Text , Few shot learning in text classification
Stars: ✭ 38 (+8.57%)
Mutual labels:  meta-learning
awesome-few-shot-meta-learning
awesome few shot / meta learning papers
Stars: ✭ 44 (+25.71%)
Mutual labels:  meta-learning

FOCAL: Efficient Fully-Offline Meta-Reinforcement Learning Via Distance Metric Learning and Behavior Regularization

We study the offline meta-reinforcement learning (OMRL) problem, a paradigm which enables reinforcement learning (RL) algorithms to quickly adapt to unseen tasks without any interaction with the environment, making RL truly practical in many real-world applications. This problem is still not fully understood, and two major challenges need to be addressed. First, offline RL usually suffers from bootstrapping errors on out-of-distribution state-actions, which lead to divergence of the value function. Second, meta-RL requires efficient and robust task inference learned jointly with the control policy. In this work, we enforce behavior regularization on the learned policy as a general approach to offline RL, combined with a deterministic context encoder for efficient task inference. We propose a novel negative-power distance metric on a bounded context embedding space, whose gradient propagation is detached from the Bellman backup. We provide analysis and insight showing that some simple design choices can yield substantial improvements over recent approaches involving meta-RL and distance metric learning. To the best of our knowledge, our method is the first model-free and end-to-end OMRL algorithm, which is computationally efficient and demonstrated to outperform prior algorithms on several meta-RL benchmarks.
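
The negative-power metric mentioned above can be illustrated with a short sketch. The snippet below is a minimal PyTorch illustration of this style of distance-metric loss on context embeddings, not the repository's exact implementation; the power n, the weight beta, and the epsilon are placeholder hyperparameters.

import torch

def distance_metric_loss(z, task_ids, n=2, beta=1.0, eps=1e-3):
    """Sketch of a negative-power distance-metric loss on context embeddings.

    z:        (B, d) batch of context embeddings
    task_ids: (B,)   integer task label for each embedding
    Same-task pairs are pulled together with a squared-distance term;
    different-task pairs are pushed apart with an inverse (negative-power)
    distance term. n, beta and eps are illustrative choices, not the
    hyperparameters used in the paper.
    """
    diff = z.unsqueeze(0) - z.unsqueeze(1)            # (B, B, d) pairwise differences
    sq_dist = diff.pow(2).sum(-1)                     # (B, B) squared distances
    same = task_ids.unsqueeze(0) == task_ids.unsqueeze(1)
    off_diag = ~torch.eye(len(z), dtype=torch.bool, device=z.device)

    pull = sq_dist[same & off_diag].mean()                     # attract same-task embeddings
    push = (beta / (sq_dist[~same].pow(n / 2) + eps)).mean()   # repel different-task embeddings
    return pull + push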

Installation

To install locally, you will need to first install MuJoCo. For task distributions in which the reward function varies (Cheetah, Ant), install MuJoCo150 or later. Set LD_LIBRARY_PATH to point to both the MuJoCo binaries ($HOME/.mujoco/mujoco200/bin) and the GPU drivers (something like /usr/lib/nvidia-390; you can find your version by running nvidia-smi).

For the remaining dependencies, create conda environment by

conda env create -f environment.yaml

For Walker environments, MuJoCo131 is required. Install it the same way as MuJoCo200. To switch between MuJoCo versions:

export MUJOCO_PY_MJPRO_PATH=~/.mujoco/mjpro${VERSION_NUM}
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/.mujoco/mjpro${VERSION_NUM}/bin

The environments make use of the module rand_param_envs, which is submoduled in this repository. Add the module to your Python path: export PYTHONPATH=./rand_param_envs:$PYTHONPATH (check out direnv for handy directory-dependent path management).
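
As a quick sanity check that both MuJoCo and the submodule are importable, you can try the following (a minimal check, assuming the environment variables above are already exported in your shell):

# These imports should succeed once LD_LIBRARY_PATH and PYTHONPATH are set as described above.
import mujoco_py        # MuJoCo Python bindings
import rand_param_envs  # submodule providing the randomized-parameter environments

print("mujoco_py:", mujoco_py.__file__)
print("rand_param_envs:", rand_param_envs.__file__)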

This installation has been tested only on 64-bit CentOS 7.2. The whole pipeline consists of two stages: data generation and offline RL experiments.

Data Generation

FOCAL requires fixed (batch) data for both meta-training and meta-testing, generated by trained SAC behavior policies. Experiments at this stage are configured via train.yaml, located in ./rlkit/torch/sac/pytorch_sac/config/.

Example of training policies and generating trajectories on multiple tasks:

python policy_train.py --gpu 0

To generate trajectories from pretrained models:

python policy_train.py --eval

Generated data will be saved in ./data/
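
To verify that data generation succeeded before moving on, you can list what was written under ./data/ (a small sketch; the per-environment, per-task directory layout is an assumption about how policy_train.py organizes its output, so adjust it to the actual structure):

import os

# Report how many files were written under each task directory in ./data/.
data_root = "./data"
for env_name in sorted(os.listdir(data_root)):
    env_dir = os.path.join(data_root, env_name)
    if not os.path.isdir(env_dir):
        continue
    for task_name in sorted(os.listdir(env_dir)):
        task_dir = os.path.join(env_dir, task_name)
        if os.path.isdir(task_dir):
            print(f"{env_name}/{task_name}: {len(os.listdir(task_dir))} files")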

Offline RL Experiments

Experiments are configured via json configuration files located in ./configs. Basic settings are defined and described in ./configs/default.py. To reproduce an experiment, run:

python launch_experiment.py ./configs/[EXP].json

By default the code will use the GPU; to use the CPU instead, set use_gpu=False in the corresponding config file.
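
The relationship between the JSON files and the defaults can be sketched as follows (a minimal illustration only: the name default_config, the merge helper, and the util_params location of the GPU flag are assumptions, not the repository's exact API):

import copy
import json

from configs.default import default_config  # assumed name of the defaults dict

def deep_update(base, overrides):
    """Recursively overlay the experiment JSON onto the default settings."""
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            deep_update(base[key], value)
        else:
            base[key] = value
    return base

with open("./configs/walker_rand_params.json") as f:
    variant = deep_update(copy.deepcopy(default_config), json.load(f))

variant["util_params"]["use_gpu"] = False  # assumed location of the use_gpu flag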

Output files will be written to ./output/[ENV]/[EXP NAME] where the experiment name corresponds to the process starting time. The file progress.csv contains statistics logged over the course of training. data_epoch_[EPOCH].csv contains embedding vector statistics. We recommend viskit for visualizing learning curves: https://github.com/vitchyr/viskit. Network weights are also snapshotted during training.
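
If you prefer not to install viskit, a learning curve can also be plotted directly from progress.csv (a minimal sketch; the experiment path is a placeholder and the logged column names depend on your run, so inspect the CSV header first):

import pandas as pd
import matplotlib.pyplot as plt

# Load the training log of one experiment; replace the path with your own run.
df = pd.read_csv("./output/walker_rand_params/2021_01_01_00_00_00/progress.csv")
print(df.columns.tolist())                   # inspect which statistics were logged

x_col, y_col = df.columns[0], df.columns[1]  # pick the columns you want to plot
plt.plot(df[x_col], df[y_col])
plt.xlabel(x_col)
plt.ylabel(y_col)
plt.title("FOCAL learning curve")
plt.show()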

To evaluate a learned policy after training has concluded, run sim_policy.py. This script will run a given policy across a set of evaluation tasks and optionally generate a video of these trajectories. Rendering is offline and the video is saved to the experiment folder.

Example of running an experiment on the walker_rand_params environment:

  • download the walker data and unzip it to ./data/walker_rand_params (all normalized offline training data used in the FOCAL paper can be downloaded here)
  • edit walker_rand_params.json to add dump_eval_paths=1 and data_dir=./data/walker_rand_params (see the snippet after this list)
  • run python launch_experiment.py ./configs/walker_rand_params.json
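
For reference, the config edit in the second step can also be scripted (a small sketch that assumes the two keys sit at the top level of walker_rand_params.json; adjust if they are nested):

import json

# Add the offline data path and the eval-path dumping flag to the walker config.
cfg_path = "./configs/walker_rand_params.json"
with open(cfg_path) as f:
    cfg = json.load(f)

cfg["dump_eval_paths"] = 1
cfg["data_dir"] = "./data/walker_rand_params"

with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)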

Reproducing Results in the FOCAL Paper

We provide code for reproducing Figures 2-9 and Table 1 in generate_plot.py. Download the output files required for visualization from the output data link and add them to the ./output/ directory. To produce all figures at once, run

python3 generate_plot.py

To produce each figure individually, run the function named by the corresponding figure number in main().

References

@inproceedings{li2021focal,
  title={{FOCAL}: Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization},
  author={Lanqing Li and Rui Yang and Dijun Luo},
  booktitle={International Conference on Learning Representations},
  year={2021},
  url={https://openreview.net/forum?id=8cpHIfgY4Dj}
}