
andrewliao11 / Gail Tf

License: MIT
TensorFlow implementation of generative adversarial imitation learning

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Gail Tf

Pytorch Rl
PyTorch implementation of Deep Reinforcement Learning: Policy Gradient methods (TRPO, PPO, A2C) and Generative Adversarial Imitation Learning (GAIL). Fast Fisher vector product TRPO.
Stars: ✭ 658 (+267.6%)
Mutual labels:  reinforcement-learning, generative-adversarial-network, trpo
Deterministic Gail Pytorch
PyTorch implementation of Deterministic Generative Adversarial Imitation Learning (GAIL) for Off Policy learning
Stars: ✭ 44 (-75.42%)
Mutual labels:  reinforcement-learning, generative-adversarial-network, imitation-learning
Hand dapg
Repository to accompany RSS 2018 paper on dexterous hand manipulation
Stars: ✭ 88 (-50.84%)
Mutual labels:  reinforcement-learning, imitation-learning
Torchrl
PyTorch implementation of reinforcement learning algorithms (Soft Actor-Critic (SAC) / DDPG / TD3 / DQN / A2C / PPO / TRPO)
Stars: ✭ 90 (-49.72%)
Mutual labels:  reinforcement-learning, trpo
Tensorflow Rl
Implementations of deep RL papers and random experimentation
Stars: ✭ 176 (-1.68%)
Mutual labels:  reinforcement-learning, trpo
Conversational Ai
Conversational AI Reading Materials
Stars: ✭ 34 (-81.01%)
Mutual labels:  reinforcement-learning, generative-adversarial-network
Pgdrive
PGDrive: an open-ended driving simulator with infinite scenes from procedural generation
Stars: ✭ 60 (-66.48%)
Mutual labels:  reinforcement-learning, imitation-learning
Chemgan Challenge
Code for the paper: Benhenda, M. 2017. ChemGAN challenge for drug discovery: can AI reproduce natural chemical diversity? arXiv preprint arXiv:1708.08227.
Stars: ✭ 98 (-45.25%)
Mutual labels:  reinforcement-learning, generative-adversarial-network
Spiral Tensorflow
in progress
Stars: ✭ 117 (-34.64%)
Mutual labels:  reinforcement-learning, generative-adversarial-network
Ros2learn
ROS 2 enabled Machine Learning algorithms
Stars: ✭ 119 (-33.52%)
Mutual labels:  reinforcement-learning, trpo
Mlds2018spring
Machine Learning and having it Deep and Structured (MLDS) in 2018 spring
Stars: ✭ 124 (-30.73%)
Mutual labels:  reinforcement-learning, generative-adversarial-network
Udacity Deep Learning Nanodegree
A collection of projects made during my Deep Learning Nanodegree by Udacity
Stars: ✭ 15 (-91.62%)
Mutual labels:  reinforcement-learning, generative-adversarial-network
Coach
Reinforcement Learning Coach by Intel AI Lab enables easy experimentation with state of the art Reinforcement Learning algorithms
Stars: ✭ 2,085 (+1064.8%)
Mutual labels:  reinforcement-learning, imitation-learning
Run Skeleton Run
Reason8.ai PyTorch solution for NIPS RL 2017 challenge
Stars: ✭ 83 (-53.63%)
Mutual labels:  reinforcement-learning, trpo
Pytorch Rl
Deep Reinforcement Learning with pytorch & visdom
Stars: ✭ 745 (+316.2%)
Mutual labels:  reinforcement-learning, trpo
Ngsim env
Learning human driver models from NGSIM data with imitation learning.
Stars: ✭ 96 (-46.37%)
Mutual labels:  reinforcement-learning, imitation-learning
Exposure
Learning infinite-resolution image processing with GAN and RL from unpaired image datasets, using a differentiable photo editing model.
Stars: ✭ 605 (+237.99%)
Mutual labels:  reinforcement-learning, generative-adversarial-network
Hands On Reinforcement Learning With Python
Master Reinforcement and Deep Reinforcement Learning using OpenAI Gym and TensorFlow
Stars: ✭ 640 (+257.54%)
Mutual labels:  reinforcement-learning, trpo
Easy Rl
A reinforcement learning tutorial in Chinese; read online at: https://datawhalechina.github.io/easy-rl/
Stars: ✭ 3,004 (+1578.21%)
Mutual labels:  reinforcement-learning, imitation-learning
Ravens
Train robotic agents to learn pick and place with deep learning for vision-based manipulation in PyBullet. Transporter Nets, CoRL 2020.
Stars: ✭ 133 (-25.7%)
Mutual labels:  reinforcement-learning, imitation-learning

Check out the simpler version at openai/baselines/gail!

gail-tf

TensorFlow implementation of Generative Adversarial Imitation Learning (and behavior cloning)

Disclaimer: some code is borrowed from @openai/baselines.

What's GAIL?

  • Model-free imitation learning -> low sample efficiency at training time
    • model-based GAIL: End-to-End Differentiable Adversarial Imitation Learning
  • Directly extracts a policy from demonstrations
  • Removes the RL optimization from the inner loop of inverse RL
  • Some work based on GAIL:
    • Inferring The Latent Structure of Human Decision-Making from Raw Visual Inputs
    • Multi-Modal Imitation Learning from Unstructured Demonstrations using Generative Adversarial Nets
    • Robust Imitation of Diverse Behaviors
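
The list above summarizes the idea; for intuition, here is a minimal, self-contained sketch (NumPy only, not this repository's code) of the adversarial loop. A discriminator is trained to tell expert (state, action) pairs from policy samples, and its output is turned into a surrogate reward, e.g. -log(1 - D(s, a)) in the openai/baselines convention, which a policy-gradient method such as TRPO then maximizes. All dimensions and data below are placeholders.

# Minimal GAIL discriminator sketch (illustration only, not this repo's code).
# Expert (state, action) pairs are labeled 1, policy pairs 0; the policy would
# then be updated with TRPO/PPO on the surrogate reward, which is omitted here.
import numpy as np

rng = np.random.default_rng(0)
obs_dim, act_dim, batch = 11, 3, 64          # arbitrary, Hopper-v1-like sizes

# Stand-ins for sampled (state, action) pairs from the expert and the policy.
expert_sa = rng.normal(size=(batch, obs_dim + act_dim))
policy_sa = rng.normal(size=(batch, obs_dim + act_dim))

w = np.zeros(obs_dim + act_dim)              # logistic-regression discriminator
b, lr = 0.0, 1e-2

def d_prob(sa):
    """Probability that a (state, action) pair came from the expert."""
    return 1.0 / (1.0 + np.exp(-(sa @ w + b)))

for _ in range(100):
    # Binary cross-entropy: push D(expert) toward 1 and D(policy) toward 0.
    p_exp, p_pol = d_prob(expert_sa), d_prob(policy_sa)
    grad_w = expert_sa.T @ (p_exp - 1.0) / batch + policy_sa.T @ p_pol / batch
    grad_b = np.mean(p_exp - 1.0) + np.mean(p_pol)
    w -= lr * grad_w
    b -= lr * grad_b

# Surrogate reward handed to the policy optimizer (TRPO in the GAIL paper).
reward = -np.log(1.0 - d_prob(policy_sa) + 1e-8)
print("mean surrogate reward for policy samples:", reward.mean())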

Requirements

  • python==3.5.2
  • mujoco-py==0.5.7
  • tensorflow==1.1.0
  • gym==0.9.3

Run the code

I separate the code into two parts: (1) sampling expert data, and (2) imitation learning with GAIL/BC.

Step 1: Generate expert data

Train the expert policy using PPO/TRPO, from openai/baselines

Ensure that $GAILTF is set to the path of your gail-tf repository and that $ENV_ID is a valid OpenAI Gym environment ID (e.g. Hopper-v1, HalfCheetah-v1, etc.)

Configuration
export GAILTF=/path/to/your/gail-tf
export ENV_ID="Hopper-v1"
export BASELINES_PATH=$GAILTF/gailtf/baselines/ppo1 # use gailtf/baselines/trpo_mpi for TRPO
export SAMPLE_STOCHASTIC="False"            # use True for stochastic sampling
export STOCHASTIC_POLICY="False"            # use True for a stochastic policy
export PYTHONPATH=$GAILTF:$PYTHONPATH       # as mentioned below
cd $GAILTF
Train the expert
python3 $BASELINES_PATH/run_mujoco.py --env_id $ENV_ID

The trained model will be saved in ./checkpoint; its exact name depends on the optimization method and environment ID. Choose the last checkpoint in the series.

export PATH_TO_CKPT=./checkpoint/trpo.Hopper.0.00/trpo.Hopper.00-900
Sample from the generated expert policy
python3 $BASELINES_PATH/run_mujoco.py --env_id $ENV_ID --task sample_trajectory --sample_stochastic $SAMPLE_STOCHASTIC --load_model_path $PATH_TO_CKPT

This will generate a pickle file that stores the expert trajectories in ./XXX.pkl (e.g. deterministic.ppo.Hopper.0.00.pkl)

export PICKLE_PATH=./stochastic.trpo.Hopper.0.00.pkl
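
To sanity-check the generated trajectories before training, a small inspection snippet such as the one below can help. The exact layout of the pickle is defined by run_mujoco.py's sample_trajectory task and is not assumed here; the snippet just prints whatever structure it finds. Replace the filename with your $PICKLE_PATH.

# Hedged inspection of the expert-trajectory pickle (structure not assumed).
import pickle

with open("stochastic.trpo.Hopper.0.00.pkl", "rb") as f:   # your $PICKLE_PATH
    traj_data = pickle.load(f)

print(type(traj_data))
if isinstance(traj_data, dict):
    for key, value in traj_data.items():
        shape = getattr(value, "shape", None)               # arrays show shapes
        print(key, shape if shape is not None else type(value))
elif isinstance(traj_data, (list, tuple)):
    print("number of trajectories:", len(traj_data))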

Step 2: Imitation learning

Imitation learning via GAIL

python3 main.py --env_id $ENV_ID --expert_path $PICKLE_PATH

Usage:

--env_id:          The environment ID
--num_cpu:         Number of CPUs available during sampling
--expert_path:     The path to the pickle file generated in the previous section
--traj_limitation: Limit on the number of expert trajectories used
--g_step:          Number of policy optimization steps in each iteration (see the sketch after this list)
--d_step:          Number of discriminator optimization steps in each iteration
--num_timesteps:   Number of timesteps to train (limits the amount of environment interaction)
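
The outline below is a hypothetical illustration of how --g_step and --d_step alternate inside one training iteration; the helper functions are placeholders, not this repository's API (see main.py for the actual loop).

# Hypothetical outline of the GAIL outer loop (placeholder helper functions,
# assumes g_step >= 1 so the discriminator always sees fresh policy samples).
def train(num_iterations, g_step, d_step,
          sample_policy_batch, update_policy, update_discriminator):
    for _ in range(num_iterations):
        for _ in range(g_step):            # --g_step policy-optimization steps
            batch = sample_policy_batch()  # roll out the current policy
            update_policy(batch)           # e.g. a TRPO step on the surrogate reward
        for _ in range(d_step):            # --d_step discriminator steps
            update_discriminator(batch)    # classify expert vs. policy samples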

To view the summary plots in TensorBoard, issue

tensorboard --logdir $GAILTF/log
Evaluate your GAIL agent
python3 main.py --env_id $ENV_ID --task evaluate --stochastic_policy $STOCHASTIC_POLICY --load_model_path $PATH_TO_CKPT --expert_path $PICKLE_PATH

Imitation learning via Behavioral Cloning

python3 main.py --env_id $ENV_ID --algo bc --expert_path $PICKLE_PATH
Evaluate your BC agent
python3 main.py --env_id $ENV_ID --algo bc --task evaluate --stochastic_policy $STOCHASTIC_POLICY --load_model_path $PATH_TO_CKPT --expert_path $PICKLE_PATH
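
Behavioral cloning simply fits the policy to expert (state, action) pairs by supervised learning. The minimal NumPy sketch below (a linear policy trained with mean-squared error on random stand-in data) only illustrates the idea; the repository trains a neural-network policy on the loaded expert pickle, with --BC_max_iter controlling the number of iterations.

# Minimal behavioral-cloning sketch (illustration only, not this repo's code).
import numpy as np

rng = np.random.default_rng(0)
obs_dim, act_dim, n = 11, 3, 1024            # arbitrary, Hopper-v1-like sizes
expert_obs = rng.normal(size=(n, obs_dim))   # stand-in for loaded expert states
expert_act = rng.normal(size=(n, act_dim))   # stand-in for loaded expert actions

W = np.zeros((obs_dim, act_dim))             # linear policy: action = obs @ W
lr = 1e-2
for _ in range(1000):                        # --BC_max_iter plays this role
    grad = expert_obs.T @ (expert_obs @ W - expert_act) / n
    W -= lr * grad

print("final BC mean-squared error:", np.mean((expert_obs @ W - expert_act) ** 2))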

Results

Note: the following hyper-parameter settings are the best I've tested (from a simple grid search on a setting with 1500 trajectories); they are provided for reference only.

The different curves in the result plots correspond to different expert dataset sizes (1000, 100, 10, and 5).

  • Hopper-v1 (Average total return of expert policy: 3589)
python3 main.py --env_id Hopper-v1 --expert_path baselines/ppo1/deterministic.ppo.Hopper.0.00.pkl --g_step 3 --adversary_entcoeff 0

  • Walker2d-v1 (Average total return of expert policy: 4392)
python3 main.py --env_id Walker2d-v1 --expert_path baselines/ppo1/deterministic.ppo.Walker2d.0.00.pkl --g_step 3 --adversary_entcoeff 1e-3

  • HalfCheetah-v1 (Average total return of expert policy: 2110)

For HalfCheetah-v1 and Ant-v1, behavior-cloning pretraining is needed:

python3 main.py --env_id HalfCheetah-v1 --expert_path baselines/ppo1/deterministic.ppo.HalfCheetah.0.00.pkl --pretrained True --BC_max_iter 10000 --g_step 3 --adversary_entcoeff 1e-3

You can find more details here, GAIL policy here, and BC policy here

Hacking

We don't have a pip package yet, so you'll need to add this repo to your PYTHONPATH manually.

export PYTHONPATH=/path/to/your/repo/with/gailtf:$PYTHONPATH
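
Assuming gailtf is importable as a Python package once the export above is in place, a quick sanity check from any directory is:

# Verify the repo is on PYTHONPATH (assumes gailtf has an importable package).
import gailtf
print(gailtf.__file__)   # should point into your gail-tf checkout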

TODO

  • Create pip package/setup.py
  • Make style PEP8 compliant
  • Create requirements.txt
  • Depend on openai/baselines directly and modularize modifications
  • openai/roboschool support

Troubleshooting

  • If you encounter the error Cannot compile MPI programs. Check your configuration!!!, or the system complains about a missing mpi.h:
sudo apt install libopenmpi-dev

Reference

  • Jonathan Ho and Stefano Ermon. Generative adversarial imitation learning, [arxiv]
  • @openai/imitation
  • @openai/baselines