
denisyarats / Pytorch_sac_ae

License: MIT
PyTorch implementation of Soft Actor-Critic + Autoencoder (SAC+AE)

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Pytorch_sac_ae

Drq
DrQ: Data regularized Q
Stars: ✭ 268 (+185.11%)
Mutual labels:  gym, reinforcement-learning, deep-reinforcement-learning, actor-critic, mujoco
Pytorch sac
PyTorch implementation of Soft Actor-Critic (SAC)
Stars: ✭ 174 (+85.11%)
Mutual labels:  gym, reinforcement-learning, deep-reinforcement-learning, actor-critic, mujoco
Pytorch A2c Ppo Acktr Gail
PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
Stars: ✭ 2,632 (+2700%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, actor-critic, mujoco
Rl algos
Reinforcement Learning Algorithms
Stars: ✭ 14 (-85.11%)
Mutual labels:  gym, reinforcement-learning, deep-reinforcement-learning, actor-critic
Pytorch Rl
This repository contains model-free deep reinforcement learning algorithms implemented in Pytorch
Stars: ✭ 394 (+319.15%)
Mutual labels:  gym, reinforcement-learning, deep-reinforcement-learning, mujoco
Rl a3c pytorch
A3C LSTM Atari with Pytorch plus A3G design
Stars: ✭ 482 (+412.77%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, actor-critic
Dissecting Reinforcement Learning
Python code, PDFs and resources for the series of posts on Reinforcement Learning which I published on my personal blog
Stars: ✭ 512 (+444.68%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, actor-critic
Torch Ac
Recurrent and multi-process PyTorch implementation of deep reinforcement Actor-Critic algorithms A2C and PPO
Stars: ✭ 70 (-25.53%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, actor-critic
Muzero General
MuZero
Stars: ✭ 1,187 (+1162.77%)
Mutual labels:  gym, reinforcement-learning, deep-reinforcement-learning
Reinforcement learning tutorial with demo
Reinforcement Learning Tutorial with Demo: DP (Policy and Value Iteration), Monte Carlo, TD Learning (SARSA, QLearning), Function Approximation, Policy Gradient, DQN, Imitation, Meta Learning, Papers, Courses, etc..
Stars: ✭ 442 (+370.21%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, actor-critic
Pytorch Rl
Deep Reinforcement Learning with pytorch & visdom
Stars: ✭ 745 (+692.55%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, actor-critic
Deeprl Tutorials
Contains high quality implementations of Deep Reinforcement Learning algorithms written in PyTorch
Stars: ✭ 748 (+695.74%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, actor-critic
Tensorflow Reinforce
Implementations of Reinforcement Learning Models in Tensorflow
Stars: ✭ 480 (+410.64%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, actor-critic
Rl Book
Source codes for the book "Reinforcement Learning: Theory and Python Implementation"
Stars: ✭ 464 (+393.62%)
Mutual labels:  gym, reinforcement-learning, deep-reinforcement-learning
Deepdrive
Deepdrive is a simulator that allows anyone with a PC to push the state-of-the-art in self-driving
Stars: ✭ 628 (+568.09%)
Mutual labels:  gym, reinforcement-learning, deep-reinforcement-learning
Mushroom Rl
Python library for Reinforcement Learning.
Stars: ✭ 442 (+370.21%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, mujoco
Tensorflow Tutorial
TensorFlow and Deep Learning Tutorials
Stars: ✭ 748 (+695.74%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, autoencoder
Pytorch A3c
PyTorch implementation of Asynchronous Advantage Actor Critic (A3C) from "Asynchronous Methods for Deep Reinforcement Learning".
Stars: ✭ 879 (+835.11%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, actor-critic
Drlkit
A High Level Python Deep Reinforcement Learning library. Great for beginners, prototyping and quickly comparing algorithms
Stars: ✭ 29 (-69.15%)
Mutual labels:  gym, reinforcement-learning, deep-reinforcement-learning
Lagom
lagom: A PyTorch infrastructure for rapid prototyping of reinforcement learning algorithms.
Stars: ✭ 364 (+287.23%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, mujoco

SAC+AE implementation in PyTorch

This is a PyTorch implementation of SAC+AE from

Improving Sample Efficiency in Model-Free Reinforcement Learning from Images by

Denis Yarats, Amy Zhang, Ilya Kostrikov, Brandon Amos, Joelle Pineau, Rob Fergus.

[Paper] [Webpage]
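At its core, SAC+AE augments pixel-based SAC with an auxiliary deterministic autoencoder: a convolutional encoder shared between actor and critic, plus a decoder trained with a reconstruction loss and L2 penalties on the latent code and decoder weights (a RAE-style objective). Below is a minimal PyTorch sketch of that auxiliary branch; the layer sizes, latent dimension, and penalty weight are illustrative assumptions, not the repo's exact values.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelEncoder(nn.Module):
    # Convolutional encoder; obs_shape assumes 3 stacked 84x84 RGB frames.
    def __init__(self, obs_shape=(9, 84, 84), feature_dim=50):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(obs_shape[0], 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=1), nn.ReLU(),
        )
        with torch.no_grad():
            n = self.convs(torch.zeros(1, *obs_shape)).numel()
        self.fc = nn.Linear(n, feature_dim)
        self.ln = nn.LayerNorm(feature_dim)

    def forward(self, obs):
        h = self.convs(obs).flatten(1)
        return torch.tanh(self.ln(self.fc(h)))

class PixelDecoder(nn.Module):
    # Transposed-conv mirror of the encoder (35 = spatial size after the convs).
    def __init__(self, obs_shape=(9, 84, 84), feature_dim=50):
        super().__init__()
        self.fc = nn.Linear(feature_dim, 32 * 35 * 35)
        self.deconvs = nn.Sequential(
            nn.ConvTranspose2d(32, 32, 3, stride=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 32, 3, stride=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 32, 3, stride=1), nn.ReLU(),
            nn.ConvTranspose2d(32, obs_shape[0], 3, stride=2, output_padding=1),
        )

    def forward(self, z):
        h = F.relu(self.fc(z)).view(-1, 32, 35, 35)
        return self.deconvs(h)

def autoencoder_loss(encoder, decoder, obs, latent_l2=1e-6):
    # RAE-style objective: reconstruction error plus an L2 penalty on the
    # latent code; decoder weight decay is left to the decoder's optimizer.
    # obs is assumed to be a float tensor already scaled (e.g. to [-0.5, 0.5]).
    z = encoder(obs)
    rec_loss = F.mse_loss(decoder(z), obs)
    return rec_loss + latent_l2 * (0.5 * z.pow(2).sum(dim=1)).mean()

In the paper, only the critic's gradients (not the actor's) update the shared encoder; the sketch above covers just the reconstruction branch.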

Citation

If you use this repo in your research, please consider citing the paper as follows:

@article{yarats2019improving,
    title={Improving Sample Efficiency in Model-Free Reinforcement Learning from Images},
    author={Denis Yarats and Amy Zhang and Ilya Kostrikov and Brandon Amos and Joelle Pineau and Rob Fergus},
    year={2019},
    eprint={1910.01741},
    archivePrefix={arXiv}
}

Requirements

We assume you have access to a GPU that can run CUDA 9.2. Then, the simplest way to install all required dependencies is to create an Anaconda environment by running:

conda env create -f conda_env.yml

After the installation ends you can activate your environment with:

source activate pytorch_sac_ae

Instructions

To train an SAC+AE agent on the cheetah run task from image-based observations, run:

python train.py \
    --domain_name cheetah \
    --task_name run \
    --encoder_type pixel \
    --decoder_type pixel \
    --action_repeat 4 \
    --save_video \
    --save_tb \
    --work_dir ./log \
    --seed 1

This will produce a 'log' folder, where all the outputs are stored, including train/eval logs, TensorBoard blobs, and evaluation episode videos. One can attach TensorBoard to monitor training by running:

tensorboard --logdir log

and opening TensorBoard in your browser.
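If you prefer to read the logged scalars programmatically instead of through the UI, the event files can be loaded with TensorBoard's EventAccumulator. A sketch; the event-file location under the work dir and the scalar tag name are assumptions, so list the available tags first:

from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# Point this at wherever --save_tb wrote the event files; the exact
# subdirectory under --work_dir is an assumption.
acc = EventAccumulator('./log/tb')
acc.Reload()
print(acc.Tags()['scalars'])  # discover the real scalar tag names
for event in acc.Scalars('train/episode_reward'):  # hypothetical tag name
    print(event.step, event.value)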

The console output is also available in the following form:

| train | E: 1 | S: 1000 | D: 0.8 s | R: 0.0000 | BR: 0.0000 | ALOSS: 0.0000 | CLOSS: 0.0000 | RLOSS: 0.0000

A training entry decodes as:

train - training episode
E - total number of episodes 
S - total number of environment steps
D - duration in seconds to train 1 episode
R - episode reward
BR - average reward of sampled batch
ALOSS - average loss of actor
CLOSS - average loss of critic
RLOSS - average reconstruction loss (only when training from pixels with a decoder)

while an evaluation entry looks like:

| eval | S: 0 | ER: 21.1676

which reports the expected reward ER of the current policy after S environment steps. Note that ER is the average evaluation performance over num_eval_episodes episodes (usually 10).
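The pipe-separated format is easy to parse directly if you want to plot from the console logs; a small sketch (not part of the repo) that assumes exactly the layout shown above:

def parse_log_line(line):
    # Split '| train | E: 1 | S: 1000 | ...' into {'kind': 'train', 'E': 1.0, ...}
    parts = [p.strip() for p in line.strip().strip('|').split('|')]
    entry = {'kind': parts[0]}  # 'train' or 'eval'
    for field in parts[1:]:
        key, value = field.split(':')
        entry[key.strip()] = float(value.strip().rstrip(' s'))  # drop the 's' unit on D
    return entry

print(parse_log_line('| eval | S: 0 | ER: 21.1676'))
# {'kind': 'eval', 'S': 0.0, 'ER': 21.1676}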

Results

Our method demonstrates significantly improved performance over the baseline SAC:pixel. It matches the state-of-the-art performance of model-based algorithms, such as PlaNet (Hafner et al., 2018) and SLAC (Lee et al., 2019), as well as the model-free algorithm D4PG (Barth-Maron et al., 2018), which also learns from raw images. Our algorithm exhibits stable learning across ten random seeds and is extremely easy to implement.
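To reproduce a multi-seed run like the one above, you can shell out to train.py once per seed. A hypothetical launcher sketch, not part of the repo; it runs the seeds sequentially, and the flags mirror the example in the Instructions section:

import subprocess

for seed in range(1, 11):  # ten seeds, as in the paper's evaluation
    subprocess.run([
        'python', 'train.py',
        '--domain_name', 'cheetah',
        '--task_name', 'run',
        '--encoder_type', 'pixel',
        '--decoder_type', 'pixel',
        '--action_repeat', '4',
        '--save_tb',
        '--work_dir', f'./log/seed_{seed}',
        '--seed', str(seed),
    ], check=True)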
