
denisyarats / Drq

License: MIT
DrQ: Data regularized Q

Programming Languages

python

Projects that are alternatives of or similar to Drq

Pytorch sac
PyTorch implementation of Soft Actor-Critic (SAC)
Stars: ✭ 174 (-35.07%)
Mutual labels:  gym, jupyter-notebook, reinforcement-learning, deep-reinforcement-learning, actor-critic, mujoco
Pytorch sac ae
PyTorch implementation of Soft Actor-Critic + Autoencoder (SAC+AE)
Stars: ✭ 94 (-64.93%)
Mutual labels:  gym, reinforcement-learning, deep-reinforcement-learning, actor-critic, mujoco
proto
Proto-RL: Reinforcement Learning with Prototypical Representations
Stars: ✭ 67 (-75%)
Mutual labels:  control, pixel, gym, rl, mujoco
Reinforcement learning tutorial with demo
Reinforcement Learning Tutorial with Demo: DP (Policy and Value Iteration), Monte Carlo, TD Learning (SARSA, Q-Learning), Function Approximation, Policy Gradient, DQN, Imitation, Meta Learning, Papers, Courses, etc.
Stars: ✭ 442 (+64.93%)
Mutual labels:  jupyter-notebook, reinforcement-learning, deep-reinforcement-learning, actor-critic
Rlenv.directory
Explore and find reinforcement learning environments in a list of 150+ open source environments.
Stars: ✭ 79 (-70.52%)
Mutual labels:  gym, reinforcement-learning, deep-reinforcement-learning, rl
Lagom
lagom: A PyTorch infrastructure for rapid prototyping of reinforcement learning algorithms.
Stars: ✭ 364 (+35.82%)
Mutual labels:  jupyter-notebook, reinforcement-learning, deep-reinforcement-learning, mujoco
Deeprl Tutorials
Contains high quality implementations of Deep Reinforcement Learning algorithms written in PyTorch
Stars: ✭ 748 (+179.1%)
Mutual labels:  jupyter-notebook, reinforcement-learning, deep-reinforcement-learning, actor-critic
Reinforcementlearning Atarigame
PyTorch LSTM RNN for reinforcement learning to play Atari games from OpenAI Universe. Also uses Google DeepMind's Asynchronous Advantage Actor-Critic (A3C) algorithm, which is far more efficient than DQN and supersedes it. Can play many games.
Stars: ✭ 118 (-55.97%)
Mutual labels:  jupyter-notebook, reinforcement-learning, deep-reinforcement-learning, actor-critic
Pytorch Rl
Tutorials for reinforcement learning in PyTorch and Gym by implementing a few of the popular algorithms. [IN PROGRESS]
Stars: ✭ 121 (-54.85%)
Mutual labels:  jupyter-notebook, reinforcement-learning, rl, actor-critic
Deepdrive
Deepdrive is a simulator that allows anyone with a PC to push the state-of-the-art in self-driving
Stars: ✭ 628 (+134.33%)
Mutual labels:  gym, reinforcement-learning, deep-reinforcement-learning, control
Rl Book
Source codes for the book "Reinforcement Learning: Theory and Python Implementation"
Stars: ✭ 464 (+73.13%)
Mutual labels:  gym, jupyter-notebook, reinforcement-learning, deep-reinforcement-learning
Rad
RAD: Reinforcement Learning with Augmented Data
Stars: ✭ 268 (+0%)
Mutual labels:  jupyter-notebook, reinforcement-learning, deep-reinforcement-learning, rl
Muzero General
MuZero
Stars: ✭ 1,187 (+342.91%)
Mutual labels:  gym, reinforcement-learning, deep-reinforcement-learning, rl
Pytorch Drl
PyTorch implementations of various Deep Reinforcement Learning (DRL) algorithms for both single agent and multi-agent.
Stars: ✭ 233 (-13.06%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, rl, actor-critic
Pytorch A2c Ppo Acktr Gail
PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
Stars: ✭ 2,632 (+882.09%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, actor-critic, mujoco
Mushroom Rl
Python library for Reinforcement Learning.
Stars: ✭ 442 (+64.93%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, rl, mujoco
Pytorch Rl
This repository contains model-free deep reinforcement learning algorithms implemented in Pytorch
Stars: ✭ 394 (+47.01%)
Mutual labels:  gym, reinforcement-learning, deep-reinforcement-learning, mujoco
Rl algos
Reinforcement Learning Algorithms
Stars: ✭ 14 (-94.78%)
Mutual labels:  gym, reinforcement-learning, deep-reinforcement-learning, actor-critic
Gym Gazebo2
gym-gazebo2 is a toolkit for developing and comparing reinforcement learning algorithms using ROS 2 and Gazebo
Stars: ✭ 257 (-4.1%)
Mutual labels:  gym, reinforcement-learning, deep-reinforcement-learning, rl
Trading Gym
A trading environment based on Gym
Stars: ✭ 71 (-73.51%)
Mutual labels:  gym, reinforcement-learning, rl

DrQ: Data regularized Q

This is a PyTorch implementation of DrQ from

Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels by

Denis Yarats*, Ilya Kostrikov*, Rob Fergus.

*Equal contribution. Author ordering determined by coin flip.

[Paper] [Webpage]

Citation

If you use this repo in your research, please consider citing the paper as follows:

@article{kostrikov2020image,
    title={Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels},
    author={Ilya Kostrikov and Denis Yarats and Rob Fergus},
    year={2020},
    eprint={2004.13649},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

Requirements

We assume you have access to a GPU that can run CUDA 9.2. The simplest way to install all the required dependencies is to create an Anaconda environment by running

conda env create -f conda_env.yml

After the installation ends, you can activate your environment with

conda activate drq

Instructions

To train the DrQ agent on the Cartpole Swingup task, run

python train.py env=cartpole_swingup

This will get you state-of-the-art performance in under 3 hours.

To reproduce the results from the paper, run

python train.py env=cartpole_swingup batch_size=512 action_repeat=8

This will produce the runs folder, where all outputs are stored, including train/eval logs, TensorBoard blobs, and evaluation episode videos. To launch TensorBoard, run

tensorboard --logdir runs

The console output is also available in the following form:

| train | E: 5 | S: 5000 | R: 11.4359 | D: 66.8 s | BR: 0.0581 | ALOSS: -1.0640 | CLOSS: 0.0996 | TLOSS: -23.1683 | TVAL: 0.0945 | AENT: 3.8132

A training entry decodes as:

train - training episode
E - total number of episodes
S - total number of environment steps
R - episode return
D - duration in seconds
BR - average reward of a sampled batch
ALOSS - average loss of the actor
CLOSS - average loss of the critic
TLOSS - average loss of the temperature parameter
TVAL - the value of temperature
AENT - the actor's entropy

while an evaluation entry

| eval  | E: 20 | S: 20000 | R: 10.9356

contains

E - evaluation was performed after E episodes
S - evaluation was performed after S environment steps
R - average episode return computed over `num_eval_episodes` (usually 10)
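
If you want to post-process these logs, the snippet below is a hypothetical helper (it is not part of this repository) that splits one console line into named fields:

# Hypothetical helper, not part of the repository: split a console log line
# such as the ones above into a dictionary of named fields.
def parse_log_line(line):
    fields = [f.strip() for f in line.strip().strip('|').split('|')]
    entry = {'mode': fields[0]}              # 'train' or 'eval'
    for field in fields[1:]:
        key, value = field.split(':', 1)
        entry[key.strip()] = value.strip()   # e.g. 'E' -> '5', 'D' -> '66.8 s'
    return entry

line = '| train | E: 5 | S: 5000 | R: 11.4359 | D: 66.8 s | BR: 0.0581 |'
print(parse_log_line(line))
# {'mode': 'train', 'E': '5', 'S': '5000', 'R': '11.4359', 'D': '66.8 s', 'BR': '0.0581'}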

The PlaNet Benchmark

DrQ demonstrates state-of-the-art performance on a set of challenging image-based tasks from the DeepMind Control Suite (Tassa et al., 2018). We compare against PlaNet (Hafner et al., 2018), SAC-AE (Yarats et al., 2019), SLAC (Lee et al., 2019), CURL (Srinivas et al., 2020), and SAC trained on states (Haarnoja et al., 2018) as an upper bound on performance. This follows the benchmark protocol established in PlaNet (Hafner et al., 2018).

[Figure: results on the PlaNet benchmark]

The Dreamer Benchmark

DrQ demonstrates state-of-the-art performance on an extended set of challenging image-based tasks from the DeepMind Control Suite (Tassa et al., 2018), following the benchmark protocol from Dreamer (Hafner et al., 2019). We compare against Dreamer (Hafner et al., 2019) and SAC trained on states (Haarnoja et al., 2018) as an upper bound on performance.

[Figure: results on the Dreamer benchmark]

Acknowledgements

We used kornia for data augmentation.
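
As an illustration only, the snippet below sketches how a kornia-based random-shift augmentation can be built: pad each frame by a few pixels and take a random crop back to the original size, so every image in a batch gets an independent shift. The 84x84 resolution and 4-pixel pad are assumptions for the example and may not match the repository's exact settings.

# Illustrative sketch of DrQ-style random-shift augmentation with kornia.
# The 84x84 image size and 4-pixel pad are assumptions for this example.
import torch
import torch.nn as nn
import kornia.augmentation as K

random_shift = nn.Sequential(
    nn.ReplicationPad2d(4),     # pad every side by 4 pixels
    K.RandomCrop((84, 84)),     # crop back to 84x84 at a random offset per image
)

obs = torch.rand(32, 9, 84, 84)   # a batch of stacked pixel observations
aug_obs = random_shift(obs)       # independently shifted copies, same shape as obs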
