
denisyarats / Drq

License: MIT
DrQ: Data regularized Q

Programming Languages

python

Projects that are alternatives of or similar to Drq

Pytorch sac
PyTorch implementation of Soft Actor-Critic (SAC)
Stars: ✭ 174 (-35.07%)
Mutual labels:  gym, jupyter-notebook, reinforcement-learning, deep-reinforcement-learning, actor-critic, mujoco
Pytorch sac ae
PyTorch implementation of Soft Actor-Critic + Autoencoder (SAC+AE)
Stars: ✭ 94 (-64.93%)
Mutual labels:  gym, reinforcement-learning, deep-reinforcement-learning, actor-critic, mujoco
proto
Proto-RL: Reinforcement Learning with Prototypical Representations
Stars: ✭ 67 (-75%)
Mutual labels:  control, pixel, gym, rl, mujoco
Reinforcement learning tutorial with demo
Reinforcement Learning Tutorial with Demo: DP (Policy and Value Iteration), Monte Carlo, TD Learning (SARSA, Q-Learning), Function Approximation, Policy Gradient, DQN, Imitation, Meta Learning, Papers, Courses, etc.
Stars: ✭ 442 (+64.93%)
Mutual labels:  jupyter-notebook, reinforcement-learning, deep-reinforcement-learning, actor-critic
Rlenv.directory
Explore and find reinforcement learning environments in a list of 150+ open source environments.
Stars: ✭ 79 (-70.52%)
Mutual labels:  gym, reinforcement-learning, deep-reinforcement-learning, rl
Lagom
lagom: A PyTorch infrastructure for rapid prototyping of reinforcement learning algorithms.
Stars: ✭ 364 (+35.82%)
Mutual labels:  jupyter-notebook, reinforcement-learning, deep-reinforcement-learning, mujoco
Deeprl Tutorials
Contains high quality implementations of Deep Reinforcement Learning algorithms written in PyTorch
Stars: ✭ 748 (+179.1%)
Mutual labels:  jupyter-notebook, reinforcement-learning, deep-reinforcement-learning, actor-critic
Reinforcementlearning Atarigame
PyTorch LSTM RNN for reinforcement learning to play Atari games from OpenAI Universe. Also uses Google DeepMind's Asynchronous Advantage Actor-Critic (A3C) algorithm, which is far more efficient than DQN and supersedes it. Can play many games.
Stars: ✭ 118 (-55.97%)
Mutual labels:  jupyter-notebook, reinforcement-learning, deep-reinforcement-learning, actor-critic
Pytorch Rl
Tutorials for reinforcement learning in PyTorch and Gym by implementing a few of the popular algorithms. [IN PROGRESS]
Stars: ✭ 121 (-54.85%)
Mutual labels:  jupyter-notebook, reinforcement-learning, rl, actor-critic
Deepdrive
Deepdrive is a simulator that allows anyone with a PC to push the state-of-the-art in self-driving
Stars: ✭ 628 (+134.33%)
Mutual labels:  gym, reinforcement-learning, deep-reinforcement-learning, control
Rl Book
Source codes for the book "Reinforcement Learning: Theory and Python Implementation"
Stars: ✭ 464 (+73.13%)
Mutual labels:  gym, jupyter-notebook, reinforcement-learning, deep-reinforcement-learning
Rad
RAD: Reinforcement Learning with Augmented Data
Stars: ✭ 268 (+0%)
Mutual labels:  jupyter-notebook, reinforcement-learning, deep-reinforcement-learning, rl
Muzero General
MuZero
Stars: ✭ 1,187 (+342.91%)
Mutual labels:  gym, reinforcement-learning, deep-reinforcement-learning, rl
Pytorch Drl
PyTorch implementations of various Deep Reinforcement Learning (DRL) algorithms for both single agent and multi-agent.
Stars: ✭ 233 (-13.06%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, rl, actor-critic
Pytorch A2c Ppo Acktr Gail
PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
Stars: ✭ 2,632 (+882.09%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, actor-critic, mujoco
Mushroom Rl
Python library for Reinforcement Learning.
Stars: ✭ 442 (+64.93%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, rl, mujoco
Pytorch Rl
This repository contains model-free deep reinforcement learning algorithms implemented in Pytorch
Stars: ✭ 394 (+47.01%)
Mutual labels:  gym, reinforcement-learning, deep-reinforcement-learning, mujoco
Rl algos
Reinforcement Learning Algorithms
Stars: ✭ 14 (-94.78%)
Mutual labels:  gym, reinforcement-learning, deep-reinforcement-learning, actor-critic
Gym Gazebo2
gym-gazebo2 is a toolkit for developing and comparing reinforcement learning algorithms using ROS 2 and Gazebo
Stars: ✭ 257 (-4.1%)
Mutual labels:  gym, reinforcement-learning, deep-reinforcement-learning, rl
Trading Gym
A trading environment based on Gym
Stars: ✭ 71 (-73.51%)
Mutual labels:  gym, reinforcement-learning, rl

DrQ: Data regularized Q

This is a PyTorch implementation of DrQ from

Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels by

Denis Yarats*, Ilya Kostrikov*, Rob Fergus.

*Equal contribution. Author ordering determined by coin flip.

[Paper] [Webpage]

Citation

If you use this repo in your research, please consider citing the paper as follows:

@article{kostrikov2020image,
    title={Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels},
    author={Ilya Kostrikov and Denis Yarats and Rob Fergus},
    year={2020},
    eprint={2004.13649},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

Requirements

We assume you have access to a GPU that can run CUDA 9.2. The simplest way to install all the required dependencies is to create an Anaconda environment by running

conda env create -f conda_env.yml

After the installation ends, you can activate your environment with

conda activate drq

Instructions

To train the DrQ agent on the Cartpole Swingup task, run

python train.py env=cartpole_swingup

This will get you state-of-the-art performance in under 3 hours.

To reproduce the results from the paper, run

python train.py env=cartpole_swingup batch_size=512 action_repeat=8

This will produce the runs folder, where all outputs are stored, including train/eval logs, TensorBoard blobs, and evaluation episode videos. To launch TensorBoard, run

tensorboard --logdir runs

The console output is also available in the following form:

| train | E: 5 | S: 5000 | R: 11.4359 | D: 66.8 s | BR: 0.0581 | ALOSS: -1.0640 | CLOSS: 0.0996 | TLOSS: -23.1683 | TVAL: 0.0945 | AENT: 3.8132

A training entry decodes as:

train - training episode
E - total number of episodes
S - total number of environment steps
R - episode return
D - duration in seconds
BR - average reward of a sampled batch
ALOSS - average loss of the actor
CLOSS - average loss of the critic
TLOSS - average loss of the temperature parameter
TVAL - the value of temperature
AENT - the actor's entropy

while an evaluation entry

| eval  | E: 20 | S: 20000 | R: 10.9356

contains

E - evaluation was performed after E episodes
S - evaluation was performed after S environment steps
R - average episode return computed over `num_eval_episodes` (usually 10)
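
If you want to post-process these logs, the snippet below is a hypothetical helper (it is not part of this repository) that splits one console line into named fields:

# Hypothetical helper, not part of the repository: split a console log line
# such as the ones above into a dictionary of named fields.
def parse_log_line(line):
    fields = [f.strip() for f in line.strip().strip('|').split('|')]
    entry = {'mode': fields[0]}              # 'train' or 'eval'
    for field in fields[1:]:
        key, value = field.split(':', 1)
        entry[key.strip()] = value.strip()   # e.g. 'E' -> '5', 'D' -> '66.8 s'
    return entry

line = '| train | E: 5 | S: 5000 | R: 11.4359 | D: 66.8 s | BR: 0.0581 |'
print(parse_log_line(line))
# {'mode': 'train', 'E': '5', 'S': '5000', 'R': '11.4359', 'D': '66.8 s', 'BR': '0.0581'}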

The PlaNet Benchmark

DrQ demonstrates state-of-the-art performance on a set of challenging image-based tasks from the DeepMind Control Suite (Tassa et al., 2018). We compare against PlaNet (Hafner et al., 2018), SAC-AE (Yarats et al., 2019), SLAC (Lee et al., 2019), CURL (Srinivas et al., 2020), and SAC trained on states (Haarnoja et al., 2018) as an upper bound on performance. This follows the benchmark protocol established in PlaNet (Hafner et al., 2018).

[Figure: results on the PlaNet benchmark]

The Dreamer Benchmark

DrQ demonstrates state-of-the-art performance on an extended set of challenging image-based tasks from the DeepMind Control Suite (Tassa et al., 2018), following the benchmark protocol from Dreamer (Hafner et al., 2019). We compare against Dreamer (Hafner et al., 2019) and SAC trained on states (Haarnoja et al., 2018) as an upper bound on performance.

[Figure: results on the Dreamer benchmark]

Acknowledgements

We used kornia for data augmentation.
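
As an illustration only, the snippet below sketches how a kornia-based random-shift augmentation can be built: pad each frame by a few pixels and take a random crop back to the original size, so every image in a batch gets an independent shift. The 84x84 resolution and 4-pixel pad are assumptions for the example and may not match the repository's exact settings.

# Illustrative sketch of DrQ-style random-shift augmentation with kornia.
# The 84x84 image size and 4-pixel pad are assumptions for this example.
import torch
import torch.nn as nn
import kornia.augmentation as K

random_shift = nn.Sequential(
    nn.ReplicationPad2d(4),     # pad every side by 4 pixels
    K.RandomCrop((84, 84)),     # crop back to 84x84 at a random offset per image
)

obs = torch.rand(32, 9, 84, 84)   # a batch of stacked pixel observations
aug_obs = random_shift(obs)       # independently shifted copies, same shape as obs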
