
germain-hug / Deep-RL-Keras

Keras Implementation of popular Deep RL Algorithms (A3C, DDQN, DDPG, Dueling DDQN)


Projects that are alternatives of or similar to Deep-RL-Keras

Pytorch Rl
This repository contains model-free deep reinforcement learning algorithms implemented in Pytorch
Stars: ✭ 394 (-0.25%)
Mutual labels:  gym, reinforcement-learning, dqn, policy-gradient, ddpg
Reinforcement Learning With Tensorflow
Simple reinforcement learning tutorials, with Chinese AI teaching material by 莫烦Python
Stars: ✭ 6,948 (+1658.99%)
Mutual labels:  reinforcement-learning, dqn, policy-gradient, a3c, ddpg
Easy Rl
A Chinese-language reinforcement learning tutorial, readable online at: https://datawhalechina.github.io/easy-rl/
Stars: ✭ 3,004 (+660.51%)
Mutual labels:  reinforcement-learning, dqn, policy-gradient, a3c, ddpg
Torchrl
Pytorch Implementation of Reinforcement Learning Algorithms ( Soft Actor Critic(SAC)/ DDPG / TD3 /DQN / A2C/ PPO / TRPO)
Stars: ✭ 90 (-77.22%)
Mutual labels:  gym, reinforcement-learning, dqn, ddpg
Deep-Reinforcement-Learning-With-Python
Master classic RL, deep RL, distributional RL, inverse RL, and more using OpenAI Gym and TensorFlow with extensive Math
Stars: ✭ 222 (-43.8%)
Mutual labels:  dqn, policy-gradient, a3c, ddpg
Slm Lab
Modular Deep Reinforcement Learning framework in PyTorch. Companion library of the book "Foundations of Deep Reinforcement Learning".
Stars: ✭ 904 (+128.86%)
Mutual labels:  reinforcement-learning, dqn, policy-gradient, a3c
Reinforcement learning
Reinforcement learning tutorials
Stars: ✭ 82 (-79.24%)
Mutual labels:  reinforcement-learning, dqn, policy-gradient, a3c
Deeprl Tensorflow2
🐋 Simple implementations of various popular Deep Reinforcement Learning algorithms using TensorFlow2
Stars: ✭ 319 (-19.24%)
Mutual labels:  reinforcement-learning, dqn, a3c, ddpg
Machin
Reinforcement learning library(framework) designed for PyTorch, implements DQN, DDPG, A2C, PPO, SAC, MADDPG, A3C, APEX, IMPALA ...
Stars: ✭ 145 (-63.29%)
Mutual labels:  reinforcement-learning, dqn, a3c, ddpg
Minimalrl
Implementations of basic RL algorithms with minimal lines of codes! (pytorch based)
Stars: ✭ 2,051 (+419.24%)
Mutual labels:  reinforcement-learning, dqn, a3c, ddpg
Rlcycle
A library for ready-made reinforcement learning agents and reusable components for neat prototyping
Stars: ✭ 184 (-53.42%)
Mutual labels:  reinforcement-learning, dqn, a3c, ddpg
Reinforcement Learning
Minimal and Clean Reinforcement Learning Examples
Stars: ✭ 2,863 (+624.81%)
Mutual labels:  reinforcement-learning, dqn, policy-gradient, a3c
Rl algorithms
Structural implementation of RL key algorithms
Stars: ✭ 352 (-10.89%)
Mutual labels:  gym, reinforcement-learning, dqn, policy-gradient
Atari
AI research environment for the Atari 2600 games 🤖.
Stars: ✭ 174 (-55.95%)
Mutual labels:  gym, reinforcement-learning, dqn
A2c
A Clearer and Simpler Synchronous Advantage Actor Critic (A2C) Implementation in TensorFlow
Stars: ✭ 169 (-57.22%)
Mutual labels:  gym, reinforcement-learning, policy-gradient
Deep Reinforcement Learning
Repo for the Deep Reinforcement Learning Nanodegree program
Stars: ✭ 4,012 (+915.7%)
Mutual labels:  reinforcement-learning, dqn, ddpg
reinforcement learning with Tensorflow
Minimal implementations of reinforcement learning algorithms by Tensorflow
Stars: ✭ 28 (-92.91%)
Mutual labels:  dqn, a3c, ddpg
Openai lab
An experimentation framework for Reinforcement Learning using OpenAI Gym, Tensorflow, and Keras.
Stars: ✭ 313 (-20.76%)
Mutual labels:  reinforcement-learning, policy-gradient, ddpg
Super Mario Bros A3c Pytorch
Asynchronous Advantage Actor-Critic (A3C) algorithm for Super Mario Bros
Stars: ✭ 775 (+96.2%)
Mutual labels:  gym, reinforcement-learning, a3c
deep rl acrobot
TensorFlow A2C to solve Acrobot, with synchronized parallel environments
Stars: ✭ 32 (-91.9%)
Mutual labels:  policy-gradient, a3c, ddpg

Deep Reinforcement Learning in Keras

Modular Implementation of popular Deep Reinforcement Learning algorithms in Keras:

  • [x] Synchronous N-step Advantage Actor Critic (A2C)
  • [x] Asynchronous N-step Advantage Actor-Critic (A3C)
  • [x] Deep Deterministic Policy Gradient with Parameter Noise (DDPG)
  • [x] Double Deep Q-Network (DDQN)
  • [x] Double Deep Q-Network with Prioritized Experience Replay (DDQN + PER)
  • [x] Dueling DDQN (D3QN)

Getting Started

This implementation requires keras 2.1.6, as well as OpenAI gym.

$ pip install gym keras==2.1.6

Actor-Critic Algorithms

N-step Advantage Actor Critic (A2C)

The Actor-Critic algorithm is a model-free, on-policy method in which the critic acts as a value-function approximator and the actor as a policy-function approximator. During training, the critic's value estimates are used to compute the TD-error, which guides the learning of both the critic and the actor. In practice, we approximate the TD-error using the advantage function. For more stability, we use a shared computational backbone across both networks, as well as an N-step formulation of the discounted rewards. We also add an entropy regularization term ("soft" learning) to encourage exploration. While A2C is simple and efficient, running it on Atari games quickly becomes intractable due to long computation times.
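
The core of this formulation is the N-step discounted return and the resulting advantage estimate. As a minimal sketch (NumPy only, with hypothetical helper names, not the repository's exact code), these two quantities can be computed from a rollout as follows:

import numpy as np

def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    # Discounted N-step returns, bootstrapped from the critic's value
    # estimate of the final state (use 0.0 if the episode terminated).
    returns = np.zeros(len(rewards), dtype=np.float32)
    running = bootstrap_value
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def advantages(returns, values):
    # Advantage used in place of the TD-error: A(s, a) = R - V(s).
    return returns - values

# Toy 4-step rollout; the critic values are hypothetical placeholders.
R = n_step_returns(np.array([1.0, 0.0, 0.0, 1.0]), bootstrap_value=0.0)
A = advantages(R, np.array([0.5, 0.4, 0.6, 0.2]))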

N-step Asynchronous Advantage Actor Critic (A3C)

In a similar fashion to A2C, the A3C implementation incorporates asynchronous weight updates, allowing for much faster computation. We use multiple agents running on separate threads to perform gradient ascent asynchronously against a shared network. We test A3C on the Atari Breakout environment.
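
To make the asynchronous update scheme concrete, the sketch below shows the threading pattern only, with placeholder gradients and a hypothetical worker function; it is an illustration of the idea, not the repository's code:

import threading
import numpy as np

shared_weights = np.zeros(8)   # stands in for the global network's weights
lock = threading.Lock()        # the original A3C applies updates lock-free (Hogwild!-style)

def worker(worker_id, n_updates=100, lr=0.01):
    rng = np.random.default_rng(worker_id)
    for _ in range(n_updates):
        # In the real algorithm this gradient comes from an N-step rollout
        # of the worker's local copy of the actor-critic network.
        fake_gradient = rng.normal(size=shared_weights.shape)
        with lock:
            shared_weights[:] -= lr * fake_gradient   # asynchronous update of the shared weights

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()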

Deep Deterministic Policy Gradient (DDPG)

The DDPG algorithm is a model-free, off-policy algorithm for continuous action spaces. Similarly to A2C, it is an actor-critic algorithm in which the actor is trained on a deterministic target policy and the critic predicts Q-values. To reduce variance and increase stability, we use experience replay and separate target networks. Moreover, as suggested by OpenAI, we encourage exploration through parameter-space noise (as opposed to traditional action-space noise). We test DDPG on the Lunar Lander environment.
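
To illustrate the two tricks mentioned above, here is a simplified NumPy sketch (hypothetical helper names, not the repository's code) of a soft target-network update and of parameter-space noise:

import numpy as np

def soft_update(target_weights, source_weights, tau=0.001):
    # Polyak-average the target network towards the trained network.
    return [tau * w + (1.0 - tau) * tw
            for w, tw in zip(source_weights, target_weights)]

def perturb_for_exploration(actor_weights, stddev=0.1, rng=None):
    # Parameter-space noise: act with a perturbed copy of the actor's
    # weights instead of adding noise to the chosen actions.
    rng = rng or np.random.default_rng()
    return [w + rng.normal(scale=stddev, size=w.shape) for w in actor_weights]

# Toy weight matrices standing in for Keras layer weights.
actor = [np.ones((3, 2)), np.zeros(2)]
target = [np.zeros((3, 2)), np.zeros(2)]
target = soft_update(target, actor, tau=0.01)
noisy_actor = perturb_for_exploration(actor, stddev=0.05)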

Running

$ python3 main.py --type A2C --env CartPole-v1
$ python3 main.py --type A3C --env CartPole-v1 --nb_episodes 10000 --n_threads 16
$ python3 main.py --type A3C --env BreakoutNoFrameskip-v4 --is_atari --nb_episodes 10000 --n_threads 16
$ python3 main.py --type DDPG --env LunarLanderContinuous-v2


Deep Q-Learning Algorithms

Double Deep Q-Network (DDQN)

The DQN algorithm is a Q-learning algorithm that uses a deep neural network as a Q-value function approximator. We estimate target Q-values by leveraging the Bellman equation, and gather experience through an epsilon-greedy policy. For more stability, we sample past experiences randomly (Experience Replay). A variant of the DQN algorithm is the Double-DQN (or DDQN): to get a more accurate estimate of our Q-values, we use a second network to temper the overestimation of the Q-values by the original network. This target network is softly updated towards the primary network at every training step, at a slow rate Tau.
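
The Double-DQN target can be summarized in a few lines: the online network selects the next action, while the slowly-updated target network evaluates it. Below is a minimal NumPy sketch (not the repository's exact code):

import numpy as np

def double_dqn_targets(rewards, dones, q_next_online, q_next_target, gamma=0.99):
    # rewards, dones: shape (batch,); q_next_*: shape (batch, n_actions).
    best_actions = np.argmax(q_next_online, axis=1)                   # action selection: online network
    evaluated = q_next_target[np.arange(len(rewards)), best_actions]  # action evaluation: target network
    return rewards + gamma * (1.0 - dones) * evaluated

def soft_target_update(target_weights, online_weights, tau=0.01):
    # Move the target network a small step Tau towards the online network.
    return [tau * w + (1.0 - tau) * tw for w, tw in zip(online_weights, target_weights)]

# Hypothetical batch of 2 transitions with 3 discrete actions.
r, d = np.array([1.0, 0.0]), np.array([0.0, 1.0])
q_online = np.array([[0.2, 0.8, 0.1], [0.5, 0.4, 0.3]])
q_target = np.array([[0.3, 0.6, 0.2], [0.4, 0.5, 0.1]])
print(double_dqn_targets(r, d, q_online, q_target))   # -> [1.594, 0.0]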

Double Deep Q-Network with Prioritized Experience Replay (DDQN + PER)

We can further improve our DDQN algorithm by adding Prioritized Experience Replay (PER), which replaces uniform sampling of the replay buffer with sampling biased towards informative transitions. Transitions are ranked by their TD-error and stored in a SumTree structure, which allows efficient retrieval of the (s, a, r, s') transitions with the highest error.
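
A SumTree stores each transition's priority in a leaf and partial sums in the internal nodes, so sampling in proportion to the TD-error takes O(log n). The compact sketch below illustrates the idea; it is an assumption about the structure, not the repository's exact class:

import numpy as np

class SumTree:
    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = np.zeros(2 * capacity - 1)   # internal nodes store sums of child priorities
        self.data = [None] * capacity
        self.write = 0

    def add(self, priority, transition):
        idx = self.write + self.capacity - 1     # leaf index for this slot
        self.data[self.write] = transition
        self.update(idx, priority)
        self.write = (self.write + 1) % self.capacity

    def update(self, idx, priority):
        change = priority - self.tree[idx]
        self.tree[idx] = priority
        while idx != 0:                          # propagate the change up to the root
            idx = (idx - 1) // 2
            self.tree[idx] += change

    def get(self, s):
        # Walk down from the root; leaves with larger priority are reached more often.
        idx = 0
        while idx < self.capacity - 1:           # stop once a leaf is reached
            left, right = 2 * idx + 1, 2 * idx + 2
            if s <= self.tree[left]:
                idx = left
            else:
                s -= self.tree[left]
                idx = right
        return self.data[idx - self.capacity + 1]

tree = SumTree(capacity=4)
for i, priority in enumerate([1.0, 0.5, 2.0, 0.1]):      # priority ~ |TD-error|
    tree.add(priority, ("s", "a", "r", "s_next", i))
sampled = tree.get(np.random.uniform(0, tree.tree[0]))   # tree.tree[0] is the total priority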

Dueling Double Deep Q-Network (Dueling DDQN)

In the dueling variant of the DQN, we incorporate an intermediate layer in the Q-network to estimate both the state value and the state-dependent advantage function. After reformulation (see ref), the estimated Q-value can be expressed as the state value, to which we add the advantage estimate and subtract its mean. This factorization into state-independent and state-dependent values helps disentangle learning across actions and yields better results.
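
The dueling head can be written directly in Keras by splitting the last layer into a value stream and an advantage stream and recombining them as Q(s, a) = V(s) + A(s, a) - mean_a A(s, a). The snippet below is an illustrative sketch with made-up layer sizes, not the repository's exact architecture:

from keras.layers import Input, Dense, Lambda
from keras.models import Model
import keras.backend as K

n_actions, state_dim = 4, 8                    # placeholder sizes for illustration
inp = Input(shape=(state_dim,))
hidden = Dense(64, activation='relu')(inp)
value = Dense(1)(hidden)                       # state value V(s)
advantage = Dense(n_actions)(hidden)           # advantages A(s, a)
q_values = Lambda(
    lambda va: va[0] + va[1] - K.mean(va[1], axis=1, keepdims=True)
)([value, advantage])
model = Model(inputs=inp, outputs=q_values)
model.compile(optimizer='adam', loss='mse')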

Running

$ python3 main.py --type DDQN --env CartPole-v1 --batch_size 64
$ python3 main.py --type DDQN --env CartPole-v1 --batch_size 64 --with_PER
$ python3 main.py --type DDQN --env CartPole-v1 --batch_size 64 --dueling


Arguments

Argument              Description                                                                    Values
--type                Type of RL algorithm to run                                                    Choose from {A2C, A3C, DDQN, DDPG}
--env                 Specify the environment                                                        BreakoutNoFrameskip-v4 (default)
--nb_episodes         Number of episodes to run                                                      5000 (default)
--batch_size          Batch size (DDQN, DDPG)                                                        32 (default)
--consecutive_frames  Number of stacked consecutive frames                                           4 (default)
--is_atari            Whether the environment is an Atari game with pixel input                      -
--with_PER            Whether to use Prioritized Experience Replay (with DDQN)                       -
--dueling             Whether to use Dueling Networks (with DDQN)                                    -
--n_threads           Number of threads (A3C)                                                        16 (default)
--gather_stats        Whether to compute stats of scores averaged over 10 games (slow, see below)    -
--render              Whether to render the environment as it is training                            -
--gpu                 GPU index                                                                      0 (default)

Visualization & Monitoring

Model Visualization

All models are saved under <algorithm_folder>/models/ when finished training. You can visualize them running in the same environment they were trained in by running the load_and_run.py script. For DQN models, you should specify the path to the desired model in the --model_path argument. For actor-critic models, you need to specify both weight files in the --actor_path and --critic_path arguments.
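For example, invocations might look like the lines below. The weight-file paths are placeholders, and depending on the script's argument parser you may also need to pass the algorithm and environment flags used by main.py:

$ python3 load_and_run.py --model_path <path_to_DDQN_weights>
$ python3 load_and_run.py --actor_path <path_to_actor_weights> --critic_path <path_to_critic_weights>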

Tensorboard monitoring

Using TensorBoard, you can monitor the agent's score as it trains. During training, a log folder named after the chosen environment is created. For example, to follow the A2C progression on CartPole-v1, simply run:

$ tensorboard --logdir=A2C/tensorboard_CartPole-v1/

Results plotting

When training with the --gather_stats argument, a log file named logs.csv is generated, containing scores averaged over 10 games at every episode. Using plotly, you can visualize the average reward per episode. To do so, you will first need to install plotly and get a free license.

pip3 install plotly

To set up your credentials, run:

import plotly
plotly.tools.set_credentials_file(username='<your_username>', api_key='<your_key>')

Finally, to plot the results, run:

python3 utils/plot_results.py <path_to_your_log_file>

Acknowledgments

  • Atari Environment Helper Class template by @ShanHaoYu
  • Atari Environment Wrappers by OpenAI
  • SumTree Helper Class by @jaara

References (Papers)
