
germain-hug / Deep-RL-Keras

Keras Implementation of popular Deep RL Algorithms (A3C, DDQN, DDPG, Dueling DDQN)


Projects that are alternatives of or similar to Deep-RL-Keras

Pytorch Rl
This repository contains model-free deep reinforcement learning algorithms implemented in Pytorch
Stars: ✭ 394 (-0.25%)
Mutual labels:  gym, reinforcement-learning, dqn, policy-gradient, ddpg
Reinforcement Learning With Tensorflow
Simple reinforcement learning tutorials, with Chinese AI teaching material by 莫烦Python
Stars: ✭ 6,948 (+1658.99%)
Mutual labels:  reinforcement-learning, dqn, policy-gradient, a3c, ddpg
Easy Rl
A Chinese-language reinforcement learning tutorial, readable online at: https://datawhalechina.github.io/easy-rl/
Stars: ✭ 3,004 (+660.51%)
Mutual labels:  reinforcement-learning, dqn, policy-gradient, a3c, ddpg
Torchrl
Pytorch Implementation of Reinforcement Learning Algorithms ( Soft Actor Critic(SAC)/ DDPG / TD3 /DQN / A2C/ PPO / TRPO)
Stars: ✭ 90 (-77.22%)
Mutual labels:  gym, reinforcement-learning, dqn, ddpg
Deep-Reinforcement-Learning-With-Python
Master classic RL, deep RL, distributional RL, inverse RL, and more using OpenAI Gym and TensorFlow with extensive Math
Stars: ✭ 222 (-43.8%)
Mutual labels:  dqn, policy-gradient, a3c, ddpg
Slm Lab
Modular Deep Reinforcement Learning framework in PyTorch. Companion library of the book "Foundations of Deep Reinforcement Learning".
Stars: ✭ 904 (+128.86%)
Mutual labels:  reinforcement-learning, dqn, policy-gradient, a3c
Reinforcement learning
Reinforcement learning tutorials
Stars: ✭ 82 (-79.24%)
Mutual labels:  reinforcement-learning, dqn, policy-gradient, a3c
Deeprl Tensorflow2
🐋 Simple implementations of various popular Deep Reinforcement Learning algorithms using TensorFlow2
Stars: ✭ 319 (-19.24%)
Mutual labels:  reinforcement-learning, dqn, a3c, ddpg
Machin
Reinforcement learning library(framework) designed for PyTorch, implements DQN, DDPG, A2C, PPO, SAC, MADDPG, A3C, APEX, IMPALA ...
Stars: ✭ 145 (-63.29%)
Mutual labels:  reinforcement-learning, dqn, a3c, ddpg
Minimalrl
Implementations of basic RL algorithms with minimal lines of codes! (pytorch based)
Stars: ✭ 2,051 (+419.24%)
Mutual labels:  reinforcement-learning, dqn, a3c, ddpg
Rlcycle
A library for ready-made reinforcement learning agents and reusable components for neat prototyping
Stars: ✭ 184 (-53.42%)
Mutual labels:  reinforcement-learning, dqn, a3c, ddpg
Reinforcement Learning
Minimal and Clean Reinforcement Learning Examples
Stars: ✭ 2,863 (+624.81%)
Mutual labels:  reinforcement-learning, dqn, policy-gradient, a3c
Rl algorithms
Structural implementation of RL key algorithms
Stars: ✭ 352 (-10.89%)
Mutual labels:  gym, reinforcement-learning, dqn, policy-gradient
Atari
AI research environment for the Atari 2600 games 🤖.
Stars: ✭ 174 (-55.95%)
Mutual labels:  gym, reinforcement-learning, dqn
A2c
A Clearer and Simpler Synchronous Advantage Actor Critic (A2C) Implementation in TensorFlow
Stars: ✭ 169 (-57.22%)
Mutual labels:  gym, reinforcement-learning, policy-gradient
Deep Reinforcement Learning
Repo for the Deep Reinforcement Learning Nanodegree program
Stars: ✭ 4,012 (+915.7%)
Mutual labels:  reinforcement-learning, dqn, ddpg
reinforcement learning with Tensorflow
Minimal implementations of reinforcement learning algorithms by Tensorflow
Stars: ✭ 28 (-92.91%)
Mutual labels:  dqn, a3c, ddpg
Openai lab
An experimentation framework for Reinforcement Learning using OpenAI Gym, Tensorflow, and Keras.
Stars: ✭ 313 (-20.76%)
Mutual labels:  reinforcement-learning, policy-gradient, ddpg
Super Mario Bros A3c Pytorch
Asynchronous Advantage Actor-Critic (A3C) algorithm for Super Mario Bros
Stars: ✭ 775 (+96.2%)
Mutual labels:  gym, reinforcement-learning, a3c
deep rl acrobot
TensorFlow A2C to solve Acrobot, with synchronized parallel environments
Stars: ✭ 32 (-91.9%)
Mutual labels:  policy-gradient, a3c, ddpg

Deep Reinforcement Learning in Keras

Modular Implementation of popular Deep Reinforcement Learning algorithms in Keras:

  • [x] Synchronous N-step Advantage Actor Critic (A2C)
  • [x] Asynchronous N-step Advantage Actor-Critic (A3C)
  • [x] Deep Deterministic Policy Gradient with Parameter Noise (DDPG)
  • [x] Double Deep Q-Network (DDQN)
  • [x] Double Deep Q-Network with Prioritized Experience Replay (DDQN + PER)
  • [x] Dueling DDQN (D3QN)

Getting Started

This implementation requires keras 2.1.6, as well as OpenAI gym.

$ pip install gym keras==2.1.6

Actor-Critic Algorithms

N-step Advantage Actor Critic (A2C)

The Actor-Critic algorithm is a model-free, on-policy method in which the critic acts as a value-function approximator and the actor as a policy-function approximator. During training, the critic's value estimates are used to compute the TD-error, which guides the learning of both the critic and the actor. In practice, we approximate the TD-error using the advantage function. For more stability, we use a shared computational backbone across both networks, as well as an N-step formulation of the discounted rewards. We also add an entropy regularization term ("soft" learning) to encourage exploration. While A2C is simple and efficient, running it on Atari games quickly becomes intractable due to long computation times.
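
The core of this formulation is the N-step discounted return and the resulting advantage estimate. As a minimal sketch (NumPy only, with hypothetical helper names, not the repository's exact code), these two quantities can be computed from a rollout as follows:

import numpy as np

def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    # Discounted N-step returns, bootstrapped from the critic's value
    # estimate of the final state (use 0.0 if the episode terminated).
    returns = np.zeros(len(rewards), dtype=np.float32)
    running = bootstrap_value
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def advantages(returns, values):
    # Advantage used in place of the TD-error: A(s, a) = R - V(s).
    return returns - values

# Toy 4-step rollout; the critic values are hypothetical placeholders.
R = n_step_returns(np.array([1.0, 0.0, 0.0, 1.0]), bootstrap_value=0.0)
A = advantages(R, np.array([0.5, 0.4, 0.6, 0.2]))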

N-step Asynchronous Advantage Actor Critic (A3C)

In a similar fashion to A2C, the A3C implementation incorporates asynchronous weight updates, allowing for much faster computation. We use multiple agents running on separate threads to perform gradient ascent asynchronously against a shared network. We test A3C on the Atari Breakout environment.
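
To make the asynchronous update scheme concrete, the sketch below shows the threading pattern only, with placeholder gradients and a hypothetical worker function; it is an illustration of the idea, not the repository's code:

import threading
import numpy as np

shared_weights = np.zeros(8)   # stands in for the global network's weights
lock = threading.Lock()        # the original A3C applies updates lock-free (Hogwild!-style)

def worker(worker_id, n_updates=100, lr=0.01):
    rng = np.random.default_rng(worker_id)
    for _ in range(n_updates):
        # In the real algorithm this gradient comes from an N-step rollout
        # of the worker's local copy of the actor-critic network.
        fake_gradient = rng.normal(size=shared_weights.shape)
        with lock:
            shared_weights[:] -= lr * fake_gradient   # asynchronous update of the shared weights

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()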

Deep Deterministic Policy Gradient (DDPG)

The DDPG algorithm is a model-free, off-policy algorithm for continuous action spaces. Similarly to A2C, it is an actor-critic algorithm in which the actor is trained on a deterministic target policy and the critic predicts Q-values. To reduce variance and increase stability, we use experience replay and separate target networks. Moreover, as suggested by OpenAI, we encourage exploration through parameter-space noise (as opposed to traditional action-space noise). We test DDPG on the Lunar Lander environment.
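
To illustrate the two tricks mentioned above, here is a simplified NumPy sketch (hypothetical helper names, not the repository's code) of a soft target-network update and of parameter-space noise:

import numpy as np

def soft_update(target_weights, source_weights, tau=0.001):
    # Polyak-average the target network towards the trained network.
    return [tau * w + (1.0 - tau) * tw
            for w, tw in zip(source_weights, target_weights)]

def perturb_for_exploration(actor_weights, stddev=0.1, rng=None):
    # Parameter-space noise: act with a perturbed copy of the actor's
    # weights instead of adding noise to the chosen actions.
    rng = rng or np.random.default_rng()
    return [w + rng.normal(scale=stddev, size=w.shape) for w in actor_weights]

# Toy weight matrices standing in for Keras layer weights.
actor = [np.ones((3, 2)), np.zeros(2)]
target = [np.zeros((3, 2)), np.zeros(2)]
target = soft_update(target, actor, tau=0.01)
noisy_actor = perturb_for_exploration(actor, stddev=0.05)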

Running

$ python3 main.py --type A2C --env CartPole-v1
$ python3 main.py --type A3C --env CartPole-v1 --nb_episodes 10000 --n_threads 16
$ python3 main.py --type A3C --env BreakoutNoFrameskip-v4 --is_atari --nb_episodes 10000 --n_threads 16
$ python3 main.py --type DDPG --env LunarLanderContinuous-v2


Deep Q-Learning Algorithms

Double Deep Q-Network (DDQN)

The DQN algorithm is a Q-learning algorithm that uses a deep neural network as a Q-value function approximator. We estimate target Q-values by leveraging the Bellman equation, and gather experience through an epsilon-greedy policy. For more stability, we sample past experiences randomly (Experience Replay). A variant of the DQN algorithm is the Double-DQN (or DDQN): to get a more accurate estimate of our Q-values, we use a second network to temper the overestimation of the Q-values by the original network. This target network is softly updated towards the primary network at every training step, at a slow rate Tau.
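
The Double-DQN target can be summarized in a few lines: the online network selects the next action, while the slowly-updated target network evaluates it. Below is a minimal NumPy sketch (not the repository's exact code):

import numpy as np

def double_dqn_targets(rewards, dones, q_next_online, q_next_target, gamma=0.99):
    # rewards, dones: shape (batch,); q_next_*: shape (batch, n_actions).
    best_actions = np.argmax(q_next_online, axis=1)                   # action selection: online network
    evaluated = q_next_target[np.arange(len(rewards)), best_actions]  # action evaluation: target network
    return rewards + gamma * (1.0 - dones) * evaluated

def soft_target_update(target_weights, online_weights, tau=0.01):
    # Move the target network a small step Tau towards the online network.
    return [tau * w + (1.0 - tau) * tw for w, tw in zip(online_weights, target_weights)]

# Hypothetical batch of 2 transitions with 3 discrete actions.
r, d = np.array([1.0, 0.0]), np.array([0.0, 1.0])
q_online = np.array([[0.2, 0.8, 0.1], [0.5, 0.4, 0.3]])
q_target = np.array([[0.3, 0.6, 0.2], [0.4, 0.5, 0.1]])
print(double_dqn_targets(r, d, q_online, q_target))   # -> [1.594, 0.0]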

Double Deep Q-Network with Prioritized Experience Replay (DDQN + PER)

We can further improve our DDQN algorithm by adding Prioritized Experience Replay (PER), which replaces uniform sampling of the replay buffer with sampling biased towards informative transitions. Transitions are ranked by their TD-error and stored in a SumTree structure, which allows efficient retrieval of the (s, a, r, s') transitions with the highest error.
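
A SumTree stores each transition's priority in a leaf and partial sums in the internal nodes, so sampling in proportion to the TD-error takes O(log n). The compact sketch below illustrates the idea; it is an assumption about the structure, not the repository's exact class:

import numpy as np

class SumTree:
    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = np.zeros(2 * capacity - 1)   # internal nodes store sums of child priorities
        self.data = [None] * capacity
        self.write = 0

    def add(self, priority, transition):
        idx = self.write + self.capacity - 1     # leaf index for this slot
        self.data[self.write] = transition
        self.update(idx, priority)
        self.write = (self.write + 1) % self.capacity

    def update(self, idx, priority):
        change = priority - self.tree[idx]
        self.tree[idx] = priority
        while idx != 0:                          # propagate the change up to the root
            idx = (idx - 1) // 2
            self.tree[idx] += change

    def get(self, s):
        # Walk down from the root; leaves with larger priority are reached more often.
        idx = 0
        while idx < self.capacity - 1:           # stop once a leaf is reached
            left, right = 2 * idx + 1, 2 * idx + 2
            if s <= self.tree[left]:
                idx = left
            else:
                s -= self.tree[left]
                idx = right
        return self.data[idx - self.capacity + 1]

tree = SumTree(capacity=4)
for i, priority in enumerate([1.0, 0.5, 2.0, 0.1]):      # priority ~ |TD-error|
    tree.add(priority, ("s", "a", "r", "s_next", i))
sampled = tree.get(np.random.uniform(0, tree.tree[0]))   # tree.tree[0] is the total priority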

Dueling Double Deep Q-Network (Dueling DDQN)

In the dueling variant of the DQN, we incorporate an intermediate layer in the Q-network to estimate both the state value and the state-dependent advantage function. After reformulation (see ref), the estimated Q-value can be expressed as the state value, to which we add the advantage estimate and subtract its mean. This factorization into state-independent and state-dependent values helps disentangle learning across actions and yields better results.
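
The dueling head can be written directly in Keras by splitting the last layer into a value stream and an advantage stream and recombining them as Q(s, a) = V(s) + A(s, a) - mean_a A(s, a). The snippet below is an illustrative sketch with made-up layer sizes, not the repository's exact architecture:

from keras.layers import Input, Dense, Lambda
from keras.models import Model
import keras.backend as K

n_actions, state_dim = 4, 8                    # placeholder sizes for illustration
inp = Input(shape=(state_dim,))
hidden = Dense(64, activation='relu')(inp)
value = Dense(1)(hidden)                       # state value V(s)
advantage = Dense(n_actions)(hidden)           # advantages A(s, a)
q_values = Lambda(
    lambda va: va[0] + va[1] - K.mean(va[1], axis=1, keepdims=True)
)([value, advantage])
model = Model(inputs=inp, outputs=q_values)
model.compile(optimizer='adam', loss='mse')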

Running

$ python3 main.py --type DDQN --env CartPole-v1 --batch_size 64
$ python3 main.py --type DDQN --env CartPole-v1 --batch_size 64 --with_PER
$ python3 main.py --type DDQN --env CartPole-v1 --batch_size 64 --dueling


Arguments

Argument              Description                                                                    Values
--type                Type of RL algorithm to run                                                    Choose from {A2C, A3C, DDQN, DDPG}
--env                 Specify the environment                                                        BreakoutNoFrameskip-v4 (default)
--nb_episodes         Number of episodes to run                                                      5000 (default)
--batch_size          Batch size (DDQN, DDPG)                                                        32 (default)
--consecutive_frames  Number of stacked consecutive frames                                           4 (default)
--is_atari            Whether the environment is an Atari game with pixel input                      -
--with_PER            Whether to use Prioritized Experience Replay (with DDQN)                       -
--dueling             Whether to use Dueling Networks (with DDQN)                                    -
--n_threads           Number of threads (A3C)                                                        16 (default)
--gather_stats        Whether to compute stats of scores averaged over 10 games (slow, see below)    -
--render              Whether to render the environment as it is training                            -
--gpu                 GPU index                                                                      0 (default)

Visualization & Monitoring

Model Visualization

All models are saved under <algorithm_folder>/models/ when finished training. You can visualize them running in the same environment they were trained in by running the load_and_run.py script. For DQN models, you should specify the path to the desired model in the --model_path argument. For actor-critic models, you need to specify both weight files in the --actor_path and --critic_path arguments.
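For example, invocations might look like the lines below. The weight-file paths are placeholders, and depending on the script's argument parser you may also need to pass the algorithm and environment flags used by main.py:

$ python3 load_and_run.py --model_path <path_to_DDQN_weights>
$ python3 load_and_run.py --actor_path <path_to_actor_weights> --critic_path <path_to_critic_weights>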

Tensorboard monitoring

Using TensorBoard, you can monitor the agent's score as it trains. During training, a log folder named after the chosen environment is created. For example, to follow the A2C progression on CartPole-v1, simply run:

$ tensorboard --logdir=A2C/tensorboard_CartPole-v1/

Results plotting

When training with the --gather_stats argument, a log file named logs.csv is generated, containing scores averaged over 10 games at every episode. Using plotly, you can visualize the average reward per episode. To do so, you will first need to install plotly and get a free license.

pip3 install plotly

To set up your credentials, run:

import plotly
plotly.tools.set_credentials_file(username='<your_username>', api_key='<your_key>')

Finally, to plot the results, run:

python3 utils/plot_results.py <path_to_your_log_file>

Acknowledgments

  • Atari Environment Helper Class template by @ShanHaoYu
  • Atari Environment Wrappers by OpenAI
  • SumTree Helper Class by @jaara

References (Papers)
