DQN(λ) — Reconciling λ-Returns with Experience Replay

DQN(λ) is an instantiation of the ideas proposed in [1]: it extends DQN [2] to efficiently utilize various types of λ-returns [3], which can significantly improve sample efficiency.
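
For reference, the λ-return [3] is an exponentially weighted average of n-step returns. The sketch below is purely illustrative and is not how this repository computes returns (see Return Estimators for the estimators actually implemented):

import numpy as np

def lambda_return(rewards, bootstrap_values, discount, lam):
    # Illustrative lambda-return: an exponentially weighted average of n-step returns [3].
    # rewards[k]          -- reward r_k received after step k
    # bootstrap_values[k] -- value estimate at the state reached after step k,
    #                        e.g. max_a Q(s_{k+1}, a)
    T = len(rewards)
    n_step_returns = []
    for n in range(1, T + 1):
        g = sum(discount**k * rewards[k] for k in range(n))
        g += discount**n * bootstrap_values[n - 1]
        n_step_returns.append(g)
    # Weight by (1 - lambda) * lambda^(n-1); the longest return absorbs the leftover
    # weight so that the weights sum to 1.
    weights = np.array([(1 - lam) * lam**(n - 1) for n in range(1, T + 1)])
    weights[-1] = lam**(T - 1)
    return float(np.dot(weights, n_step_returns))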

If you use this repository in published work, please cite the paper:

@inproceedings{daley2019reconciling,
  title={Reconciling $\lambda$-Returns with Experience Replay},
  author={Daley, Brett and Amato, Christopher},
  booktitle={Advances in Neural Information Processing Systems},
  pages={1133--1142},
  year={2019}
}

Contents

Setup

Quickstart: DQN(λ)

Quickstart: DQN

Atari Environment Naming Convention

Return Estimators

License, Acknowledgments, and References


Setup

This repository requires Python 3. To automatically install working package versions, just clone the repository and run pip:

git clone https://github.com/brett-daley/dqn-lambda.git
cd dqn-lambda
pip install -r requirements.txt

Note: Training will likely be impractical without GPU support. See the TensorFlow GPU guide for tensorflow-gpu and CUDA setup.
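
As a quick sanity check (not part of the repository), you can ask TensorFlow whether it sees a GPU. The exact call depends on the TensorFlow version pinned in requirements.txt; with the TensorFlow 1.x API, for example:

import tensorflow as tf

# Prints True if a CUDA-capable GPU is visible to TensorFlow (TensorFlow 1.x API).
print(tf.test.is_gpu_available())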


Quickstart: DQN(λ)

Atari Games

You can train DQN(λ) on any of the Atari games included in the OpenAI Gym (see Atari Environment Naming Convention). For example, the following command runs DQN(λ) with λ=0.75 on Pong for 1.5 million timesteps:

python run_dqn_atari.py --env pong --return-est pengs-0.75 --timesteps 1.5e6

See Return Estimators for all of the n-step returns and λ-returns supported by --return-est. To get a description of the other possible command-line arguments, run this:

python run_dqn_atari.py --help

Classic Control Environments

You can run DQN(λ) on CartPole-v0 by simply executing python run_dqn_control.py. This is useful for testing code on laptops or low-end desktops, particularly those without GPUs.

run_dqn_control.py does not take command-line arguments; all values are hard-coded. You need to edit the file directly to change parameters. A one-line change to the environment name is all you need to run other environments (discrete action spaces only; e.g. Acrobot-v1 or MountainCar-v0).
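
For example, switching from CartPole to Acrobot amounts to an edit along these lines (a hypothetical snippet; the actual variable name and line in run_dqn_control.py may differ):

import gym

# run_dqn_control.py hard-codes the environment roughly like this:
env = gym.make('CartPole-v0')
# Changing the string is the one-line edit, e.g.:
# env = gym.make('Acrobot-v1')    # or 'MountainCar-v0'; discrete action spaces only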


Quickstart: DQN

This repository also includes a standard target-network implementation of DQN for reference. Add the --legacy flag to run it instead of DQN(λ):

python run_dqn_atari.py --legacy

Note that setting --legacy along with any DQN(λ)-specific arguments (--cache-size, --block-size, or --priority) will throw an error because they are undefined for DQN. For example:

python run_dqn_atari.py --cache-size 10000 --legacy

Traceback (most recent call last):
  File "run_dqn_atari.py", line 82, in <module>
    main()
  File "run_dqn_atari.py", line 56, in main
    assert args.cache_size == 80000  # Cache-related args are undefined for legacy DQN
AssertionError

Similarly, trying to use --legacy with a return estimator other than n-step returns will also throw an error:

python run_dqn_atari.py --return-est pengs-0.75 --legacy

Traceback (most recent call last):
  File "run_dqn_atari.py", line 82, in <module>
    main()
  File "run_dqn_atari.py", line 59, in main
    replay_memory = make_legacy_replay_memory(args.return_est, replay_mem_size, args.history_len, discount)
  File "/home/brett/dqn-lambda/replay_memory_legacy.py", line 10, in make_legacy_replay_memory
    raise ValueError('Legacy mode only supports n-step returns but requested {}'.format(return_est))
ValueError: Legacy mode only supports n-step returns but requested pengs-0.75

Atari Environment Naming Convention

The --env argument does not use the same string format that OpenAI Gym uses. Environment names should be lowercase and use underscores instead of CamelCase. The trailing -v0 should also be removed. For example:

OpenAI Name      | Usage
BeamRider-v0     | python run_dqn_atari.py --env beam_rider
Breakout-v0      | python run_dqn_atari.py --env breakout
Pong-v0          | python run_dqn_atari.py --env pong
Qbert-v0         | python run_dqn_atari.py --env qbert
Seaquest-v0      | python run_dqn_atari.py --env seaquest
SpaceInvaders-v0 | python run_dqn_atari.py --env space_invaders

This pattern applies to all of the Atari games supported by OpenAI Gym.
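
Because the conversion is mechanical, a small helper (shown for illustration only; it is not part of the repository) can derive the --env string from a Gym ID:

import re

def gym_id_to_env_arg(gym_id):
    # Convert an OpenAI Gym ID such as 'SpaceInvaders-v0' into the --env format ('space_invaders').
    name = gym_id.split('-v')[0]                            # drop the version suffix
    return re.sub(r'(?<!^)(?=[A-Z])', '_', name).lower()    # CamelCase -> snake_case

assert gym_id_to_env_arg('BeamRider-v0') == 'beam_rider'
assert gym_id_to_env_arg('Pong-v0') == 'pong'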


Return Estimators

The --return-est argument accepts a string that selects the return estimator. Depending on the estimator, it is parameterized by an <int> (greater than 0) or a <float> (between 0.0 and 1.0 inclusive; the decimal point is mandatory). The table below summarizes all of the return estimators supported by DQN(λ).

Return Estimator           | Format                | Example            | Description
n-step                     | nstep-<int>           | nstep-3            | Classic n-step return [3]. Standard DQN uses n=1. n=<int>.
Peng's Q(λ)                | pengs-<float>         | pengs-0.75         | λ-return that unconditionally uses max Q-values [4]. A good "default" λ-return. λ=<float>.
Peng's Q(λ) + median       | pengs-median          | pengs-median       | Peng's Q(λ) with median λ selection [1].
Peng's Q(λ) + bounded δ    | pengs-maxtd-<float>   | pengs-maxtd-0.01   | Peng's Q(λ) with bounded-error λ selection [1]. δ=<float>.
Watkins's Q(λ)             | watkins-<float>       | watkins-0.75       | Peng's Q(λ), but sets λ=0 whenever the taken action's Q-value is non-max [4]. Ensures on-policy data. λ=<float>.
Watkins's Q(λ) + median    | watkins-median        | watkins-median     | Watkins's Q(λ) with median λ selection [1].
Watkins's Q(λ) + bounded δ | watkins-maxtd-<float> | watkins-maxtd-0.01 | Watkins's Q(λ) with bounded-error λ selection [1]. δ=<float>.

See Section 7.6 of [4] for a side-by-side comparison of Peng's Q(λ) and Watkins's Q(λ).
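
To illustrate the difference, both returns can be computed with a backward recursion over a trajectory; Watkins's Q(λ) simply cuts λ to 0 at steps where the taken action's Q-value is not the maximum. This is a simplified sketch based on the descriptions above, not the repository's actual replay-cache implementation:

import numpy as np

def qlambda_returns(rewards, q_taken, q_max, discount, lam, watkins=False):
    # Simplified sketch of Peng's / Watkins's Q(lambda) returns via a backward recursion.
    # rewards[t] -- reward r_t received after step t
    # q_taken[t] -- Q(s_{t+1}, a_{t+1}), value of the action actually taken next
    # q_max[t]   -- max_a Q(s_{t+1}, a)
    T = len(rewards)
    returns = np.zeros(T)
    g = q_max[-1]  # makes the final step reduce to a one-step return (simplest truncation)
    for t in reversed(range(T)):
        lam_t = lam
        if watkins and q_taken[t] < q_max[t]:
            lam_t = 0.0  # cut the return whenever the next action is not greedy
        g = rewards[t] + discount * ((1.0 - lam_t) * q_max[t] + lam_t * g)
        returns[t] = g
    return returns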


License

This code is released under the MIT License.

Acknowledgments

This codebase evolved from the partial DQN implementation made available by the Berkeley Deep RL course, in turn based on Szymon Sidor's OpenAI implementation. Special thanks to them.

References

[1] Daley, B. and Amato, C. Reconciling λ-Returns with Experience Replay. NeurIPS 2019.

[2] Mnih, V. et al. Human-Level Control Through Deep Reinforcement Learning. Nature, 2015.

[3] Sutton, R. S. and Barto, A. G. Reinforcement Learning: An Introduction (2nd edition).

[4] Sutton, R. S. and Barto, A. G. Reinforcement Learning: An Introduction (1st edition).
