DQN(λ) — Reconciling λ-Returns with Experience Replay

DQN(λ) is an instantiation of the ideas proposed in [1]: it extends DQN [2] to efficiently utilize various types of λ-returns [3], which can significantly improve sample efficiency.
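
For reference, the λ-return [3] is an exponentially weighted average of n-step returns. The sketch below is purely illustrative and is not how this repository computes returns (see Return Estimators for the estimators actually implemented):

import numpy as np

def lambda_return(rewards, bootstrap_values, discount, lam):
    # Illustrative lambda-return: an exponentially weighted average of n-step returns [3].
    # rewards[k]          -- reward r_k received after step k
    # bootstrap_values[k] -- value estimate at the state reached after step k,
    #                        e.g. max_a Q(s_{k+1}, a)
    T = len(rewards)
    n_step_returns = []
    for n in range(1, T + 1):
        g = sum(discount**k * rewards[k] for k in range(n))
        g += discount**n * bootstrap_values[n - 1]
        n_step_returns.append(g)
    # Weight by (1 - lambda) * lambda^(n-1); the longest return absorbs the leftover
    # weight so that the weights sum to 1.
    weights = np.array([(1 - lam) * lam**(n - 1) for n in range(1, T + 1)])
    weights[-1] = lam**(T - 1)
    return float(np.dot(weights, n_step_returns))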

If you use this repository in published work, please cite the paper:

@inproceedings{daley2019reconciling,
  title={Reconciling $\lambda$-Returns with Experience Replay},
  author={Daley, Brett and Amato, Christopher},
  booktitle={Advances in Neural Information Processing Systems},
  pages={1133--1142},
  year={2019}
}

Contents

Setup

Quickstart: DQN(λ)

Quickstart: DQN

Atari Environment Naming Convention

Return Estimators

License, Acknowledgments, and References


Setup

This repository requires Python 3. To automatically install working package versions, just clone the repository and run pip:

git clone https://github.com/brett-daley/dqn-lambda.git
cd dqn-lambda
pip install -r requirements.txt

Note: Training will likely be impractical without GPU support. See the TensorFlow GPU guide for tensorflow-gpu and CUDA setup.
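
As a quick sanity check (not part of the repository), you can ask TensorFlow whether it sees a GPU. The exact call depends on the TensorFlow version pinned in requirements.txt; with the TensorFlow 1.x API, for example:

import tensorflow as tf

# Prints True if a CUDA-capable GPU is visible to TensorFlow (TensorFlow 1.x API).
print(tf.test.is_gpu_available())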


Quickstart: DQN(λ)

Atari Games

You can train DQN(λ) on any of the Atari games included in the OpenAI Gym (see Atari Environment Naming Convention). For example, the following command runs DQN(λ) with λ=0.75 on Pong for 1.5 million timesteps:

python run_dqn_atari.py --env pong --return-est pengs-0.75 --timesteps 1.5e6

See Return Estimators for all of the n-step returns and λ-returns supported by --return-est. To get a description of the other possible command-line arguments, run this:

python run_dqn_atari.py --help

Classic Control Environments

You can run DQN(λ) on CartPole-v0 by simply executing python run_dqn_control.py. This is useful for testing code on laptops or low-end desktops, particularly those without GPUs.

run_dqn_control.py does not take command-line arguments; all values are hard-coded. You need to edit the file directly to change parameters. A one-line change to the environment name is all you need to run other environments (discrete action spaces only; e.g. Acrobot-v1 or MountainCar-v0).
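
For example, switching from CartPole to Acrobot amounts to an edit along these lines (a hypothetical snippet; the actual variable name and line in run_dqn_control.py may differ):

import gym

# run_dqn_control.py hard-codes the environment roughly like this:
env = gym.make('CartPole-v0')
# Changing the string is the one-line edit, e.g.:
# env = gym.make('Acrobot-v1')    # or 'MountainCar-v0'; discrete action spaces only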


Quickstart: DQN

This repository also includes a standard target-network implementation of DQN for reference. Add the --legacy flag to run it instead of DQN(λ):

python run_dqn_atari.py --legacy

Note that setting --legacy along with any DQN(λ)-specific arguments (--cache-size, --block-size, or --priority) will throw an error because they are undefined for DQN. For example:

python run_dqn_atari.py --cache-size 10000 --legacy

Traceback (most recent call last):
  File "run_dqn_atari.py", line 82, in <module>
    main()
  File "run_dqn_atari.py", line 56, in main
    assert args.cache_size == 80000  # Cache-related args are undefined for legacy DQN
AssertionError

Similarly, trying to use --legacy with a return estimator other than n-step returns will also throw an error:

python run_dqn_atari.py --return-est pengs-0.75 --legacy

Traceback (most recent call last):
  File "run_dqn_atari.py", line 82, in <module>
    main()
  File "run_dqn_atari.py", line 59, in main
    replay_memory = make_legacy_replay_memory(args.return_est, replay_mem_size, args.history_len, discount)
  File "/home/brett/dqn-lambda/replay_memory_legacy.py", line 10, in make_legacy_replay_memory
    raise ValueError('Legacy mode only supports n-step returns but requested {}'.format(return_est))
ValueError: Legacy mode only supports n-step returns but requested pengs-0.75

Atari Environment Naming Convention

The --env argument does not use the same string format that OpenAI Gym uses. Environment names should be lowercase and use underscores instead of CamelCase. The trailing -v0 should also be removed. For example:

OpenAI Name      | Usage
BeamRider-v0     | python run_dqn_atari.py --env beam_rider
Breakout-v0      | python run_dqn_atari.py --env breakout
Pong-v0          | python run_dqn_atari.py --env pong
Qbert-v0         | python run_dqn_atari.py --env qbert
Seaquest-v0      | python run_dqn_atari.py --env seaquest
SpaceInvaders-v0 | python run_dqn_atari.py --env space_invaders

This pattern applies to all of the Atari games supported by OpenAI Gym.
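
Because the conversion is mechanical, a small helper (shown for illustration only; it is not part of the repository) can derive the --env string from a Gym ID:

import re

def gym_id_to_env_arg(gym_id):
    # Convert an OpenAI Gym ID such as 'SpaceInvaders-v0' into the --env format ('space_invaders').
    name = gym_id.split('-v')[0]                            # drop the version suffix
    return re.sub(r'(?<!^)(?=[A-Z])', '_', name).lower()    # CamelCase -> snake_case

assert gym_id_to_env_arg('BeamRider-v0') == 'beam_rider'
assert gym_id_to_env_arg('Pong-v0') == 'pong'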


Return Estimators

The --return-est argument accepts a string that selects the return estimator. Depending on the estimator, it is parameterized by an <int> (greater than 0) or a <float> (between 0.0 and 1.0 inclusive; the decimal point is mandatory). The table below summarizes all of the return estimators supported by DQN(λ).

Return Estimator           | Format                | Example            | Description
n-step                     | nstep-<int>           | nstep-3            | Classic n-step return [3]. Standard DQN uses n=1. n=<int>.
Peng's Q(λ)                | pengs-<float>         | pengs-0.75         | λ-return that unconditionally uses max Q-values [4]. A good "default" λ-return. λ=<float>.
Peng's Q(λ) + median       | pengs-median          | pengs-median       | Peng's Q(λ) with median λ selection [1].
Peng's Q(λ) + bounded δ    | pengs-maxtd-<float>   | pengs-maxtd-0.01   | Peng's Q(λ) with bounded-error λ selection [1]. δ=<float>.
Watkins's Q(λ)             | watkins-<float>       | watkins-0.75       | Peng's Q(λ), but sets λ=0 whenever the taken action's Q-value is non-max [4]. Ensures on-policy data. λ=<float>.
Watkins's Q(λ) + median    | watkins-median        | watkins-median     | Watkins's Q(λ) with median λ selection [1].
Watkins's Q(λ) + bounded δ | watkins-maxtd-<float> | watkins-maxtd-0.01 | Watkins's Q(λ) with bounded-error λ selection [1]. δ=<float>.

See Section 7.6 of [4] for a side-by-side comparison of Peng's Q(λ) and Watkins's Q(λ).
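
To illustrate the difference, both returns can be computed with a backward recursion over a trajectory; Watkins's Q(λ) simply cuts λ to 0 at steps where the taken action's Q-value is not the maximum. This is a simplified sketch based on the descriptions above, not the repository's actual replay-cache implementation:

import numpy as np

def qlambda_returns(rewards, q_taken, q_max, discount, lam, watkins=False):
    # Simplified sketch of Peng's / Watkins's Q(lambda) returns via a backward recursion.
    # rewards[t] -- reward r_t received after step t
    # q_taken[t] -- Q(s_{t+1}, a_{t+1}), value of the action actually taken next
    # q_max[t]   -- max_a Q(s_{t+1}, a)
    T = len(rewards)
    returns = np.zeros(T)
    g = q_max[-1]  # makes the final step reduce to a one-step return (simplest truncation)
    for t in reversed(range(T)):
        lam_t = lam
        if watkins and q_taken[t] < q_max[t]:
            lam_t = 0.0  # cut the return whenever the next action is not greedy
        g = rewards[t] + discount * ((1.0 - lam_t) * q_max[t] + lam_t * g)
        returns[t] = g
    return returns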


License

This code is released under the MIT License.

Acknowledgments

This codebase evolved from the partial DQN implementation made available by the Berkeley Deep RL course, in turn based on Szymon Sidor's OpenAI implementation. Special thanks to them.

References

[1] Daley, B. and Amato, C. Reconciling λ-Returns with Experience Replay. NeurIPS 2019.

[2] Mnih, V. et al. Human-Level Control Through Deep Reinforcement Learning. Nature, 2015.

[3] Sutton, R. S. and Barto, A. G. Reinforcement Learning: An Introduction (2nd edition).

[4] Sutton, R. S. and Barto, A. G. Reinforcement Learning: An Introduction (1st edition).
