
ChangyWen / wolpertinger_ddpg

Licence: other
Wolpertinger Training with DDPG (Pytorch), Deep Reinforcement Learning in Large Discrete Action Spaces. Multi-GPU/Singer-GPU/CPU compatible.

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to wolpertinger_ddpg

Pytorch Rl
This repository contains model-free deep reinforcement learning algorithms implemented in Pytorch
Stars: ✭ 394 (+795.45%)
Mutual labels:  deep-reinforcement-learning, gym, ddpg
Deep-Reinforcement-Learning-With-Python
Master classic RL, deep RL, distributional RL, inverse RL, and more using OpenAI Gym and TensorFlow with extensive Math
Stars: ✭ 222 (+404.55%)
Mutual labels:  deep-reinforcement-learning, ddpg
Deep-rl-mxnet
Mxnet implementation of Deep Reinforcement Learning papers, such as DQN, PG, DDPG, PPO
Stars: ✭ 26 (-40.91%)
Mutual labels:  deep-reinforcement-learning, ddpg
pytorch-gym
Implementation of the Deep Deterministic Policy Gradient(DDPG) in bullet Gym using pytorch
Stars: ✭ 39 (-11.36%)
Mutual labels:  gym, ddpg
omd
JAX code for the paper "Control-Oriented Model-Based Reinforcement Learning with Implicit Differentiation"
Stars: ✭ 43 (-2.27%)
Mutual labels:  deep-reinforcement-learning, gym
pytorch-distributed
Ape-X DQN & DDPG with pytorch & tensorboard
Stars: ✭ 98 (+122.73%)
Mutual labels:  deep-reinforcement-learning, ddpg
deep rl acrobot
TensorFlow A2C to solve Acrobot, with synchronized parallel environments
Stars: ✭ 32 (-27.27%)
Mutual labels:  deep-reinforcement-learning, ddpg
Pytorch sac
PyTorch implementation of Soft Actor-Critic (SAC)
Stars: ✭ 174 (+295.45%)
Mutual labels:  deep-reinforcement-learning, gym
deeprl-continuous-control
Learning Continuous Control in Deep Reinforcement Learning
Stars: ✭ 14 (-68.18%)
Mutual labels:  deep-reinforcement-learning, ddpg
motion-planner-reinforcement-learning
End to end motion planner using Deep Deterministic Policy Gradient (DDPG) in gazebo
Stars: ✭ 99 (+125%)
Mutual labels:  deep-reinforcement-learning, ddpg
Deep-Reinforcement-Learning-for-Automated-Stock-Trading-Ensemble-Strategy-ICAIF-2020
Live Trading. Please star.
Stars: ✭ 1,251 (+2743.18%)
Mutual labels:  deep-reinforcement-learning, ddpg
Pytorch Drl
PyTorch implementations of various Deep Reinforcement Learning (DRL) algorithms for both single agent and multi-agent.
Stars: ✭ 233 (+429.55%)
Mutual labels:  deep-reinforcement-learning, ddpg
Deeprl
Modularized Implementation of Deep RL Algorithms in PyTorch
Stars: ✭ 2,640 (+5900%)
Mutual labels:  deep-reinforcement-learning, ddpg
deep-rl-quadcopter
Implementation of Deep Deterministic Policy Gradients (DDPG) to teach a Quadcopter How to Fly!
Stars: ✭ 17 (-61.36%)
Mutual labels:  deep-reinforcement-learning, ddpg
Naf Tensorflow
"Continuous Deep Q-Learning with Model-based Acceleration" in TensorFlow
Stars: ✭ 192 (+336.36%)
Mutual labels:  deep-reinforcement-learning, gym
Rainy
☔ Deep RL agents with PyTorch☔
Stars: ✭ 39 (-11.36%)
Mutual labels:  deep-reinforcement-learning, ddpg
DDPG
End to End Mobile Robot Navigation using DDPG (Continuous Control with Deep Reinforcement Learning) based on Tensorflow + Gazebo
Stars: ✭ 41 (-6.82%)
Mutual labels:  deep-reinforcement-learning, ddpg
Deep Reinforcement Learning Algorithms
31 projects in the framework of Deep Reinforcement Learning algorithms: Q-learning, DQN, PPO, DDPG, TD3, SAC, A2C and others. Each project is provided with a detailed training log.
Stars: ✭ 167 (+279.55%)
Mutual labels:  deep-reinforcement-learning, ddpg
Machine Learning Is All You Need
🔥🌟《Machine Learning 格物志》: ML + DL + RL basic codes and notes by sklearn, PyTorch, TensorFlow, Keras & the most important, from scratch!💪 This repository is ALL You Need!
Stars: ✭ 173 (+293.18%)
Mutual labels:  deep-reinforcement-learning, ddpg
reinforcement learning ppo rnd
Deep Reinforcement Learning by using Proximal Policy Optimization and Random Network Distillation in Tensorflow 2 and Pytorch with some explanation
Stars: ✭ 33 (-25%)
Mutual labels:  deep-reinforcement-learning, gym

Wolpertinger Training with DDPG (PyTorch, Multi-GPU/Single-GPU/CPU)

Overview

PyTorch implementation of Wolpertinger training with DDPG (paper: Deep Reinforcement Learning in Large Discrete Action Spaces).
The code supports training on multiple GPUs, a single GPU, or the CPU.
It works with both continuous and discrete control environments from OpenAI Gym.
In the continuous case, the action space is discretized so that the Wolpertinger-DDPG training algorithm can be applied.
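
The core idea can be sketched as follows. This is a minimal illustration with stand-in `actor` and `critic` callables (not the repository's classes): the actor emits a continuous proto-action, the k nearest discrete actions are retrieved, and the critic picks the one with the highest Q-value. A brute-force search is used here for clarity; the repository uses pyflann for approximate logarithmic-time lookup.

```python
import numpy as np

def wolpertinger_select(state, actor, critic, discrete_actions, k=10):
    """Pick the discrete action with the highest Q-value among the
    k nearest neighbors of the actor's continuous proto-action."""
    proto = actor(state)                               # continuous proto-action
    # Brute-force k-NN for illustration; pyflann replaces this in practice.
    dists = np.linalg.norm(discrete_actions - proto, axis=1)
    neighbors = discrete_actions[np.argsort(dists)[:k]]
    q_values = np.array([critic(state, a) for a in neighbors])
    return neighbors[np.argmax(q_values)]

# Toy usage with dummy networks on a 1-D action space:
actions = np.linspace(-2.0, 2.0, 200000).reshape(-1, 1)
actor = lambda s: np.array([0.5])
critic = lambda s, a: -abs(a[0] - 0.7)                 # prefers actions near 0.7
best = wolpertinger_select(np.zeros(3), actor, critic, actions, k=100)
```

Note that the critic can override the nearest neighbor: here the selected action is slightly above 0.5, pulled toward 0.7, which is exactly what distinguishes Wolpertinger from plain nearest-neighbor rounding.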

Dependencies

  • python 3.6.8
  • torch 1.1.0
  • OpenAI gym
    • If you get a RuntimeError: NotImplementedError in ActionWrapper.step while training with gym, replace your gym/core.py file with core.py in openai/gym.
  • pyflann
    • This is FLANN (Muja & Lowe, 2014), a library of approximate nearest-neighbor methods that allows lookup complexity logarithmic in the number of actions. However, pyflann, the Python binding of FLANN, was written for Python 2 and is no longer maintained.
    • To use this package, copy the whole pyflann directory into your (virtual) Python environment.
    • Please refer to pyflann for more detailed instructions if needed.
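
Since pyflann may not be importable on Python 3 without the bundled package, here is a brute-force NumPy stand-in for the two calls the agent needs: building an index over the discrete action set once (cf. FLANN's build_index) and querying the k nearest actions to a proto-action at every step (cf. FLANN's nn_index). The class name and return convention here are illustrative, not the repository's API.

```python
import numpy as np

class ActionIndex:
    """Exact nearest-neighbor stand-in for pyflann's approximate index."""

    def __init__(self, actions):
        # Build once over the full discrete action set, shape (N, action_dim).
        self.actions = np.asarray(actions, dtype=np.float64)

    def nn_index(self, query, num_neighbors=1):
        """Return indices and squared distances of the nearest actions."""
        dists = np.sum((self.actions - query) ** 2, axis=1)
        idx = np.argsort(dists)[:num_neighbors]
        return idx, dists[idx]

# Query the 3 actions closest to a proto-action of 0.45 on an 11-point grid:
index = ActionIndex(np.linspace(-2.0, 2.0, 11).reshape(-1, 1))
idx, d = index.nn_index(np.array([0.45]), num_neighbors=3)
```

The exact search above is O(N) per query; FLANN's k-d tree index is what makes the 200000-action lookups in this project cheap.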

Usage

  • In Pendulum-v0 (continuous control), discretize the continuous action space into a discrete action space with 200000 actions.
    python main.py --env 'Pendulum-v0' --max-actions 200000
  • In CartPole-v1 (discrete control), --max-actions is not needed.
    python main.py --env 'CartPole-v1'
  • To use CPU only:
    python main.py --gpu-ids -1
  • To use single-GPU only:
    python main.py --gpu-ids 0 --gpu-nums 1
  • To use multi-GPU (e.g., use GPU-0 and GPU-1):
    python main.py --gpu-ids 0 1 --gpu-nums 2
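
For Pendulum-v0, the --max-actions flag controls how finely the continuous torque range is discretized. A uniform grid is one straightforward construction, sketched below using the environment's documented action bounds of [-2.0, 2.0]; the repository's own discretization may differ in detail.

```python
import numpy as np

# Pendulum-v0 has a single torque dimension bounded by [-2.0, 2.0].
# Discretize it into a uniform grid of max_actions candidate actions,
# one row per action, matching the (N, action_dim) layout a k-NN index expects.
max_actions = 200000
low, high = -2.0, 2.0
discrete_actions = np.linspace(low, high, max_actions).reshape(-1, 1)
```

With 200000 points the grid spacing is about 2e-5, so the discretization error is negligible for this task while keeping per-step lookups tractable via approximate nearest-neighbor search.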

Result

  • Please refer to output for the trained models and training log.
    • Pendulum-v0: a gym environment with continuous action space.
    • CartPole-v1: a gym environment with discrete action space.
