illidanlab / rpg

Licence: other
Ranking Policy Gradient

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to rpg

Easy Rl
Reinforcement learning tutorial in Chinese; read online at https://datawhalechina.github.io/easy-rl/
Stars: ✭ 3,004 (+13554.55%)
Mutual labels:  policy-gradient, imitation-learning
imitation learning
PyTorch implementation of some reinforcement learning algorithms: A2C, PPO, Behavioral Cloning from Observation (BCO), GAIL.
Stars: ✭ 93 (+322.73%)
Mutual labels:  policy-gradient, imitation-learning
Tianshou
An elegant PyTorch deep reinforcement learning library.
Stars: ✭ 4,109 (+18577.27%)
Mutual labels:  policy-gradient, imitation-learning
Reinforcement learning tutorial with demo
Reinforcement Learning Tutorial with Demo: DP (Policy and Value Iteration), Monte Carlo, TD Learning (SARSA, QLearning), Function Approximation, Policy Gradient, DQN, Imitation, Meta Learning, Papers, Courses, etc..
Stars: ✭ 442 (+1909.09%)
Mutual labels:  policy-gradient, imitation-learning
A2c
A Clearer and Simpler Synchronous Advantage Actor Critic (A2C) Implementation in TensorFlow
Stars: ✭ 169 (+668.18%)
Mutual labels:  policy-gradient
Deep Reinforcement Learning With Pytorch
PyTorch implementation of DQN, AC, ACER, A2C, A3C, PG, DDPG, TRPO, PPO, SAC, TD3 and ....
Stars: ✭ 1,345 (+6013.64%)
Mutual labels:  policy-gradient
Reinforcement learning
Reinforcement learning tutorials
Stars: ✭ 82 (+272.73%)
Mutual labels:  policy-gradient
Rl Course Experiments
Stars: ✭ 73 (+231.82%)
Mutual labels:  policy-gradient
Pontryagin-Differentiable-Programming
A unified end-to-end learning and control framework that is able to learn a (neural) control objective function, dynamics equation, control policy, or/and optimal trajectory in a control system.
Stars: ✭ 111 (+404.55%)
Mutual labels:  imitation-learning
Reinforcement Learning
Minimal and Clean Reinforcement Learning Examples
Stars: ✭ 2,863 (+12913.64%)
Mutual labels:  policy-gradient
Policy Gradient
Minimal Monte Carlo Policy Gradient (REINFORCE) Algorithm Implementation in Keras
Stars: ✭ 135 (+513.64%)
Mutual labels:  policy-gradient
Reinforcement learning
Implementations of basic reinforcement learning algorithms
Stars: ✭ 100 (+354.55%)
Mutual labels:  policy-gradient
Deep Algotrading
A resource for learning about deep learning techniques from regression to LSTM and Reinforcement Learning using financial data and the fitness functions of algorithmic trading
Stars: ✭ 173 (+686.36%)
Mutual labels:  policy-gradient
Deeprl algorithms
DeepRL algorithms implementation easy for understanding and reading with Pytorch and Tensorflow 2(DQN, REINFORCE, VPG, A2C, TRPO, PPO, DDPG, TD3, SAC)
Stars: ✭ 97 (+340.91%)
Mutual labels:  policy-gradient
SharkStock
Automate swing trading using deep reinforcement learning. The deep deterministic policy gradient-based neural network model trains to choose an action to sell, buy, or hold the stocks to maximize the gain in asset value. The paper also acknowledges the need for a system that predicts the trend in stock value to work along with the reinforcement …
Stars: ✭ 63 (+186.36%)
Mutual labels:  policy-gradient
Codegan
[Deprecated] Source Code Generation using Sequence Generative Adversarial Networks
Stars: ✭ 73 (+231.82%)
Mutual labels:  policy-gradient
Mlds2018spring
Machine Learning and having it Deep and Structured (MLDS) in 2018 spring
Stars: ✭ 124 (+463.64%)
Mutual labels:  policy-gradient
Pytorch Rl
Tutorials for reinforcement learning in PyTorch and Gym by implementing a few of the popular algorithms. [IN PROGRESS]
Stars: ✭ 121 (+450%)
Mutual labels:  policy-gradient
Show Adapt And Tell
Code for "Show, Adapt and Tell: Adversarial Training of Cross-domain Image Captioner" in ICCV 2017
Stars: ✭ 146 (+563.64%)
Mutual labels:  policy-gradient
Deep-Reinforcement-Learning-With-Python
Master classic RL, deep RL, distributional RL, inverse RL, and more using OpenAI Gym and TensorFlow with extensive Math
Stars: ✭ 222 (+909.09%)
Mutual labels:  policy-gradient

Ranking Policy Gradient

Ranking Policy Gradient (RPG) is a sample-efficient off-policy policy gradient method that learns the optimal ranking of actions to maximize the return. RPG has the following practical advantages:

  • It is a sample-efficient model-free algorithm for learning deterministic policies.
  • Any exploration algorithm can be incorporated with little effort to further improve the sample efficiency of RPG.
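
As a rough illustration of what "ranking actions" can mean, the sketch below scores each action, turns score differences into pairwise preferences, and picks the top-ranked action deterministically. It assumes a logistic pairwise preference model and uses hypothetical function names; it is not taken from this codebase, and the paper remains the reference for the exact formulation.

```python
import math


def pairwise_preference(score_i, score_j):
    """Preference for action i over action j (logistic model, an assumption here)."""
    return 1.0 / (1.0 + math.exp(-(score_i - score_j)))


def ranking_preferences(scores):
    """For each action, multiply its pairwise preferences over every other action.

    The products are not normalized, so they need not sum to 1.
    """
    products = []
    for i, s_i in enumerate(scores):
        product = 1.0
        for j, s_j in enumerate(scores):
            if j != i:
                product *= pairwise_preference(s_i, s_j)
        products.append(product)
    return products


def greedy_action(scores):
    """Deterministic policy: return the index of the highest-scored action."""
    return max(range(len(scores)), key=lambda i: scores[i])


if __name__ == "__main__":
    scores = [0.1, 2.0, -1.0]  # hypothetical per-action scores for one state
    print(ranking_preferences(scores))
    print(greedy_action(scores))
```

Because the logistic function is monotonic in the score difference, the action with the highest score also has the largest preference product, so the greedy choice agrees with the ranking.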

This codebase implements RPG on top of the Dopamine framework. A preprint of the RPG paper is available on arXiv (arXiv:1906.09674).

Instructions

Install via source

Step 1.

Follow the installation instructions of the Dopamine framework for Ubuntu or Mac OS X.

Step 2.

Download the RPG source, i.e.

git clone git@github.com:illidanlab/rpg.git

Running the tests

cd ./rpg/dopamine 
python -um dopamine.atari.train \
  --agent_name=rpg \
  --base_dir=/tmp/dopamine \
  --random_seed=1 \
  --game_name=Pong \
  --gin_files='dopamine/agents/rpg/configs/rpg.gin'
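
To repeat the run above over several seeds, the command can be wrapped in a small Python helper. This is only a sketch: the flags are copied from the command above (written in `--flag=value` form), while the helper names, the per-seed `base_dir` layout, and the `dry_run` switch are assumptions made for illustration. Run it from the same directory as the command above.

```python
import subprocess
import sys


def build_train_command(game_name, random_seed, base_dir="/tmp/dopamine"):
    """Build the training command for one game and seed as an argument list."""
    return [
        sys.executable, "-um", "dopamine.atari.train",
        "--agent_name=rpg",
        # Assumption: a separate output directory per game and seed.
        f"--base_dir={base_dir}/{game_name}_seed{random_seed}",
        f"--random_seed={random_seed}",
        f"--game_name={game_name}",
        "--gin_files=dopamine/agents/rpg/configs/rpg.gin",
    ]


def run_seeds(game_name, seeds, dry_run=True):
    """Print (dry_run=True) or execute the training command for each seed."""
    commands = [build_train_command(game_name, seed) for seed in seeds]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True raises CalledProcessError if training exits non-zero.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    run_seeds("Pong", seeds=[1, 2, 3])
```

With `dry_run=True` the helper only prints the commands, which makes it easy to check them before starting long training jobs.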

Reproduce

To reproduce the results in the paper, please refer to the instructions here.

Reference

If you use this RPG implementation in your work, please consider citing the following paper:

@article{lin2019ranking,
  title={Ranking Policy Gradient},
  author={Lin, Kaixiang and Zhou, Jiayu},
  journal={arXiv preprint arXiv:1906.09674},
  year={2019}
}