All Projects → quantumiracle → Popular Rl Algorithms

quantumiracle / Popular Rl Algorithms

Licence: apache-2.0
PyTorch implementation of Soft Actor-Critic (SAC), Twin Delayed DDPG (TD3), Actor-Critic (AC/A2C), Proximal Policy Optimization (PPO), QT-Opt, PointNet..

Projects that are alternatives of or similar to Popular Rl Algorithms

Deeprl Agents
A set of Deep Reinforcement Learning Agents implemented in Tensorflow.
Stars: ✭ 2,149 (+707.89%)
Mutual labels:  jupyter-notebook, reinforcement-learning
Rl Tutorial Jnrr19
Stable-Baselines tutorial for Journées Nationales de la Recherche en Robotique 2019
Stars: ✭ 204 (-23.31%)
Mutual labels:  jupyter-notebook, reinforcement-learning
Release
Deep Reinforcement Learning for de-novo Drug Design
Stars: ✭ 201 (-24.44%)
Mutual labels:  jupyter-notebook, reinforcement-learning
Machine Learning And Reinforcement Learning In Finance
Machine Learning and Reinforcement Learning in Finance New York University Tandon School of Engineering
Stars: ✭ 173 (-34.96%)
Mutual labels:  jupyter-notebook, reinforcement-learning
Nn
🧑‍🏫 50! Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, ...), gans(cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠
Stars: ✭ 5,720 (+2050.38%)
Mutual labels:  jupyter-notebook, reinforcement-learning
Deep Algotrading
A resource for learning about deep learning techniques from regression to LSTM and Reinforcement Learning using financial data and the fitness functions of algorithmic trading
Stars: ✭ 173 (-34.96%)
Mutual labels:  jupyter-notebook, reinforcement-learning
Multihopkg
Multi-hop knowledge graph reasoning learned via policy gradient with reward shaping and action dropout
Stars: ✭ 202 (-24.06%)
Mutual labels:  jupyter-notebook, reinforcement-learning
Data Science Question Answer
A repo for data science related questions and answers
Stars: ✭ 2,000 (+651.88%)
Mutual labels:  jupyter-notebook, reinforcement-learning
Applied Reinforcement Learning
Reinforcement Learning and Decision Making tutorials explained at an intuitive level and with Jupyter Notebooks
Stars: ✭ 229 (-13.91%)
Mutual labels:  jupyter-notebook, reinforcement-learning
Machine Learning Notebooks
Machine Learning notebooks for refreshing concepts.
Stars: ✭ 222 (-16.54%)
Mutual labels:  jupyter-notebook, reinforcement-learning
Pytorch sac
PyTorch implementation of Soft Actor-Critic (SAC)
Stars: ✭ 174 (-34.59%)
Mutual labels:  jupyter-notebook, reinforcement-learning
Aleph star
Reinforcement learning with A* and a deep heuristic
Stars: ✭ 235 (-11.65%)
Mutual labels:  jupyter-notebook, reinforcement-learning
2048 Deep Reinforcement Learning
Trained A Convolutional Neural Network To Play 2048 using Deep-Reinforcement Learning
Stars: ✭ 169 (-36.47%)
Mutual labels:  jupyter-notebook, reinforcement-learning
Andrew Ng Notes
This is Andrew NG Coursera Handwritten Notes.
Stars: ✭ 180 (-32.33%)
Mutual labels:  jupyter-notebook, reinforcement-learning
Chess Alpha Zero
Chess reinforcement learning by AlphaGo Zero methods.
Stars: ✭ 1,868 (+602.26%)
Mutual labels:  jupyter-notebook, reinforcement-learning
Alpha Zero General
A clean implementation based on AlphaZero for any game in any framework + tutorial + Othello/Gobang/TicTacToe/Connect4 and more
Stars: ✭ 2,617 (+883.83%)
Mutual labels:  jupyter-notebook, reinforcement-learning
Rl Quadcopter
Teach a Quadcopter How to Fly!
Stars: ✭ 124 (-53.38%)
Mutual labels:  jupyter-notebook, reinforcement-learning
Modular Rl
[ICML 2020] PyTorch Code for "One Policy to Control Them All: Shared Modular Policies for Agent-Agnostic Control"
Stars: ✭ 126 (-52.63%)
Mutual labels:  jupyter-notebook, reinforcement-learning
Icychesszero
中国象棋alpha zero程序
Stars: ✭ 206 (-22.56%)
Mutual labels:  jupyter-notebook, reinforcement-learning
Rl learn
我的强化学习笔记和学习材料📖 still updating ... ...
Stars: ✭ 234 (-12.03%)
Mutual labels:  jupyter-notebook, reinforcement-learning

State-of-the-art Model-free Reinforcement Learning Algorithms Tweet

PyTorch and Tensorflow 2.0 implementation of state-of-the-art model-free reinforcement learning algorithms on both Openai gym environments and a self-implemented Reacher environment.

Algorithms include Soft Actor-Critic (SAC), Deep Deterministic Policy Gradient (DDPG), Twin Delayed DDPG (TD3), Actor-Critic (AC/A2C), Proximal Policy Optimization (PPO), QT-Opt (including Cross-entropy (CE) Method), PointNet, Transporter, Recurrent Policy Gradient, Soft Decision Tree, etc.

Please note that this repo is more of a personal collection of algorithms I implemented and tested during my research and study period, rather than an official open-source library/package for usage. However, I think it could be helpful to share it with others and I'm expecting useful discussions on my implementations. But I didn't spend much time on cleaning or structuring the code. As you may notice that there may be several versions of implementation for each algorithm, I intentionally show all of them here for you to refer and compare. Also, this repo contains only PyTorch Implementation.

For official libraries of RL algorithms, I provided the following two with TensorFlow 2.0 + TensorLayer 2.0:

  • RL Tutorial (Status: Released) contains RL algorithms implementation as tutorials with simple structures.

  • RLzoo (Status: Released) is a baseline implementation with high-level API supporting a variety of popular environments, with more hierarchical structures for simple usage.

Since Tensorflow 2.0 has already incorporated the dynamic graph construction instead of the static one, it becomes a trivial work to transfer the RL code between TensorFlow and PyTorch.

Contents:

Usage:

python ***.py --train

python ***.py --test

Troubleshooting:

If you meet problem "Not imlplemented Error", it may be due to the wrong gym version. The newest gym==0.14 won't work. Install gym==0.7 or gym==0.10 with pip install -r requirements.txt.

Performance:

  • SAC for gym Pendulum-v0:

SAC with automatically updating variable alpha for entropy:

SAC without automatically updating variable alpha for entropy:

It shows that the automatic-entropy update helps the agent to learn faster.

  • TD3 for gym Pendulum-v0:

TD3 with deterministic policy:

TD3 with non-deterministic/stochastic policy:

It seems TD3 with deterministic policy works a little better, but basically similar.

  • AC for gym CartPole-v0:

However, vanilla AC/A2C cannot handle the continuous case like gym Pendulum-v0 well.

Citation:

To cite this repository:

@misc{rlalgorithms,
  author = {Zihan Ding},
  title = {Popular-RL-Algorithms},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/quantumiracle/Popular-RL-Algorithms}},
}
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].