sweetice / Deep Reinforcement Learning With Pytorch

Licence: mit

PyTorch implementation of DQN, AC, ACER, A2C, A3C, PG, DDPG, TRPO, PPO, SAC, TD3 and ....

Programming Languages

python

Labels

deep-learning pytorch algorithm deep-reinforcement-learning resnet dqn ppo policy-gradient actor-critic a3c trpo

Projects that are alternatives of or similar to Deep Reinforcement Learning With Pytorch

Deep-Reinforcement-Learning-With-Python

Master classic RL, deep RL, distributional RL, inverse RL, and more using OpenAI Gym and TensorFlow with extensive Math

Stars: ✭ 222 (-83.49%)

Mutual labels: deep-reinforcement-learning, dqn, policy-gradient, a3c, actor-critic, trpo, ppo

Reinforcement Learning Algorithms

This repository contains most of pytorch implementation based classic deep reinforcement learning algorithms, including - DQN, DDQN, Dueling Network, DDPG, SAC, A2C, PPO, TRPO. (More algorithms are still in progress)

Stars: ✭ 426 (-68.33%)

Mutual labels: algorithm, deep-reinforcement-learning, dqn, ppo, actor-critic, trpo

Machine Learning Is All You Need

🔥🌟《Machine Learning 格物志》 (roughly, "Machine Learning Study Notes"): basic ML + DL + RL code and notes using sklearn, PyTorch, TensorFlow, Keras and, most importantly, from scratch!💪 This repository is ALL You Need!

Stars: ✭ 173 (-87.14%)

Mutual labels: deep-reinforcement-learning, resnet, dqn, ppo, actor-critic, trpo

Explorer

Explorer is a PyTorch reinforcement learning framework for exploring new ideas.

Stars: ✭ 54 (-95.99%)

Mutual labels: deep-reinforcement-learning, dqn, policy-gradient, actor-critic, ppo

Slm Lab

Modular Deep Reinforcement Learning framework in PyTorch. Companion library of the book "Foundations of Deep Reinforcement Learning".

Stars: ✭ 904 (-32.79%)

Mutual labels: deep-reinforcement-learning, dqn, ppo, policy-gradient, a3c

Deeprl Tensorflow2

🐋 Simple implementations of various popular Deep Reinforcement Learning algorithms using TensorFlow2

Stars: ✭ 319 (-76.28%)

Mutual labels: deep-reinforcement-learning, dqn, ppo, a3c, trpo

Reinforcement Learning

Minimal and Clean Reinforcement Learning Examples

Stars: ✭ 2,863 (+112.86%)

Mutual labels: deep-reinforcement-learning, dqn, policy-gradient, actor-critic, a3c

Reinforcement Learning With Tensorflow

Simple reinforcement learning tutorials; 莫烦Python (Mofan Python) Chinese-language AI teaching

Stars: ✭ 6,948 (+416.58%)

Mutual labels: dqn, ppo, policy-gradient, actor-critic, a3c

Pytorch Rl

Deep Reinforcement Learning with pytorch & visdom

Stars: ✭ 745 (-44.61%)

Mutual labels: deep-reinforcement-learning, dqn, actor-critic, a3c, trpo

Easy Rl

Chinese-language reinforcement learning tutorial; read it online at: https://datawhalechina.github.io/easy-rl/

Stars: ✭ 3,004 (+123.35%)

Mutual labels: deep-reinforcement-learning, dqn, ppo, policy-gradient, a3c

Deeprl algorithms

DeepRL algorithms implementation easy for understanding and reading with Pytorch and Tensorflow 2(DQN, REINFORCE, VPG, A2C, TRPO, PPO, DDPG, TD3, SAC)

Stars: ✭ 97 (-92.79%)

Mutual labels: deep-reinforcement-learning, dqn, ppo, policy-gradient, trpo

rl implementations

No description or website provided.

Stars: ✭ 40 (-97.03%)

Mutual labels: deep-reinforcement-learning, dqn, policy-gradient, actor-critic

Pytorch Rl

PyTorch implementation of Deep Reinforcement Learning: Policy Gradient methods (TRPO, PPO, A2C) and Generative Adversarial Imitation Learning (GAIL). Fast Fisher vector product TRPO.

Stars: ✭ 658 (-51.08%)

Mutual labels: deep-reinforcement-learning, ppo, policy-gradient, trpo

Pytorch Drl

PyTorch implementations of various Deep Reinforcement Learning (DRL) algorithms for both single agent and multi-agent.

Stars: ✭ 233 (-82.68%)

Mutual labels: deep-reinforcement-learning, dqn, ppo, actor-critic

Reinforcement learning tutorial with demo

Reinforcement Learning Tutorial with Demo: DP (Policy and Value Iteration), Monte Carlo, TD Learning (SARSA, QLearning), Function Approximation, Policy Gradient, DQN, Imitation, Meta Learning, Papers, Courses, etc..

Stars: ✭ 442 (-67.14%)

Mutual labels: deep-reinforcement-learning, policy-gradient, actor-critic, a3c

Deep-Reinforcement-Learning-Notebooks

This repository contains a series of Google Colab notebooks which I created to help people dive into deep reinforcement learning. These notebooks contain both theory and implementation of different algorithms.

Stars: ✭ 15 (-98.88%)

Mutual labels: deep-reinforcement-learning, dqn, a3c, ppo

Torch Ac

Recurrent and multi-process PyTorch implementation of deep reinforcement Actor-Critic algorithms A2C and PPO

Stars: ✭ 70 (-94.8%)

Mutual labels: deep-reinforcement-learning, ppo, actor-critic, a3c

Hands On Reinforcement Learning With Python

Master Reinforcement and Deep Reinforcement Learning using OpenAI Gym and TensorFlow

Stars: ✭ 640 (-52.42%)

Mutual labels: deep-reinforcement-learning, ppo, policy-gradient, trpo

Minimalrl

Implementations of basic RL algorithms with minimal lines of codes! (pytorch based)

Stars: ✭ 2,051 (+52.49%)

Mutual labels: deep-reinforcement-learning, dqn, ppo, a3c

Reinforcement learning

Reinforcement learning tutorials

Stars: ✭ 82 (-93.9%)

Mutual labels: dqn, ppo, policy-gradient, a3c

Status: Active (under active development, breaking changes may occur)

This repository implements classic and state-of-the-art deep reinforcement learning algorithms. Its aim is to provide clear PyTorch code for people who want to learn deep reinforcement learning.

In the future, more state-of-the-art algorithms will be added, and the existing code will continue to be maintained.

Requirements

python <=3.6
tensorboardX
gym >= 0.10
pytorch >= 0.4

Note that tensorflow 1.12 (installed below alongside tensorboardX) does not support Python 3.7.

Installation

pip install -r requirements.txt

If that fails, install the dependencies one by one.

Install gym

pip install gym

Install PyTorch

Please go to the official website to install it: https://pytorch.org/

Using an Anaconda virtual environment to manage your packages is recommended.

Install tensorboardX

pip install tensorboardX
pip install tensorflow==1.12

Test

cd Char10\ TD3/
python TD3_BipedalWalker-v2.py --mode test

If the installation succeeded, you should see a BipedalWalker running.

BipedalWalker:

1. Install openai-baselines (optional)

# clone the openai baselines
git clone https://github.com/openai/baselines.git
cd baselines
pip install -e .

DQN

Here I uploaded two DQN models, trained on CartPole-v0 and MountainCar-v0.

Tips for MountainCar-v0

This is a sparse-reward task: the agent gets an informative reward signal only when the car reaches the top of the mountain. With a stochastic policy, reaching the top may take on the order of 1e5 steps. You can add a shaping term to the reward, for example one that is positively related to the car's current position. A more advanced approach is inverse reinforcement learning.
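The reward-shaping tip can be sketched as below. This is only an illustration, not code from this repository: the function name shaped_reward and the scale parameter are hypothetical, and the position bounds are the ones used by gym's MountainCar-v0 (left wall at -1.2, goal at 0.5).

```python
# Position bounds of gym's MountainCar-v0: left wall and goal flag.
MIN_POSITION = -1.2
GOAL_POSITION = 0.5


def shaped_reward(env_reward, position, scale=1.0):
    """Add a bonus that grows linearly with the car's position.

    `scale` is a hypothetical tuning knob, not a value from this repository.
    """
    # Normalise the position to [0, 1] between the left wall and the goal.
    progress = (position - MIN_POSITION) / (GOAL_POSITION - MIN_POSITION)
    # Clamp so positions beyond the goal do not earn extra bonus.
    progress = max(0.0, min(1.0, progress))
    return env_reward + scale * progress
```

In a training loop this would typically be applied right after env.step, using the first component of the observation (the position) as the second argument. A bonus that is too large can change which policy is optimal, so the scale is worth tuning.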

This is the value loss for DQN. The loss increased to about 1e13, yet the network still works well. As training goes on, target_net and act_net drift far apart, so the computed loss accumulates to a large value. Early in training the loss was small because the reward was very sparse, which led to only small updates of the two networks.
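To make the role of the two networks concrete, here is a minimal sketch of the standard one-step DQN target and a squared-error value loss for a single transition. It assumes a squared TD error; the loss actually used in this repository may differ, and the names td_target and value_loss are illustrative only.

```python
def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step target built from the target network's Q-values for s'.

    Returns r at the end of an episode, else r + gamma * max_a' Q_target(s', a').
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def value_loss(q_value, target):
    """Squared TD error between the acting network's Q(s, a) and the target."""
    return (q_value - target) ** 2


# Hypothetical numbers: the acting network predicts 1.0 for the chosen action,
# while the target network assigns [0.5, 2.0] to the actions in the next state.
target = td_target(reward=-1.0, next_q_values=[0.5, 2.0], done=False)
loss = value_loss(1.0, target)
```

Because the error is squared, the loss grows with the square of the gap between the two networks' estimates: under this assumption, a TD error of roughly 3e6 is already enough to produce a per-sample loss near 1e13.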

Papers Related to DQN

Playing Atari with Deep Reinforcement Learning [arxiv] [code]
Deep Reinforcement Learning with Double Q-learning [arxiv] [code]
Dueling Network Architectures for Deep Reinforcement Learning [arxiv] [code]
Prioritized Experience Replay [arxiv] [code]
Noisy Networks for Exploration [arxiv] [code]
A Distributional Perspective on Reinforcement Learning [arxiv] [code]
Rainbow: Combining Improvements in Deep Reinforcement Learning [arxiv] [code]
Distributional Reinforcement Learning with Quantile Regression [arxiv] [code]
Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation [arxiv] [code]
Neural Episodic Control [code]

Policy Gradient

Use the following command to run a saved model

python Run_Model.py

Use the following command to train a model

python pytorch_MountainCar-v0.py

policyNet.pkl

This is a model that I have trained.

Actor-Critic

This is an algorithmic framework, and the classic REINFORCE method is stored under Actor-Critic.

DDPG

Episode reward in Pendulum-v0:

PPO

Original paper: https://arxiv.org/abs/1707.06347
Openai Baselines blog post: https://blog.openai.com/openai-baselines-ppo/

A2C

Advantage policy gradient. A 2017 paper pointed out that the difference in performance between A2C and A3C is not obvious.

The Asynchronous Advantage Actor Critic method (A3C) has been very influential since the paper was published. The algorithm combines a few key ideas:

An updating scheme that operates on fixed-length segments of experience (say, 20 timesteps) and uses these segments to compute estimators of the returns and advantage function.
Architectures that share layers between the policy and value function.
Asynchronous updates.
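The first idea in the list above, computing return and advantage estimators over a fixed-length segment, can be sketched in plain Python as follows. The function names and the choice of bootstrapping from the value of the state after the segment are assumptions made for illustration, not code taken from this repository.

```python
def segment_returns(rewards, bootstrap_value, gamma=0.99):
    """Discounted returns for one segment of experience.

    `bootstrap_value` is the critic's estimate V(s_T) for the state that
    follows the segment (use 0.0 if the episode ended there).
    """
    returns = []
    running = bootstrap_value
    # Walk backwards so each return reuses the one that follows it.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def segment_advantages(returns, values):
    """Advantage estimate per step: return minus the critic's value V(s_t)."""
    return [g - v for g, v in zip(returns, values)]
```

For example, with rewards [1.0, 1.0, 1.0], a bootstrap value of 0.0 and gamma=0.5, the returns are [1.75, 1.5, 1.0]; subtracting critic values of 1.0 at each step gives advantages [0.75, 0.5, 0.0].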

A3C

Original paper: https://arxiv.org/abs/1602.01783

SAC

Note: this is not the paper authors' implementation.

Episode reward in Pendulum-v0:

TD3

Note: this is not the paper authors' implementation.

Episode reward in Pendulum-v0:

Episode reward in BipedalWalker-v2:

To test your model:

python TD3_BipedalWalker-v2.py --mode test

Papers Related to Deep Reinforcement Learning

[01] A Brief Survey of Deep Reinforcement Learning
[02] The Beta Policy for Continuous Control Reinforcement Learning
[03] Playing Atari with Deep Reinforcement Learning
[04] Deep Reinforcement Learning with Double Q-learning
[05] Dueling Network Architectures for Deep Reinforcement Learning
[06] Continuous control with deep reinforcement learning
[07] Continuous Deep Q-Learning with Model-based Acceleration
[08] Asynchronous Methods for Deep Reinforcement Learning
[09] Trust Region Policy Optimization
[10] Proximal Policy Optimization Algorithms
[11] Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation
[12] High-Dimensional Continuous Control Using Generalized Advantage Estimation
[13] Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
[14] Addressing Function Approximation Error in Actor-Critic Methods

TO DO

[x] DDPG
[x] SAC
[x] TD3

Best RL courses

Stars: ✭ 1,345
