
wisnunugroho21 / reinforcement_learning_ppo_rnd

License: GPL-3.0
Deep reinforcement learning using Proximal Policy Optimization and Random Network Distillation in TensorFlow 2 and PyTorch, with some explanation

Programming Languages

Python

Projects that are alternatives to or similar to reinforcement_learning_ppo_rnd

Explorer
Explorer is a PyTorch reinforcement learning framework for exploring new ideas.
Stars: ✭ 54 (+63.64%)
Mutual labels:  deep-reinforcement-learning, gym, ppo
Pytorch A2c Ppo Acktr Gail
PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
Stars: ✭ 2,632 (+7875.76%)
Mutual labels:  deep-reinforcement-learning, proximal-policy-optimization, ppo
Deep RL with pytorch
A PyTorch tutorial for DRL (Deep Reinforcement Learning)
Stars: ✭ 160 (+384.85%)
Mutual labels:  deep-reinforcement-learning, ppo, random-network-distillation
imitation learning
PyTorch implementation of some reinforcement learning algorithms: A2C, PPO, Behavioral Cloning from Observation (BCO), GAIL.
Stars: ✭ 93 (+181.82%)
Mutual labels:  deep-reinforcement-learning, proximal-policy-optimization, ppo
Deep-Reinforcement-Learning-Notebooks
This repository contains a series of Google Colab notebooks that I created to help people dive into deep reinforcement learning. These notebooks contain both the theory and implementations of different algorithms.
Stars: ✭ 15 (-54.55%)
Mutual labels:  deep-reinforcement-learning, ppo, cartpole-v0
Pytorch sac ae
PyTorch implementation of Soft Actor-Critic + Autoencoder (SAC+AE)
Stars: ✭ 94 (+184.85%)
Mutual labels:  deep-reinforcement-learning, gym
Deeprl algorithms
DeepRL algorithms implemented for easy understanding and reading, in PyTorch and TensorFlow 2 (DQN, REINFORCE, VPG, A2C, TRPO, PPO, DDPG, TD3, SAC)
Stars: ✭ 97 (+193.94%)
Mutual labels:  deep-reinforcement-learning, ppo
Rainy
☔ Deep RL agents with PyTorch☔
Stars: ✭ 39 (+18.18%)
Mutual labels:  deep-reinforcement-learning, ppo
Deep Reinforcement Learning Algorithms
31 projects in the framework of Deep Reinforcement Learning algorithms: Q-learning, DQN, PPO, DDPG, TD3, SAC, A2C and others. Each project is provided with a detailed training log.
Stars: ✭ 167 (+406.06%)
Mutual labels:  deep-reinforcement-learning, ppo
Deterministic Gail Pytorch
PyTorch implementation of Deterministic Generative Adversarial Imitation Learning (GAIL) for Off Policy learning
Stars: ✭ 44 (+33.33%)
Mutual labels:  deep-reinforcement-learning, gym
Easy Rl
A reinforcement learning tutorial in Chinese; read it online at: https://datawhalechina.github.io/easy-rl/
Stars: ✭ 3,004 (+9003.03%)
Mutual labels:  deep-reinforcement-learning, ppo
Machine Learning Is All You Need
🔥🌟《Machine Learning 格物志》: ML + DL + RL basic codes and notes by sklearn, PyTorch, TensorFlow, Keras & the most important, from scratch!💪 This repository is ALL You Need!
Stars: ✭ 173 (+424.24%)
Mutual labels:  deep-reinforcement-learning, ppo
Rlenv.directory
Explore and find reinforcement learning environments in a list of 150+ open source environments.
Stars: ✭ 79 (+139.39%)
Mutual labels:  deep-reinforcement-learning, gym
Muzero General
MuZero
Stars: ✭ 1,187 (+3496.97%)
Mutual labels:  deep-reinforcement-learning, gym
Deep Reinforcement Learning With Pytorch
PyTorch implementation of DQN, AC, ACER, A2C, A3C, PG, DDPG, TRPO, PPO, SAC, TD3 and ....
Stars: ✭ 1,345 (+3975.76%)
Mutual labels:  deep-reinforcement-learning, ppo
Torch Ac
Recurrent and multi-process PyTorch implementation of deep reinforcement Actor-Critic algorithms A2C and PPO
Stars: ✭ 70 (+112.12%)
Mutual labels:  deep-reinforcement-learning, ppo
Minimalrl
Implementations of basic RL algorithms with minimal lines of codes! (pytorch based)
Stars: ✭ 2,051 (+6115.15%)
Mutual labels:  deep-reinforcement-learning, ppo
Naf Tensorflow
"Continuous Deep Q-Learning with Model-based Acceleration" in TensorFlow
Stars: ✭ 192 (+481.82%)
Mutual labels:  deep-reinforcement-learning, gym
Deeprl
Modularized Implementation of Deep RL Algorithms in PyTorch
Stars: ✭ 2,640 (+7900%)
Mutual labels:  deep-reinforcement-learning, ppo
omd
JAX code for the paper "Control-Oriented Model-Based Reinforcement Learning with Implicit Differentiation"
Stars: ✭ 43 (+30.3%)
Mutual labels:  deep-reinforcement-learning, gym

PPO-RND

Simple code demonstrating deep reinforcement learning using Proximal Policy Optimization and Random Network Distillation in TensorFlow 2 and PyTorch

Version 2 and Other Progress

Version 2 brings improvements in code quality and performance. I refactored the code so that it follows the algorithm of OpenAI's baselines implementation of PPO. I also use a newer variant of PPO called Truly PPO, which has better sample efficiency and performance than OpenAI's PPO. Currently, I am focused on implementing this project in more difficult environments (Atari games, MuJoCo, etc.).

  • Use PyTorch and TensorFlow 2
  • Clean up the code
  • Use Truly PPO
  • Add more complex environments
  • Add more explanation

Getting Started

This project uses PyTorch and TensorFlow 2 as its deep learning frameworks and Gym for the reinforcement learning environments.
Although it's not required, I recommend running this project on a PC with a GPU and 8 GB of RAM.

Prerequisites

Make sure you have installed Gym, plus either PyTorch or TensorFlow 2.

  • Click here to install Gym

You can use either PyTorch or TensorFlow 2:

  • Click here to install PyTorch
  • Click here to install TensorFlow 2
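
If you prefer installing everything with pip, commands like the following should work (these are the standard PyPI package names, assumed here rather than taken from this repository):

pip install gym
pip install torch
pip install tensorflow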

Installing

Just clone this project into your working folder:

git clone https://github.com/wisnunugroho21/reinforcement_learning_ppo_rnd.git

Running the project

After you clone the project, run the following commands in cmd/terminal:

Pytorch version

cd reinforcement_learning_ppo_rnd/PPO_RND/pytorch
python ppo_rnd_frozen_notslippery_pytorch.py

Tensorflow 2 version

cd reinforcement_learning_ppo_rnd/PPO_RND/'tensorflow 2'
python ppo_frozenlake_notslippery_tensorflow.py

Proximal Policy Optimization

PPO is motivated by the same question as TRPO: how can we take the biggest possible improvement step on a policy using the data we currently have, without stepping so far that we accidentally cause performance collapse? Where TRPO tries to solve this problem with a complex second-order method, PPO is a family of first-order methods that use a few other tricks to keep new policies close to old. PPO methods are significantly simpler to implement, and empirically seem to perform at least as well as TRPO.

There are two primary variants of PPO: PPO-Penalty and PPO-Clip.

  • PPO-Penalty approximately solves a KL-constrained update like TRPO, but penalizes the KL-divergence in the objective function instead of making it a hard constraint, and automatically adjusts the penalty coefficient over the course of training so that it’s scaled appropriately.

  • PPO-Clip doesn’t have a KL-divergence term in the objective and doesn’t have a constraint at all. Instead, it relies on specialized clipping in the objective function to remove incentives for the new policy to move far from the old policy.

OpenAI uses PPO-Clip.
You can read the full details of PPO here.
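
To make the clipped objective concrete, here is a minimal sketch of the PPO-Clip surrogate loss in PyTorch. The tensor names (log_probs, old_log_probs, advantages) and the clip range of 0.2 are illustrative assumptions, not this repository's actual code:

import torch

def ppo_clip_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    # Probability ratio between the new and old policy for each action.
    ratios = torch.exp(log_probs - old_log_probs)
    # Unclipped surrogate objective.
    surr1 = ratios * advantages
    # Clipped surrogate: confining the ratio to [1 - eps, 1 + eps] removes
    # the incentive to move the new policy far from the old one.
    surr2 = torch.clamp(ratios, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Take the pessimistic minimum and negate it for gradient descent.
    return -torch.min(surr1, surr2).mean()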

Random Network Distillation

Random Network Distillation (RND) is a prediction-based method for encouraging reinforcement learning agents to explore their environments through curiosity, and it was the first method to exceed average human performance on Montezuma’s Revenge. RND achieves state-of-the-art performance, periodically finds all 24 rooms, and solves the first level without using demonstrations or having access to the underlying state of the game.

RND incentivizes visiting unfamiliar states by measuring how hard it is to predict the output of a fixed random neural network on visited states. In unfamiliar states it’s hard to guess the output, and hence the reward is high. It can be applied to any reinforcement learning algorithm, is simple to implement and efficient to scale.
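
To make this concrete, here is a hedged sketch of the RND bonus in PyTorch, assuming flat state vectors; the layer sizes and names are illustrative, not the ones used in this repository:

import torch
import torch.nn as nn

class RND(nn.Module):
    def __init__(self, state_dim, feature_dim=64):
        super().__init__()
        # Fixed, randomly initialized target network: never trained.
        self.target = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, feature_dim))
        for p in self.target.parameters():
            p.requires_grad = False
        # Predictor network: trained to match the target's output.
        self.predictor = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, feature_dim))

    def intrinsic_reward(self, state):
        # Prediction error is high on unfamiliar states, so it acts as a
        # curiosity bonus added to the extrinsic environment reward.
        with torch.no_grad():
            target_feat = self.target(state)
        pred_feat = self.predictor(state)
        return (pred_feat - target_feat).pow(2).mean(dim=-1)

The same squared error that serves as the intrinsic reward is also used, with gradients enabled, as the training loss for the predictor network.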

You can read the full details of RND here.

Truly Proximal Policy Optimization

Proximal policy optimization (PPO) is one of the most successful deep reinforcement learning methods, achieving state-of-the-art performance across a wide range of challenging tasks. However, its optimization behavior is still far from fully understood. The paper shows that PPO can neither strictly restrict the likelihood ratio, as it attempts to do, nor enforce a well-defined trust-region constraint, which means it may still suffer from the risk of performance instability. To address this issue, the authors present an enhanced PPO method named Truly PPO, which makes two critical improvements: 1) it adopts a new clipping function that supports a rollback behavior to restrict the difference between the new policy and the old one; 2) the triggering condition for clipping is replaced with a trust-region-based one, so that optimizing the resulting surrogate objective provides guaranteed monotonic improvement of the ultimate policy performance. By adhering more truly to making the algorithm proximal, that is, confining the policy within the trust region, the new algorithm improves the original PPO on both sample efficiency and performance.
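
As a rough illustration, here is a hedged sketch of a Truly-PPO-style surrogate in PyTorch. The paper describes several variants; this follows a common formulation in which a per-state KL divergence acts as the trust-region trigger and a KL penalty rolls the policy back. The names kl, kl_range, and rollback_alpha are illustrative assumptions, not this repository's actual code:

import torch

def truly_ppo_loss(log_probs, old_log_probs, advantages, kl,
                   kl_range=0.03, rollback_alpha=5.0):
    ratios = torch.exp(log_probs - old_log_probs)
    surrogate = ratios * advantages
    # Trust-region trigger: the policy has drifted beyond kl_range while
    # still pushing the surrogate objective upward.
    triggered = (kl >= kl_range) & (surrogate > advantages)
    # Rollback: subtracting a KL penalty makes the gradient point back
    # toward the old policy, instead of merely going flat as in PPO-Clip.
    loss = torch.where(triggered, surrogate - rollback_alpha * kl, surrogate)
    return -loss.mean()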

You can read the full details of Truly PPO here.

Result

LunarLander using PPO (Non RND)

[Result GIF · Reward progress graph]

Bipedal using PPO (Non RND)

[Result GIF]

Pendulum using PPO (Non RND)

[Result GIF · Reward progress graph]

Pong using PPO (Non RND)

[Result GIF]

Contributing

This project is far from finished and will keep being improved. Any fixes, contributions, or ideas would be very much appreciated.
