
pat-coady / Trpo

License: MIT
Trust Region Policy Optimization with TensorFlow and OpenAI Gym

Projects that are alternatives to or similar to Trpo

Lagom
lagom: A PyTorch infrastructure for rapid prototyping of reinforcement learning algorithms.
Stars: ✭ 364 (+6.12%)
Mutual labels:  jupyter-notebook, reinforcement-learning, policy-gradient, mujoco
Pytorch Rl
This repository contains model-free deep reinforcement learning algorithms implemented in Pytorch
Stars: ✭ 394 (+14.87%)
Mutual labels:  reinforcement-learning, policy-gradient, mujoco
Drq
DrQ: Data regularized Q
Stars: ✭ 268 (-21.87%)
Mutual labels:  jupyter-notebook, reinforcement-learning, mujoco
Text summurization abstractive methods
Multiple implementations for abstractive text summarization, using Google Colab
Stars: ✭ 359 (+4.66%)
Mutual labels:  jupyter-notebook, reinforcement-learning, policy-gradient
Reinforcement learning tutorial with demo
Reinforcement Learning Tutorial with Demo: DP (Policy and Value Iteration), Monte Carlo, TD Learning (SARSA, QLearning), Function Approximation, Policy Gradient, DQN, Imitation, Meta Learning, Papers, Courses, etc..
Stars: ✭ 442 (+28.86%)
Mutual labels:  jupyter-notebook, reinforcement-learning, policy-gradient
Deep Algotrading
A resource for learning about deep learning techniques from regression to LSTM and Reinforcement Learning using financial data and the fitness functions of algorithmic trading
Stars: ✭ 173 (-49.56%)
Mutual labels:  jupyter-notebook, reinforcement-learning, policy-gradient
Hands On Reinforcement Learning With Python
Master Reinforcement and Deep Reinforcement Learning using OpenAI Gym and TensorFlow
Stars: ✭ 640 (+86.59%)
Mutual labels:  jupyter-notebook, reinforcement-learning, policy-gradient
Pytorch Rl
Tutorials for reinforcement learning in PyTorch and Gym by implementing a few of the popular algorithms. [IN PROGRESS]
Stars: ✭ 121 (-64.72%)
Mutual labels:  jupyter-notebook, reinforcement-learning, policy-gradient
Rl Course Experiments
Stars: ✭ 73 (-78.72%)
Mutual labels:  jupyter-notebook, reinforcement-learning, policy-gradient
Pytorch sac
PyTorch implementation of Soft Actor-Critic (SAC)
Stars: ✭ 174 (-49.27%)
Mutual labels:  jupyter-notebook, reinforcement-learning, mujoco
Multihopkg
Multi-hop knowledge graph reasoning learned via policy gradient with reward shaping and action dropout
Stars: ✭ 202 (-41.11%)
Mutual labels:  jupyter-notebook, reinforcement-learning, policy-gradient
Rl learn
My reinforcement learning notes and study materials 📖 still updating ... ...
Stars: ✭ 234 (-31.78%)
Mutual labels:  jupyter-notebook, reinforcement-learning
Aleph star
Reinforcement learning with A* and a deep heuristic
Stars: ✭ 235 (-31.49%)
Mutual labels:  jupyter-notebook, reinforcement-learning
Deep-Reinforcement-Learning-CS285-Pytorch
Solutions of assignments of Deep Reinforcement Learning course presented by the University of California, Berkeley (CS285) in Pytorch framework
Stars: ✭ 104 (-69.68%)
Mutual labels:  policy-gradient, mujoco
Nn
🧑‍🏫 50! Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, ...), gans(cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠
Stars: ✭ 5,720 (+1567.64%)
Mutual labels:  jupyter-notebook, reinforcement-learning
Rad
RAD: Reinforcement Learning with Augmented Data
Stars: ✭ 268 (-21.87%)
Mutual labels:  jupyter-notebook, reinforcement-learning
Popular Rl Algorithms
PyTorch implementation of Soft Actor-Critic (SAC), Twin Delayed DDPG (TD3), Actor-Critic (AC/A2C), Proximal Policy Optimization (PPO), QT-Opt, PointNet..
Stars: ✭ 266 (-22.45%)
Mutual labels:  jupyter-notebook, reinforcement-learning
Trading Bot
Stock Trading Bot using Deep Q-Learning
Stars: ✭ 273 (-20.41%)
Mutual labels:  jupyter-notebook, reinforcement-learning
Dinoruntutorial
Accompanying code for Paperspace tutorial "Build an AI to play Dino Run"
Stars: ✭ 285 (-16.91%)
Mutual labels:  jupyter-notebook, reinforcement-learning
Applied Reinforcement Learning
Reinforcement Learning and Decision Making tutorials explained at an intuitive level and with Jupyter Notebooks
Stars: ✭ 229 (-33.24%)
Mutual labels:  jupyter-notebook, reinforcement-learning

Trust Region Policy Optimization with Generalized Advantage Estimation

By Patrick Coady: Learning Artificial Intelligence

Summary

NOTE: The code has been refactored to use TensorFlow 2.0 and PyBullet (instead of MuJoCo). See the tf1_mujoco branch for the old version.

The project's original goal was to "solve" 10 MuJoCo robotic control environments with the same algorithm, and, specifically, to do so without hand-tuning the hyperparameters (network sizes, learning rates, and TRPO settings) for each environment. This is challenging because the environments range from a simple cart-pole problem with a single control input to a humanoid with 17 controlled joints and 44 observed variables. The project was successful, nabbing top spots on almost all of the OpenAI Gym MuJoCo leaderboards.

With the release of TensorFlow 2.0, I decided to dust off this project and upgrade the code. And, while I was at it, I moved from the paid MuJoCo simulator to the free PyBullet simulator.

Here are the key points:

  • Trust Region Policy Optimization [1] [2]
  • Value function approximated with a 3 hidden-layer NN (tanh activations; sizing sketched after this list):
    • hid1 size = obs_dim x 10
    • hid2 size = geometric mean of hid1 and hid3 sizes
    • hid3 size = 5
  • Policy is a multivariate Gaussian parameterized by a 3 hidden-layer NN (tanh activations):
    • hid1 size = obs_dim x 10
    • hid2 size = geometric mean of hid1 and hid3 sizes
    • hid3 size = action_dim x 10
    • Diagonal covariance matrix variables are separately trained
  • Generalized Advantage Estimation (gamma = 0.995, lambda = 0.98) [3] [4]; see the sketch after this list
  • Adam optimizer used for both neural networks
  • The policy is evaluated for 20 episodes between updates, except:
    • 50 episodes for Reacher
    • 5 episodes for Swimmer
    • 5 episodes for HalfCheetah
    • 5 episodes for HumanoidStandup
  • Value function is trained on current batch + previous batch
  • KL loss factor and Adam learning rate are dynamically adjusted during training (one plausible adjustment rule is sketched after this list)
  • Policy and Value NNs built with TensorFlow
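
The layer-sizing rules above are compact enough to sketch directly. A minimal illustration in TensorFlow 2, with hypothetical helper and variable names rather than this repository's actual code:

import numpy as np
import tensorflow as tf

def build_mlp(obs_dim, out_dim, hid3_size):
    # Three tanh hidden layers, sized per the rules above.
    hid1_size = obs_dim * 10                         # hid1 = obs_dim x 10
    hid2_size = int(np.sqrt(hid1_size * hid3_size))  # geometric mean of hid1 and hid3
    return tf.keras.Sequential([
        tf.keras.layers.Dense(hid1_size, activation='tanh', input_shape=(obs_dim,)),
        tf.keras.layers.Dense(hid2_size, activation='tanh'),
        tf.keras.layers.Dense(hid3_size, activation='tanh'),
        tf.keras.layers.Dense(out_dim),  # linear output head
    ])

obs_dim, act_dim = 44, 17  # e.g. the humanoid task
value_net = build_mlp(obs_dim, 1, hid3_size=5)
policy_mean_net = build_mlp(obs_dim, act_dim, hid3_size=act_dim * 10)
log_vars = tf.Variable(tf.zeros(act_dim))  # diagonal covariance, trained separately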

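Generalized Advantage Estimation itself is only a few lines. Below is the standard GAE recursion from [3] with the settings listed above; it is written against the published equations, not copied from this repository:

import numpy as np

def gae_advantages(rewards, values, gamma=0.995, lam=0.98):
    # rewards: per-step rewards for one episode, length T
    # values:  value estimates, length T + 1 (bootstrap value last; 0 if terminal)
    T = len(rewards)
    advantages = np.zeros(T)
    last_adv = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD residual
        last_adv = delta + gamma * lam * last_adv  # A_t = delta_t + gamma*lambda*A_{t+1}
        advantages[t] = last_adv
    return advantages

The dynamic KL and learning-rate adjustment follows the familiar adaptive-KL heuristic. One plausible rule is sketched below; the thresholds and factors are assumptions, not necessarily the values used here:

def adapt_kl(kl, kl_targ, beta, lr_mult):
    # Policy moved too far: increase the KL penalty; shrink the LR if beta saturates.
    if kl > kl_targ * 2:
        beta = min(35.0, beta * 1.5)
        if beta > 30.0:
            lr_mult /= 1.5
    # Policy barely moved: relax the penalty; grow the LR if beta bottoms out.
    elif kl < kl_targ / 2:
        beta = max(1.0 / 35.0, beta / 1.5)
        if beta < 1.0 / 30.0:
            lr_mult *= 1.5
    return beta, lr_mult
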
PyBullet Gym Environments

HumanoidDeepMimicBulletEnv-v1
CartPoleBulletEnv-v1
MinitaurBulletEnv-v0
MinitaurBulletDuckEnv-v0
RacecarBulletEnv-v0
RacecarZedBulletEnv-v0
KukaBulletEnv-v0
KukaCamBulletEnv-v0
InvertedPendulumBulletEnv-v0
InvertedDoublePendulumBulletEnv-v0
InvertedPendulumSwingupBulletEnv-v0
ReacherBulletEnv-v0
PusherBulletEnv-v0
ThrowerBulletEnv-v0
StrikerBulletEnv-v0
Walker2DBulletEnv-v0
HalfCheetahBulletEnv-v0
AntBulletEnv-v0
HopperBulletEnv-v0
HumanoidBulletEnv-v0
HumanoidFlagrunBulletEnv-v0
HumanoidFlagrunHarderBulletEnv-v0
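
These environments register themselves with Gym when the pybullet_envs package is imported. A minimal way to instantiate and exercise one, using the classic Gym step API (standard PyBullet/Gym usage, independent of this repository's code):

import gym
import pybullet_envs  # importing registers the *BulletEnv environments with Gym

env = gym.make('HalfCheetahBulletEnv-v0')
obs = env.reset()
done, episode_reward = False, 0.0
while not done:
    action = env.action_space.sample()  # random actions, just to exercise the env
    obs, reward, done, info = env.step(action)
    episode_reward += reward
print('episode reward:', episode_reward)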

Using

I ran quick checks on three of the above environments and successfully stabilized a double-inverted pendulum and taught the "half cheetah" to run.

python train.py InvertedPendulumBulletEnv-v0
python train.py InvertedDoublePendulumBulletEnv-v0 -n 5000
python train.py HalfCheetahBulletEnv-v0 -n 5000 -b 5

Here -n sets the number of training episodes and -b the number of episodes per training batch.

Videos

During training, videos are automatically saved to the /tmp folder at regular intervals. They can be enjoyable to watch, and also instructive.
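
The recording is typically wired up with Gym's monitor wrapper. A sketch using the classic gym.wrappers.Monitor API; the wrapper choice, recording interval, and output path here are assumptions for illustration:

import gym
import pybullet_envs  # registers the Bullet environments

env = gym.make('HalfCheetahBulletEnv-v0')
env = gym.wrappers.Monitor(env, '/tmp/halfcheetah-videos',
                           video_callable=lambda ep: ep % 100 == 0,  # every 100th episode
                           force=True)  # overwrite any previous recordings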

Dependencies

  • Python 3
  • TensorFlow 2.x
  • OpenAI Gym
  • PyBullet
  • NumPy

References

  1. Trust Region Policy Optimization (Schulman et al., 2015)
  2. Emergence of Locomotion Behaviours in Rich Environments (Heess et al., 2017)
  3. High-Dimensional Continuous Control Using Generalized Advantage Estimation (Schulman et al., 2016)
  4. GitHub Repository with several helpful implementation ideas (Schulman)