dgriff777 / Rl_a3c_pytorch

Licence: apache-2.0
A3C LSTM Atari with Pytorch plus A3G design

Programming Languages

python

Projects that are alternatives to or similar to Rl_a3c_pytorch

Reinforcementlearning Atarigame
PyTorch LSTM RNN for reinforcement learning to play Atari games from OpenAI Universe. We also use Google DeepMind's Asynchronous Advantage Actor-Critic (A3C) algorithm, which is far more efficient than DQN and effectively supersedes it. Can play many games.
Stars: ✭ 118 (-75.52%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, openai-gym, actor-critic, a3c
Reinforcement learning tutorial with demo
Reinforcement Learning Tutorial with Demo: DP (Policy and Value Iteration), Monte Carlo, TD Learning (SARSA, QLearning), Function Approximation, Policy Gradient, DQN, Imitation, Meta Learning, Papers, Courses, etc..
Stars: ✭ 442 (-8.3%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, actor-critic, a3c
Reinforcement Learning
Minimal and Clean Reinforcement Learning Examples
Stars: ✭ 2,863 (+493.98%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, actor-critic, a3c
Deep-Reinforcement-Learning-With-Python
Master classic RL, deep RL, distributional RL, inverse RL, and more using OpenAI Gym and TensorFlow with extensive Math
Stars: ✭ 222 (-53.94%)
Mutual labels:  deep-reinforcement-learning, openai-gym, a3c, actor-critic
Pytorch Rl
Deep Reinforcement Learning with pytorch & visdom
Stars: ✭ 745 (+54.56%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, actor-critic, a3c
Btgym
Scalable, event-driven, deep-learning-friendly backtesting library
Stars: ✭ 765 (+58.71%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, openai-gym, a3c
Pytorch A3c
PyTorch implementation of Asynchronous Advantage Actor Critic (A3C) from "Asynchronous Methods for Deep Reinforcement Learning".
Stars: ✭ 879 (+82.37%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, actor-critic, a3c
Torch Ac
Recurrent and multi-process PyTorch implementation of deep reinforcement Actor-Critic algorithms A2C and PPO
Stars: ✭ 70 (-85.48%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, actor-critic, a3c
Hierarchical Actor Critic Hac Pytorch
PyTorch implementation of Hierarchical Actor Critic (HAC) for OpenAI gym environments
Stars: ✭ 116 (-75.93%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, openai-gym, actor-critic
Mushroom Rl
Python library for Reinforcement Learning.
Stars: ✭ 442 (-8.3%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, openai-gym
Tensorflow Rl
Implementations of deep RL papers and random experimentation
Stars: ✭ 176 (-63.49%)
Mutual labels:  reinforcement-learning, openai-gym, a3c
yarll
Combining deep learning and reinforcement learning.
Stars: ✭ 84 (-82.57%)
Mutual labels:  deep-reinforcement-learning, openai-gym, a3c
Pytorch Drl
PyTorch implementations of various Deep Reinforcement Learning (DRL) algorithms for both single agent and multi-agent.
Stars: ✭ 233 (-51.66%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, actor-critic
Pytorch A2c Ppo Acktr Gail
PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
Stars: ✭ 2,632 (+446.06%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, actor-critic
a3c-super-mario-pytorch
Reinforcement Learning for Super Mario Bros using A3C on GPU
Stars: ✭ 35 (-92.74%)
Mutual labels:  deep-reinforcement-learning, openai-gym, a3c
Deep Reinforcement Learning
Repo for the Deep Reinforcement Learning Nanodegree program
Stars: ✭ 4,012 (+732.37%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, openai-gym
Tensorflow Reinforce
Implementations of Reinforcement Learning Models in Tensorflow
Stars: ✭ 480 (-0.41%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, actor-critic
Pytorch sac
PyTorch implementation of Soft Actor-Critic (SAC)
Stars: ✭ 174 (-63.9%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, actor-critic
Drq
DrQ: Data regularized Q
Stars: ✭ 268 (-44.4%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, actor-critic
Master-Thesis
Deep Reinforcement Learning in Autonomous Driving: the A3C algorithm used to make a car learn to drive in TORCS; Python 3.5, Tensorflow, tensorboard, numpy, gym-torcs, ubuntu, latex
Stars: ✭ 33 (-93.15%)
Mutual labels:  deep-reinforcement-learning, a3c, actor-critic

NEWLY ADDED A3G: A NEW GPU/CPU ARCHITECTURE OF A3C FOR SUBSTANTIALLY ACCELERATED TRAINING!!

RL A3C Pytorch

[GIFs: A3C LSTM playing Breakout-v0, SpaceInvadersDeterministic-v3, MsPacman-v0, BeamRider-v0, and Seaquest-v0]

NEWLY ADDED A3G!!

A new implementation of A3C that utilizes the GPU to speed up training, which we call A3G. Unlike other attempts to run the A3C algorithm on the GPU, in A3G each agent maintains its own network on the GPU while the shared model stays on the CPU; agent gradients are quickly moved to the CPU to update the shared model, so updates stay frequent and fast through Hogwild-style training, with updates applied to the shared model asynchronously and without locks. This new method greatly increases training speed: models that used to take days to train can now be trained in as little as 10 minutes for some Atari games. Breakout starts scoring over 400 within 10-15 minutes, and Pong is solved in about 10 minutes!
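
To make this concrete, below is a minimal, hypothetical sketch (not the repo's exact code) of how a worker might push gradients from its GPU copy of the model to the CPU shared model and then pull the updated weights back; names such as local_model, shared_model, and shared_optimizer are illustrative.

    import torch

    def ensure_shared_grads(local_model, shared_model, gpu=False):
        # Copy gradients from the worker's (possibly GPU) model to the CPU shared model.
        for local_p, shared_p in zip(local_model.parameters(), shared_model.parameters()):
            if shared_p.grad is not None and not gpu:
                return  # CPU workers already share gradient storage; nothing to copy
            shared_p._grad = local_p.grad.cpu() if gpu else local_p.grad

    # Inside each worker, after computing the loss on its local GPU model:
    #   loss.backward()
    #   ensure_shared_grads(local_model, shared_model, gpu=True)
    #   shared_optimizer.step()  # lock-free (Hogwild-style) update of the CPU shared model
    #   local_model.load_state_dict(shared_model.state_dict())  # sync the latest weights back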

This repository includes my PyTorch implementation of reinforcement learning using Asynchronous Advantage Actor-Critic (A3C), an algorithm from Google DeepMind's paper "Asynchronous Methods for Deep Reinforcement Learning."

See a3c_continuous, a newly added repo with my A3C LSTM implementation for continuous action spaces, which was able to solve the BipedalWalkerHardcore-v2 environment (average 300+ over 100 consecutive episodes).

A3C LSTM

I implemented an A3C LSTM model and trained it in the Atari 2600 environments provided in the OpenAI Gym. So far the model has shown the best performance I have seen for Atari game environments. Included in the repo are trained models for SpaceInvaders-v0, MsPacman-v0, Breakout-v0, BeamRider-v0, Pong-v0, Seaquest-v0 and Asteroids-v0, which have performed very well and currently hold the best scores on the OpenAI Gym leaderboard for each of those games (no plans on training the model for any more Atari games right now...). Saved models live in the trained_models folder. *Trained models have been removed to reduce the size of the repo.

Optimizers using shared statistics for RMSProp and Adam are available for use in training, as well as the option to use a non-shared optimizer.
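
As an illustration of what a shared-statistics optimizer can look like, here is a simplified, hedged sketch of an Adam variant whose per-parameter statistics live in shared memory so that every worker process updates the same running moments. It is a stand-in for, not a copy of, the repo's shared optimizer, and it assumes a PyTorch version whose built-in Adam step accepts a tensor-valued step counter; older versions typically require overriding step() as well.

    import torch
    import torch.optim as optim

    class SharedAdam(optim.Adam):
        # Adam whose exp_avg / exp_avg_sq buffers sit in shared memory so all
        # worker processes update the same optimizer statistics (illustrative sketch).
        def __init__(self, params, lr=1e-4, betas=(0.9, 0.999), eps=1e-8, weight_decay=0):
            super(SharedAdam, self).__init__(params, lr=lr, betas=betas, eps=eps,
                                             weight_decay=weight_decay)
            for group in self.param_groups:
                for p in group['params']:
                    state = self.state[p]
                    state['step'] = torch.zeros(1)
                    state['exp_avg'] = torch.zeros_like(p.data)
                    state['exp_avg_sq'] = torch.zeros_like(p.data)

        def share_memory(self):
            # Move the optimizer statistics into shared memory before forking workers.
            for group in self.param_groups:
                for p in group['params']:
                    state = self.state[p]
                    state['step'].share_memory_()
                    state['exp_avg'].share_memory_()
                    state['exp_avg_sq'].share_memory_()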

Gym Atari settings are more difficult to train than traditional ALE Atari settings, as Gym uses stochastic frame skipping and a larger number of discrete actions. For example, Breakout-v0 has 6 discrete actions in Gym, while ALE is set to only 4. Also, in Gym Atari the previous action is randomly repeated with probability 0.25, and there is a time/step limit that caps performance.
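
For example, under an older Gym release with the classic Atari registrations you can inspect these settings directly; exact attribute names vary across Gym versions, so treat this as an illustrative check rather than a guaranteed API.

    import gym

    env = gym.make('Breakout-v0')
    print(env.action_space.n)          # 6 discrete actions in Gym's Breakout-v0
    print(env.unwrapped.frameskip)     # a (low, high) range, i.e. stochastic frame skipping
    print(env.spec.max_episode_steps)  # the per-episode step limit that caps scores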

The Gym environment evaluations are summarized below:

Environment                      Best 100-Episode Avg      Best Score
SpaceInvaders-v0                 5808.45 ± 337.28          13380.0
SpaceInvaders-v3                 6944.85 ± 409.60          20440.0
SpaceInvadersDeterministic-v3    79060.10 ± 5826.59        167330.0
Breakout-v0                      739.30 ± 18.43            864.0
Breakout-v3                      859.57 ± 1.97             864.0
Pong-v0                          20.96 ± 0.02              21.0
PongDeterministic-v3             21.00 ± 0.00              21.0
BeamRider-v0                     8441.22 ± 221.24          13130.0
MsPacman-v0                      6323.01 ± 116.91          10181.0
Seaquest-v0                      54203.50 ± 1509.85        88840.0

The 167,330 Space Invaders score is a world-record Space Invaders score, and the game ended only due to the Gym timestep limit, not from loss of life. When I increased the Gym timestep limit to a million, the agent reached a Space Invaders score of approximately 2,300,000 and still ended due to the timestep limit, most likely because the game becomes fairly redundant after a while.

Due to the Gym version's timestep limit, the agent scores lower on Seaquest-v0, but on Seaquest-v4, which has a higher timestep limit, the agent beats the game (see gif above) with the maximum possible score of 999,999!!

Requirements

  • Python 2.7+
  • OpenAI Gym and Universe
  • PyTorch

Training

When training the model it is important to limit the number of worker processes to the number of CPU cores available, as too many processes (e.g. more than one process per CPU core) will actually be detrimental to training speed and effectiveness.
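
A simple way to choose the worker count is to match the machine's CPU core count; the snippet below is purely illustrative and not part of the repo.

    import multiprocessing as mp

    workers = mp.cpu_count()  # at most one worker process per CPU core
    print("python main.py --env Pong-v0 --workers {}".format(workers))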

To train the agent in the Pong-v0 environment with 32 worker processes:

python main.py --env Pong-v0 --workers 32

# A3G (A3C-GPU) training on a machine with 4 V100 GPUs and a 20-core CPU took 10 minutes to converge on PongDeterministic-v4

To train the agent in the PongDeterministic-v4 environment with 32 worker processes on 4 GPUs using the new A3G:

python main.py --env PongDeterministic-v4 --workers 32 --gpu-ids 0 1 2 3 --amsgrad True

Hit Ctrl+C to end the training session properly.

[GIF: A3C LSTM playing Pong-v0]

Evaluation

To run a 100-episode Gym evaluation with the trained model:

python gym_eval.py --env Pong-v0 --num-episodes 100

Notice that BeamRiderNoFrameskip-v4 reaches scores over 50,000 in less than 2 hours of training. Compared to the Gym v0 version, this shows the difficulty of those versions, but also that the time limit is a major factor in the score level.

These training charts were produced on a DGX Station using 4 GPUs and a 20-core CPU. I used 36 worker agents and a tau of 0.92, which is the lambda in the Generalized Advantage Estimation equation, to introduce more variance given the more deterministic nature of using just a 4-frame-skip environment and a 0-30 NoOp start.

[Training charts: BeamRider, Boxing, Pong, SpaceInvaders, Qbert]
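
For reference, here is a short sketch of Generalized Advantage Estimation with tau playing the role of lambda as described above; gamma is the usual discount factor and the variable names are illustrative, not taken from the repo.

    def gae(rewards, values, gamma=0.99, tau=0.92):
        # values must hold one more entry than rewards (the bootstrap value for the final state)
        advantages, gae_t = [], 0.0
        for t in reversed(range(len(rewards))):
            delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD residual
            gae_t = delta + gamma * tau * gae_t                     # exponentially weighted sum
            advantages.insert(0, gae_t)
        return advantages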
