hiwonjoon / tf-a3c-gpu

Licence: MIT license
Tensorflow implementation of A3C algorithm

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to tf-a3c-gpu

Reinforcementlearning Atarigame
Pytorch LSTM RNN for reinforcement learning to play Atari games from OpenAI Universe. We also use Google DeepMind's Asynchronous Advantage Actor-Critic (A3C) algorithm, which is far more efficient than DQN and effectively supersedes it. Can play many games.
Stars: ✭ 118 (+140.82%)
Mutual labels:  openai-gym, a3c
Btgym
Scalable, event-driven, deep-learning-friendly backtesting library
Stars: ✭ 765 (+1461.22%)
Mutual labels:  openai-gym, a3c
a3c
PyTorch implementation of "Asynchronous advantage actor-critic"
Stars: ✭ 21 (-57.14%)
Mutual labels:  openai-gym, a3c
a3c-super-mario-pytorch
Reinforcement Learning for Super Mario Bros using A3C on GPU
Stars: ✭ 35 (-28.57%)
Mutual labels:  openai-gym, a3c
yarll
Combining deep learning and reinforcement learning.
Stars: ✭ 84 (+71.43%)
Mutual labels:  openai-gym, a3c
Tensorflow Rl
Implementations of deep RL papers and random experimentation
Stars: ✭ 176 (+259.18%)
Mutual labels:  openai-gym, a3c
Rl a3c pytorch
A3C LSTM Atari with Pytorch plus A3G design
Stars: ✭ 482 (+883.67%)
Mutual labels:  openai-gym, a3c
A3c continuous
A continuous action space version of A3C LSTM in pytorch plus A3G design
Stars: ✭ 223 (+355.1%)
Mutual labels:  openai-gym, a3c
Deep-Reinforcement-Learning-With-Python
Master classic RL, deep RL, distributional RL, inverse RL, and more using OpenAI Gym and TensorFlow with extensive Math
Stars: ✭ 222 (+353.06%)
Mutual labels:  openai-gym, a3c
deep rl acrobot
TensorFlow A2C to solve Acrobot, with synchronized parallel environments
Stars: ✭ 32 (-34.69%)
Mutual labels:  openai-gym, a3c
FinRL Podracer
Cloud-native Financial Reinforcement Learning
Stars: ✭ 179 (+265.31%)
Mutual labels:  openai-gym
sc2gym
PySC2 OpenAI Gym Environments
Stars: ✭ 50 (+2.04%)
Mutual labels:  openai-gym
ga-openai-gym
Usage of genetic algorithms to train a neural network in multiple OpenAI gym environments.
Stars: ✭ 24 (-51.02%)
Mutual labels:  openai-gym
Deep-Reinforcement-Learning-CS285-Pytorch
Solutions to the assignments of the Deep Reinforcement Learning course (CS285) from the University of California, Berkeley, implemented in the PyTorch framework
Stars: ✭ 104 (+112.24%)
Mutual labels:  openai-gym
ddpg biped
Repository for a planar bipedal walking robot in a Gazebo environment, trained with Deep Deterministic Policy Gradient (DDPG) using TensorFlow.
Stars: ✭ 65 (+32.65%)
Mutual labels:  openai-gym
coord-sim
Lightweight flow-level simulator for inter-node network and service coordination (e.g., in cloud/edge computing or NFV).
Stars: ✭ 33 (-32.65%)
Mutual labels:  openai-gym
modelicagym
Modelica models integration with OpenAI Gym
Stars: ✭ 53 (+8.16%)
Mutual labels:  openai-gym
gym-tetris
An OpenAI Gym interface to Tetris on the NES.
Stars: ✭ 33 (-32.65%)
Mutual labels:  openai-gym
Deep-Reinforcement-Learning-for-Automated-Stock-Trading-Ensemble-Strategy-ICAIF-2020
Live Trading. Please star.
Stars: ✭ 1,251 (+2453.06%)
Mutual labels:  openai-gym
vizdoomgym
OpenAI Gym wrapper for ViZDoom environments
Stars: ✭ 59 (+20.41%)
Mutual labels:  openai-gym

tf-a3c-gpu

TensorFlow implementation of the A3C algorithm using a GPU (not tested, but it should also be trainable on a CPU).

The original paper, "Asynchronous Methods for Deep Reinforcement Learning", suggests a CPU-only implementation: the environment can only run on the CPU, so keeping the networks on a GPU would otherwise cause unavoidable communication overhead between the CPU and the GPU.

However, we can cut that communication down to just the current state and reward (rather than whole parameter sets) by keeping all parameters of the policy and value networks on the GPU. Furthermore, we can improve GPU utilization by running multiple agents per thread. The current implementation (with slightly tuned hyperparameters) uses 4 threads, each running 64 agents. With this setting, I was able to get about a 2x speedup. (Huh, a little disappointing, isn't it?)
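
To make the batching idea concrete, here is a minimal, illustrative sketch (not the actual code of this repository) of how one worker thread can drive a batch of Gym environments with a single batched network call; the policy_value_net stub, the N_AGENTS constant, and the classic Gym v0.9-style API are assumptions for illustration.

    # Illustrative sketch of one worker thread with many agents (not this repo's code).
    import numpy as np
    import gym

    N_AGENTS = 64  # environments handled by this single thread (hypothetical constant)

    envs = [gym.make('Breakout-v0') for _ in range(N_AGENTS)]
    states = np.stack([env.reset() for env in envs])  # classic Gym API: reset() returns an observation

    def policy_value_net(state_batch):
        # Stand-in for the GPU-resident policy/value network: one batched call
        # returns action probabilities and value estimates for every agent.
        n_actions = envs[0].action_space.n
        probs = np.full((len(state_batch), n_actions), 1.0 / n_actions)
        values = np.zeros(len(state_batch))
        return probs, values

    # One interaction step: a single batched inference, then cheap per-environment
    # stepping on the CPU. Only states and rewards cross the CPU/GPU boundary;
    # the network parameters never leave the GPU.
    probs, values = policy_value_net(states)
    actions = [np.random.choice(len(p), p=p) for p in probs]
    step_results = [env.step(a) for env, a in zip(envs, actions)]
    next_states, rewards, dones, infos = zip(*step_results)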

Therefore, this implementation is not an exact re-implementation of the paper, and the effect of batching multiple agents per thread is worth examining further (different numbers of threads and agents per thread). (However, I am still curious how A3C achieves such nice results. Is the asynchronous update really the only key? I couldn't find other explanations for the effectiveness of this method.) Still, it gave me quite a competitive result (3 hours of training on Breakout-v0 for reasonable play), so it could be a good base for someone to start with.

Enjoy :)

Requirements

  • Python 2.7
  • Tensorflow v1.2
  • OpenAI Gym v0.9
  • scipy, PIL (for image resize)
  • tqdm (optional)
  • better-exceptions (optional)

Training Results

  • Training on Breakout-v0 was done with an NVIDIA Titan X (Pascal) GPU for 28 hours

  • With the hyperparameters I used, one step corresponds to 64 * 5 input frames (64 * 5 * an average of ~3 game frames; see the quick accounting sketch after this list).

  • Orange line: with reward clipping (rewards clipped to [-1, 1]) + gradient normalization; purple line: without them

    • by the number of steps

    • by the number of episodes

    • by the time

  • Check the results on my results page

    Watch the video
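
For reference, here is a quick back-of-the-envelope check of the step accounting and the reward clipping mentioned in the list above (an illustration, not code from this repository; the ~3 frames per action figure is the average frame skip of Breakout-v0):

    # Step accounting: how many frames one "step" on the x-axis covers.
    agents_per_thread = 64     # agents batched in one worker thread
    rollout_length = 5         # environment steps collected per update
    avg_frames_per_action = 3  # Breakout-v0 repeats each action for 2-4 frames (~3 on average)

    input_frames_per_step = agents_per_thread * rollout_length            # 320 observed frames
    game_frames_per_step = input_frames_per_step * avg_frames_per_action  # ~960 emulator frames

    # Reward clipping used for the orange curve: every reward is squashed into [-1, 1].
    def clip_reward(r):
        return max(-1.0, min(1.0, r))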

Training from scratch

  • All the hyperparameters are defined in the ac3.py file. Change them as you want, then execute it:
python ac3.py
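
The exact hyperparameter names and defaults live in ac3.py; the snippet below only illustrates the kind of knobs such an A3C setup typically exposes (all names here are hypothetical, and only the thread/agent counts and clipping range are taken from the description above):

    # Hypothetical A3C-style hyperparameters (illustrative names, not the ones in ac3.py).
    NUM_THREADS = 4            # worker threads
    AGENTS_PER_THREAD = 64     # environments batched per thread
    ROLLOUT_LENGTH = 5         # steps collected before each update
    GAMMA = 0.99               # discount factor (typical A3C value, assumed)
    LEARNING_RATE = 1e-4       # optimizer learning rate (assumed)
    ENTROPY_BETA = 0.01        # entropy regularization weight (assumed)
    REWARD_CLIP = (-1.0, 1.0)  # reward clipping range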

Validation with trained models

  • If you want to see the trained agent playing, use the command:
python ac3-test.py --model ./models/breakout-v0/last.ckpt --out /tmp/result

Notes & Acknowledgement

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].