hiwonjoon / tf-a3c-gpu

Licence: MIT license
Tensorflow implementation of A3C algorithm

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to tf-a3c-gpu

Reinforcementlearning Atarigame
Pytorch LSTM RNN for reinforcement learning to play Atari games from OpenAI Universe. We also use Google DeepMind's Asynchronous Advantage Actor-Critic (A3C) algorithm, which is far more efficient than DQN and effectively supersedes it. Can play many games.
Stars: ✭ 118 (+140.82%)
Mutual labels:  openai-gym, a3c
Btgym
Scalable, event-driven, deep-learning-friendly backtesting library
Stars: ✭ 765 (+1461.22%)
Mutual labels:  openai-gym, a3c
a3c
PyTorch implementation of "Asynchronous advantage actor-critic"
Stars: ✭ 21 (-57.14%)
Mutual labels:  openai-gym, a3c
a3c-super-mario-pytorch
Reinforcement Learning for Super Mario Bros using A3C on GPU
Stars: ✭ 35 (-28.57%)
Mutual labels:  openai-gym, a3c
yarll
Combining deep learning and reinforcement learning.
Stars: ✭ 84 (+71.43%)
Mutual labels:  openai-gym, a3c
Tensorflow Rl
Implementations of deep RL papers and random experimentation
Stars: ✭ 176 (+259.18%)
Mutual labels:  openai-gym, a3c
Rl a3c pytorch
A3C LSTM Atari with Pytorch plus A3G design
Stars: ✭ 482 (+883.67%)
Mutual labels:  openai-gym, a3c
A3c continuous
A continuous action space version of A3C LSTM in pytorch plus A3G design
Stars: ✭ 223 (+355.1%)
Mutual labels:  openai-gym, a3c
Deep-Reinforcement-Learning-With-Python
Master classic RL, deep RL, distributional RL, inverse RL, and more using OpenAI Gym and TensorFlow with extensive Math
Stars: ✭ 222 (+353.06%)
Mutual labels:  openai-gym, a3c
deep rl acrobot
TensorFlow A2C to solve Acrobot, with synchronized parallel environments
Stars: ✭ 32 (-34.69%)
Mutual labels:  openai-gym, a3c
FinRL Podracer
Cloud-native Financial Reinforcement Learning
Stars: ✭ 179 (+265.31%)
Mutual labels:  openai-gym
sc2gym
PySC2 OpenAI Gym Environments
Stars: ✭ 50 (+2.04%)
Mutual labels:  openai-gym
ga-openai-gym
Usage of genetic algorithms to train a neural network in multiple OpenAI gym environments.
Stars: ✭ 24 (-51.02%)
Mutual labels:  openai-gym
Deep-Reinforcement-Learning-CS285-Pytorch
Solutions to the assignments of the Deep Reinforcement Learning course (CS285) from the University of California, Berkeley, implemented in the PyTorch framework
Stars: ✭ 104 (+112.24%)
Mutual labels:  openai-gym
ddpg biped
Repository for a planar bipedal walking robot in a Gazebo environment, trained with Deep Deterministic Policy Gradient (DDPG) using TensorFlow.
Stars: ✭ 65 (+32.65%)
Mutual labels:  openai-gym
coord-sim
Lightweight flow-level simulator for inter-node network and service coordination (e.g., in cloud/edge computing or NFV).
Stars: ✭ 33 (-32.65%)
Mutual labels:  openai-gym
modelicagym
Modelica models integration with OpenAI Gym
Stars: ✭ 53 (+8.16%)
Mutual labels:  openai-gym
gym-tetris
An OpenAI Gym interface to Tetris on the NES.
Stars: ✭ 33 (-32.65%)
Mutual labels:  openai-gym
Deep-Reinforcement-Learning-for-Automated-Stock-Trading-Ensemble-Strategy-ICAIF-2020
Live Trading. Please star.
Stars: ✭ 1,251 (+2453.06%)
Mutual labels:  openai-gym
vizdoomgym
OpenAI Gym wrapper for ViZDoom environments
Stars: ✭ 59 (+20.41%)
Mutual labels:  openai-gym

tf-a3c-gpu

TensorFlow implementation of the A3C algorithm using a GPU (not tested, but it should also be trainable on a CPU).

The original paper, "Asynchronous Methods for Deep Reinforcement Learning", suggests a CPU-only implementation: the environment can only run on the CPU, so keeping the networks on a GPU would otherwise cause unavoidable communication overhead between the CPU and the GPU.

However, we can cut that communication down to just the current state and reward (rather than whole parameter sets) by keeping all parameters of the policy and value networks on the GPU. Furthermore, we can improve GPU utilization by running multiple agents per thread. The current implementation (with slightly tuned hyperparameters) uses 4 threads, each running 64 agents. With this setting, I was able to get about a 2x speedup. (Huh, a little disappointing, isn't it?)
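
To make the batching idea concrete, here is a minimal, illustrative sketch (not the actual code of this repository) of how one worker thread can drive a batch of Gym environments with a single batched network call; the policy_value_net stub, the N_AGENTS constant, and the classic Gym v0.9-style API are assumptions for illustration.

    # Illustrative sketch of one worker thread with many agents (not this repo's code).
    import numpy as np
    import gym

    N_AGENTS = 64  # environments handled by this single thread (hypothetical constant)

    envs = [gym.make('Breakout-v0') for _ in range(N_AGENTS)]
    states = np.stack([env.reset() for env in envs])  # classic Gym API: reset() returns an observation

    def policy_value_net(state_batch):
        # Stand-in for the GPU-resident policy/value network: one batched call
        # returns action probabilities and value estimates for every agent.
        n_actions = envs[0].action_space.n
        probs = np.full((len(state_batch), n_actions), 1.0 / n_actions)
        values = np.zeros(len(state_batch))
        return probs, values

    # One interaction step: a single batched inference, then cheap per-environment
    # stepping on the CPU. Only states and rewards cross the CPU/GPU boundary;
    # the network parameters never leave the GPU.
    probs, values = policy_value_net(states)
    actions = [np.random.choice(len(p), p=p) for p in probs]
    step_results = [env.step(a) for env, a in zip(envs, actions)]
    next_states, rewards, dones, infos = zip(*step_results)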

Therefore, this implementation is not an exact re-implementation of the paper, and the effect of batching multiple agents per thread is worth examining further (different numbers of threads and agents per thread). (However, I am still curious how A3C achieves such nice results. Is the asynchronous update really the only key? I couldn't find other explanations for the effectiveness of this method.) Still, it gave me quite a competitive result (3 hours of training on Breakout-v0 for reasonable play), so it could be a good base for someone to start with.

Enjoy :)

Requirements

  • Python 2.7
  • Tensorflow v1.2
  • OpenAI Gym v0.9
  • scipy, PIL (for image resize)
  • tqdm (optional)
  • better-exceptions (optional)

Training Results

  • Training on Breakout-v0 was done with an NVIDIA Titan X (Pascal) GPU for 28 hours

  • With the hyperparameters I used, one step corresponds to 64 * 5 input frames (64 * 5 * an average of ~3 game frames; see the quick accounting sketch after this list).

  • Orange line: with reward clipping (rewards clipped to [-1, 1]) + gradient normalization; purple line: without them

    • by the number of steps

    • by the number of episodes

    • by the time

  • Check the results on my results page

    Watch the video
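
For reference, here is a quick back-of-the-envelope check of the step accounting and the reward clipping mentioned in the list above (an illustration, not code from this repository; the ~3 frames per action figure is the average frame skip of Breakout-v0):

    # Step accounting: how many frames one "step" on the x-axis covers.
    agents_per_thread = 64     # agents batched in one worker thread
    rollout_length = 5         # environment steps collected per update
    avg_frames_per_action = 3  # Breakout-v0 repeats each action for 2-4 frames (~3 on average)

    input_frames_per_step = agents_per_thread * rollout_length            # 320 observed frames
    game_frames_per_step = input_frames_per_step * avg_frames_per_action  # ~960 emulator frames

    # Reward clipping used for the orange curve: every reward is squashed into [-1, 1].
    def clip_reward(r):
        return max(-1.0, min(1.0, r))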

Training from scratch

  • All the hyperparameters are defined in the ac3.py file. Change them as you want, then execute it:
python ac3.py
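
The exact hyperparameter names and defaults live in ac3.py; the snippet below only illustrates the kind of knobs such an A3C setup typically exposes (all names here are hypothetical, and only the thread/agent counts and clipping range are taken from the description above):

    # Hypothetical A3C-style hyperparameters (illustrative names, not the ones in ac3.py).
    NUM_THREADS = 4            # worker threads
    AGENTS_PER_THREAD = 64     # environments batched per thread
    ROLLOUT_LENGTH = 5         # steps collected before each update
    GAMMA = 0.99               # discount factor (typical A3C value, assumed)
    LEARNING_RATE = 1e-4       # optimizer learning rate (assumed)
    ENTROPY_BETA = 0.01        # entropy regularization weight (assumed)
    REWARD_CLIP = (-1.0, 1.0)  # reward clipping range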

Validation with trained models

  • If you want to see the trained agent playing, use the command:
python ac3-test.py --model ./models/breakout-v0/last.ckpt --out /tmp/result

Notes & Acknowledgement

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].