
ikostrikov / Pytorch A3c

Licence: MIT
PyTorch implementation of Asynchronous Advantage Actor Critic (A3C) from "Asynchronous Methods for Deep Reinforcement Learning".

Programming Languages

python

Projects that are alternatives of or similar to Pytorch A3c

Reinforcement Learning
Minimal and Clean Reinforcement Learning Examples
Stars: ✭ 2,863 (+225.71%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, actor-critic, a3c
Pytorch Rl
Deep Reinforcement Learning with pytorch & visdom
Stars: ✭ 745 (-15.24%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, actor-critic, a3c
Reinforcement learning tutorial with demo
Reinforcement Learning Tutorial with Demo: DP (Policy and Value Iteration), Monte Carlo, TD Learning (SARSA, QLearning), Function Approximation, Policy Gradient, DQN, Imitation, Meta Learning, Papers, Courses, etc.
Stars: ✭ 442 (-49.72%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, actor-critic, a3c
Torch Ac
Recurrent and multi-process PyTorch implementation of deep reinforcement Actor-Critic algorithms A2C and PPO
Stars: ✭ 70 (-92.04%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, actor-critic, a3c
Rl a3c pytorch
A3C LSTM Atari with Pytorch plus A3G design
Stars: ✭ 482 (-45.16%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, actor-critic, a3c
Reinforcementlearning Atarigame
Pytorch LSTM RNN for reinforcement learning to play Atari games from OpenAI Universe. We also use Google DeepMind's Asynchronous Advantage Actor-Critic (A3C) algorithm. This is far superior to and more efficient than DQN and obsoletes it. Can play many games.
Stars: ✭ 118 (-86.58%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, actor-critic, a3c
Pytorch Drl
PyTorch implementations of various Deep Reinforcement Learning (DRL) algorithms for both single agent and multi-agent.
Stars: ✭ 233 (-73.49%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, actor-critic
Deep-Reinforcement-Learning-With-Python
Master classic RL, deep RL, distributional RL, inverse RL, and more using OpenAI Gym and TensorFlow with extensive Math
Stars: ✭ 222 (-74.74%)
Mutual labels:  deep-reinforcement-learning, a3c, actor-critic
Reinforcement Learning With Tensorflow
Simple Reinforcement learning tutorials, 莫烦Python (Morvan Python) Chinese-language AI tutorials
Stars: ✭ 6,948 (+690.44%)
Mutual labels:  reinforcement-learning, actor-critic, a3c
Btgym
Scalable, event-driven, deep-learning-friendly backtesting library
Stars: ✭ 765 (-12.97%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, a3c
Pytorch sac
PyTorch implementation of Soft Actor-Critic (SAC)
Stars: ✭ 174 (-80.2%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, actor-critic
Master-Thesis
Deep Reinforcement Learning in Autonomous Driving: the A3C algorithm used to make a car learn to drive in TORCS; Python 3.5, Tensorflow, tensorboard, numpy, gym-torcs, ubuntu, latex
Stars: ✭ 33 (-96.25%)
Mutual labels:  deep-reinforcement-learning, a3c, actor-critic
Openai lab
An experimentation framework for Reinforcement Learning using OpenAI Gym, Tensorflow, and Keras.
Stars: ✭ 313 (-64.39%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, actor-critic
Tensorflow Reinforce
Implementations of Reinforcement Learning Models in Tensorflow
Stars: ✭ 480 (-45.39%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, actor-critic
Deeprl Tutorials
Contains high quality implementations of Deep Reinforcement Learning algorithms written in PyTorch
Stars: ✭ 748 (-14.9%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, actor-critic
Pytorch A2c Ppo Acktr Gail
PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
Stars: ✭ 2,632 (+199.43%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, actor-critic
Drq
DrQ: Data regularized Q
Stars: ✭ 268 (-69.51%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, actor-critic
Slm Lab
Modular Deep Reinforcement Learning framework in PyTorch. Companion library of the book "Foundations of Deep Reinforcement Learning".
Stars: ✭ 904 (+2.84%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, a3c
Minimalrl
Implementations of basic RL algorithms with minimal lines of codes! (pytorch based)
Stars: ✭ 2,051 (+133.33%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, a3c
Deeprl Tensorflow2
🐋 Simple implementations of various popular Deep Reinforcement Learning algorithms using TensorFlow2
Stars: ✭ 319 (-63.71%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, a3c

pytorch-a3c

This is a PyTorch implementation of Asynchronous Advantage Actor Critic (A3C) from "Asynchronous Methods for Deep Reinforcement Learning".

This implementation is inspired by Universe Starter Agent. In contrast to the starter agent, it uses an optimizer with shared statistics as in the original paper.
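To illustrate what "shared statistics" means, here is a minimal sketch of an RMSProp-style optimizer whose running average of squared gradients lives in shared memory, so that every worker process reads and updates the same statistic. It is not taken from this repository: the class name `SharedRMSProp`, its hyperparameter defaults, and the use of plain `multiprocessing` arrays instead of PyTorch tensors are all assumptions made for illustration. It also guards each update with a lock for simplicity, which is a choice of this sketch rather than something the text above specifies.

```python
import math
import multiprocessing as mp


class SharedRMSProp:
    """Illustrative RMSProp with parameters and statistics in shared memory.

    Hypothetical helper for this README; not the optimizer used in the repo.
    """

    def __init__(self, num_params, lr=0.01, alpha=0.99, eps=1e-8):
        self.lr, self.alpha, self.eps = lr, alpha, eps
        # Raw shared arrays (zero-initialized); one explicit lock guards both.
        self.params = mp.Array("d", num_params, lock=False)
        self.square_avg = mp.Array("d", num_params, lock=False)
        self.lock = mp.Lock()

    def step(self, grads):
        """Apply one update using gradients computed locally by a worker."""
        with self.lock:
            for i, g in enumerate(grads):
                # Shared statistic: moving average of squared gradients.
                self.square_avg[i] = (
                    self.alpha * self.square_avg[i] + (1 - self.alpha) * g * g
                )
                # Scale the step by the shared statistic.
                self.params[i] -= self.lr * g / math.sqrt(
                    self.square_avg[i] + self.eps
                )
```

Because both arrays are allocated in shared memory, an instance created in the parent and handed to worker processes as a `Process` argument is updated in place by all of them, rather than each worker keeping its own private moving average.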

Please use this BibTeX entry if you want to cite this repository in your publications:

@misc{pytorchaaac,
  author = {Kostrikov, Ilya},
  title = {PyTorch Implementations of Asynchronous Advantage Actor Critic},
  year = {2018},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/ikostrikov/pytorch-a3c}},
}

A2C

I highly recommend checking out the synchronous version and other algorithms: pytorch-a2c-ppo-acktr.

In my experience, A2C works better than A3C, and ACKTR is better than both of them. Moreover, PPO is a great algorithm for continuous control. Thus, I recommend trying A2C/PPO/ACKTR first and using A3C only if you specifically need it.

Also read the OpenAI blog for more information.

Contributions

Contributions are very welcome. If you know how to make this code better, don't hesitate to send a pull request.

Usage

# Works only with Python 3.
python3 main.py --env-name "PongDeterministic-v4" --num-processes 16

This code runs evaluation in a separate thread in addition to 16 processes.
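The layout described above (N training workers plus one evaluator) could be sketched roughly as follows. This is an illustration only, not the repository's `main.py`: the functions `build_plan`, `train`, `evaluate`, and `launch` are hypothetical, the workers merely advance a shared step counter instead of training a model, and the evaluator is started as a separate process here, which is an assumption of this sketch.

```python
import multiprocessing as mp
import time


def build_plan(num_processes):
    """Return one (role, rank) pair per process: an evaluator plus trainers."""
    trainers = [("train", rank) for rank in range(num_processes)]
    return [("eval", num_processes)] + trainers


def train(rank, counter, lock, num_steps):
    # Stand-in for a training worker: a real one would act in the environment
    # and update the shared model. Here it only advances the global step count.
    for _ in range(num_steps):
        with lock:
            counter.value += 1


def evaluate(rank, counter, stop):
    # Stand-in for the evaluator: report the global step count until stopped.
    while not stop.is_set():
        print(f"[eval {rank}] steps so far: {counter.value}")
        time.sleep(0.05)


def launch(num_processes, num_steps):
    counter = mp.Value("i", 0, lock=False)  # shared global step counter
    lock = mp.Lock()
    stop = mp.Event()
    trainers, evaluators = [], []
    for role, rank in build_plan(num_processes):
        if role == "train":
            proc = mp.Process(target=train, args=(rank, counter, lock, num_steps))
            trainers.append(proc)
        else:
            proc = mp.Process(target=evaluate, args=(rank, counter, stop))
            evaluators.append(proc)
        proc.start()
    for proc in trainers:
        proc.join()
    stop.set()  # training finished, let the evaluator exit
    for proc in evaluators:
        proc.join()
    return counter.value


if __name__ == "__main__":
    print("total steps:", launch(num_processes=4, num_steps=100))
```

With `--num-processes 16`, a layout like this would start 17 processes in total: 16 trainers and one evaluator that only reads the shared state.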

Results

With 16 processes, it converges on PongDeterministic-v4 in 15 minutes.

For BreakoutDeterministic-v4, convergence takes several hours or more.
