
MG2033 / A2c

License: Apache-2.0
A Clearer and Simpler Synchronous Advantage Actor Critic (A2C) Implementation in TensorFlow

Programming Languages

python

Projects that are alternatives to or similar to A2c

Pytorch sac ae
PyTorch implementation of Soft Actor-Critic + Autoencoder(SAC+AE)
Stars: ✭ 94 (-44.38%)
Mutual labels:  gym, reinforcement-learning, actor-critic
Deep Rl Keras
Keras Implementation of popular Deep RL Algorithms (A3C, DDQN, DDPG, Dueling DDQN)
Stars: ✭ 395 (+133.73%)
Mutual labels:  gym, reinforcement-learning, policy-gradient
Openai lab
An experimentation framework for Reinforcement Learning using OpenAI Gym, Tensorflow, and Keras.
Stars: ✭ 313 (+85.21%)
Mutual labels:  reinforcement-learning, policy-gradient, actor-critic
Pytorch sac
PyTorch implementation of Soft Actor-Critic (SAC)
Stars: ✭ 174 (+2.96%)
Mutual labels:  gym, reinforcement-learning, actor-critic
Pytorch Rl
Tutorials for reinforcement learning in PyTorch and Gym by implementing a few of the popular algorithms. [IN PROGRESS]
Stars: ✭ 121 (-28.4%)
Mutual labels:  reinforcement-learning, policy-gradient, actor-critic
Drq
DrQ: Data regularized Q
Stars: ✭ 268 (+58.58%)
Mutual labels:  gym, reinforcement-learning, actor-critic
Pytorch Rl
This repository contains model-free deep reinforcement learning algorithms implemented in Pytorch
Stars: ✭ 394 (+133.14%)
Mutual labels:  gym, reinforcement-learning, policy-gradient
Explorer
Explorer is a PyTorch reinforcement learning framework for exploring new ideas.
Stars: ✭ 54 (-68.05%)
Mutual labels:  gym, policy-gradient, actor-critic
Mlds2018spring
Machine Learning and having it Deep and Structured (MLDS) in 2018 spring
Stars: ✭ 124 (-26.63%)
Mutual labels:  reinforcement-learning, policy-gradient, actor-critic
Tensorflow Reinforce
Implementations of Reinforcement Learning Models in Tensorflow
Stars: ✭ 480 (+184.02%)
Mutual labels:  reinforcement-learning, policy-gradient, actor-critic
Reinforcement Learning
Minimal and Clean Reinforcement Learning Examples
Stars: ✭ 2,863 (+1594.08%)
Mutual labels:  reinforcement-learning, policy-gradient, actor-critic
Reinforcement Learning With Tensorflow
Simple reinforcement learning tutorials (莫烦Python, Chinese AI tutorials)
Stars: ✭ 6,948 (+4011.24%)
Mutual labels:  reinforcement-learning, policy-gradient, actor-critic
Rl algorithms
Structural implementation of RL key algorithms
Stars: ✭ 352 (+108.28%)
Mutual labels:  gym, reinforcement-learning, policy-gradient
Reinforcement learning tutorial with demo
Reinforcement Learning Tutorial with Demo: DP (Policy and Value Iteration), Monte Carlo, TD Learning (SARSA, QLearning), Function Approximation, Policy Gradient, DQN, Imitation, Meta Learning, Papers, Courses, etc..
Stars: ✭ 442 (+161.54%)
Mutual labels:  reinforcement-learning, policy-gradient, actor-critic
Rlseq2seq
Deep Reinforcement Learning For Sequence to Sequence Models
Stars: ✭ 683 (+304.14%)
Mutual labels:  reinforcement-learning, policy-gradient, actor-critic
Rl algos
Reinforcement Learning Algorithms
Stars: ✭ 14 (-91.72%)
Mutual labels:  gym, reinforcement-learning, actor-critic
Stable Baselines3
PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
Stars: ✭ 1,263 (+647.34%)
Mutual labels:  gym, reinforcement-learning
Reinforcement learning
Reinforcement learning tutorials
Stars: ✭ 82 (-51.48%)
Mutual labels:  reinforcement-learning, policy-gradient
Torchrl
Pytorch Implementation of Reinforcement Learning Algorithms ( Soft Actor Critic(SAC)/ DDPG / TD3 /DQN / A2C/ PPO / TRPO)
Stars: ✭ 90 (-46.75%)
Mutual labels:  gym, reinforcement-learning
Run Skeleton Run
Reason8.ai PyTorch solution for NIPS RL 2017 challenge
Stars: ✭ 83 (-50.89%)
Mutual labels:  reinforcement-learning, actor-critic

A2C

An implementation of Synchronous Advantage Actor Critic (A2C) in TensorFlow. A2C is a synchronous variant of the advantage actor-critic method introduced by OpenAI in their published baselines. However, the baselines code is difficult to understand and modify, so I built this A2C on top of their implementation, but in a clearer and simpler way.

What's new compared to the OpenAI baseline?

  1. Support for TensorBoard visualization per running agent in an environment.
  2. Easier support for different policy networks.
  3. Easy support for environments other than OpenAI gym.
  4. Support for video generation of an agent acting in the environment.
  5. Simple code that is easy to modify and start experimenting with. All you need to do is plug and play!

Asynchronous vs Synchronous Advantage Actor Critic

Asynchronous advantage actor critic was introduced in Asynchronous Methods for Deep Reinforcement Learning. The difference between the two methods is that in asynchronous AC, each parallel agent updates the global network on its own. So, at any given time, the weights used by one agent may differ from the weights used by another, which means each agent plays with a different policy and therefore explores more of the environment. In synchronous AC, however, the updates from all parallel agents are collected before the global network is updated. To encourage exploration, stochastic noise is added to the probability distribution of the actions predicted by each agent.
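
To make the distinction concrete, below is a minimal, illustrative sketch of the synchronous update scheme. The envs, model, predict(), and train() names are placeholders for this example and are not the repository's actual API.

import numpy as np

def synchronous_a2c_step(envs, model, observations, n_steps=5):
    """One synchronous A2C update: every parallel environment is stepped with
    the same (shared) network weights, then a single gradient update follows."""
    batch_obs, batch_actions, batch_rewards, batch_dones = [], [], [], []
    for _ in range(n_steps):
        # All agents act with the current shared policy at the same time.
        actions = model.predict(observations)
        results = [env.step(a) for env, a in zip(envs, actions)]
        next_observations, rewards, dones, _infos = zip(*results)
        batch_obs.append(observations)
        batch_actions.append(actions)
        batch_rewards.append(rewards)
        batch_dones.append(dones)
        observations = np.asarray(next_observations)
    # Only after all rollouts are collected is the global network updated, once.
    model.train(batch_obs, batch_actions, batch_rewards, batch_dones)
    return observations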



Environments Supported

This implementation allows for using different environments; it is not restricted to OpenAI gym environments. If you want to attach the project to an environment other than those provided by gym, all you have to do is inherit from the base class BaseEnv in envs/base_env.py and implement all of its methods in a plug-and-play fashion (see the gym environment example class). You also have to add the name of the new environment class to the env_name_parser() method in A2C.py.

The methods that should be implemented in a new environment class are listed below (a minimal sketch follows the list):

  1. make() for creating the environment and returning a reference to it.
  2. step() for taking a step in the environment and returning a tuple (observation images, reward float value, done boolean, any other info).
  3. reset() for resetting the environment to the initial state.
  4. get_observation_space() for returning an object with attribute tuple shape representing the shape of the observation space.
  5. get_action_space() for returning an object with attribute n representing the number of possible actions in the environment.
  6. render() for rendering the environment if appropriate.
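
For illustration only, here is a minimal toy environment implementing this interface. Everything about it (the class name, the 84x84x1 observations, the four actions, the reward rule) is made up for the example; only the method names come from the list above, and the BaseEnv constructor is assumed to need no extra arguments here.

from collections import namedtuple

import numpy as np

from envs.base_env import BaseEnv

# Simple stand-ins for the space objects described in items 4 and 5.
ObservationSpace = namedtuple('ObservationSpace', ['shape'])
ActionSpace = namedtuple('ActionSpace', ['n'])

class ToyEnv(BaseEnv):
    """A dummy environment: 84x84x1 images, 4 discrete actions, 100-step episodes."""

    def make(self):
        self._steps = 0
        return self

    def reset(self):
        self._steps = 0
        return np.zeros((84, 84, 1), dtype=np.uint8)

    def step(self, action):
        self._steps += 1
        observation = np.random.randint(0, 256, (84, 84, 1), dtype=np.uint8)
        reward = 1.0 if action == 0 else 0.0   # toy reward rule
        done = self._steps >= 100
        return observation, reward, done, {}

    def get_observation_space(self):
        return ObservationSpace(shape=(84, 84, 1))

    def get_action_space(self):
        return ActionSpace(n=4)

    def render(self):
        pass  # nothing to draw for this toy example

The new class would then be registered by name in env_name_parser() so that a configuration file can refer to it.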

Policy Networks Supported

This implementation comes with the basic CNN policy network from the OpenAI baseline. However, it supports using different policy networks. All you have to do is inherit from the base class BasePolicy in models/base_policy.py and implement all of its methods, again in a plug-and-play fashion :D (see the CNNPolicy example class). You also have to add the name of the new policy network class to the policy_name_parser() method in models/model.py.
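
As a rough illustration only, a small fully connected policy might look like the sketch below. The actual BasePolicy interface lives in models/base_policy.py and is not reproduced here, so the constructor signature and attribute names are assumptions, not the repository's real API.

import numpy as np
import tensorflow as tf

from models.base_policy import BasePolicy

class MLPPolicy(BasePolicy):
    """A hypothetical fully connected policy; the constructor arguments and
    attribute names below are assumptions, not the repository's interface."""

    def __init__(self, sess, input_shape, num_actions, reuse=False):
        self.sess = sess
        with tf.variable_scope('mlp_policy', reuse=reuse):
            self.X = tf.placeholder(tf.float32, [None] + list(input_shape), name='observation')
            flat = tf.reshape(self.X, [-1, int(np.prod(input_shape))])
            hidden = tf.layers.dense(flat, 256, activation=tf.nn.relu)
            # Policy head: unnormalized log-probabilities over the discrete actions.
            self.policy_logits = tf.layers.dense(hidden, num_actions)
            # Value head: scalar state-value estimate used for the advantage.
            self.value_function = tf.layers.dense(hidden, 1)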

TensorBoard Visualization

This implementation supports TensorBoard visualization. It displays, per running agent, time plots of the two most important signals in reinforcement learning: the episode length and the total reward in the episode. All you have to do is launch TensorBoard from your experiment directory located in experiments/.

tensorboard --logdir=experiments/my_experiment/summaries
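
For reference, per-agent scalars of this kind can be written with the TF 1.x summary API roughly as follows. This is a sketch that assumes one FileWriter per agent; the tags and directory names are illustrative rather than the repository's exact ones.

import tensorflow as tf

# One writer per running agent, e.g. .../summaries/agent_0, .../summaries/agent_1, ...
writer = tf.summary.FileWriter('experiments/my_experiment/summaries/agent_0')

def log_episode(writer, global_step, episode_length, episode_reward):
    # Write the episode length and the total episode reward as scalar summaries.
    summary = tf.Summary(value=[
        tf.Summary.Value(tag='episode_length', simple_value=episode_length),
        tf.Summary.Value(tag='episode_reward', simple_value=episode_reward),
    ])
    writer.add_summary(summary, global_step)
    writer.flush()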


Video Generation

During training, you can generate videos of the trained agent acting (playing) in the environment. This is achieved by changing record_video_every in the configuration file from -1 to the number of episodes between two generated videos. Videos are generated in your experiment directory.

During testing, videos are generated automatically if the optional monitor method is implemented in the environment. For the included gym environment, it is already implemented.

Usage

Main Dependencies

Python 3 or above
tensorflow 1.3.0
numpy 1.13.1
gym 0.9.2
tqdm 4.15.0
bunch 1.0.1
matplotlib 2.0.2
Pillow 4.2.1

Run

python main.py config/test.json

The file test.json is just an example of a configuration file holding all the parameters needed to train on an environment. You can create your own configuration file for training/testing.

In the project, two configuration files are provided as examples for training on Pong and Breakout Atari games.

Results

Model       Game       Average Score   Max Score
CNNPolicy   Pong       17              21
CNNPolicy   Breakout   650             850

Updates

  • Inference and training are working properly.

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Reference Repository

OpenAI Baselines
