All Projects → lcswillems → Torch Ac

lcswillems / Torch Ac

Licence: mit
Recurrent and multi-process PyTorch implementation of deep reinforcement Actor-Critic algorithms A2C and PPO

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Torch Ac

Reinforcementlearning Atarigame
Pytorch LSTM RNN for reinforcement learning to play Atari games from OpenAI Universe. We also use Google Deep Mind's Asynchronous Advantage Actor-Critic (A3C) Algorithm. This is much superior and efficient than DQN and obsoletes it. Can play on many games
Stars: ✭ 118 (+68.57%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, actor-critic, a3c
Pytorch A3c
PyTorch implementation of Asynchronous Advantage Actor Critic (A3C) from "Asynchronous Methods for Deep Reinforcement Learning".
Stars: ✭ 879 (+1155.71%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, actor-critic, a3c
Minimalrl
Implementations of basic RL algorithms with minimal lines of codes! (pytorch based)
Stars: ✭ 2,051 (+2830%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, ppo, a3c
Rl a3c pytorch
A3C LSTM Atari with Pytorch plus A3G design
Stars: ✭ 482 (+588.57%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, actor-critic, a3c
Deeprl Tutorials
Contains high quality implementations of Deep Reinforcement Learning algorithms written in PyTorch
Stars: ✭ 748 (+968.57%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, ppo, actor-critic
Easy Rl
强化学习中文教程,在线阅读地址:https://datawhalechina.github.io/easy-rl/
Stars: ✭ 3,004 (+4191.43%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, ppo, a3c
Pytorch Rl
Deep Reinforcement Learning with pytorch & visdom
Stars: ✭ 745 (+964.29%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, actor-critic, a3c
Deep Reinforcement Learning With Pytorch
PyTorch implementation of DQN, AC, ACER, A2C, A3C, PG, DDPG, TRPO, PPO, SAC, TD3 and ....
Stars: ✭ 1,345 (+1821.43%)
Mutual labels:  deep-reinforcement-learning, ppo, actor-critic, a3c
Reinforcement Learning With Tensorflow
Simple Reinforcement learning tutorials, 莫烦Python 中文AI教学
Stars: ✭ 6,948 (+9825.71%)
Mutual labels:  reinforcement-learning, ppo, actor-critic, a3c
Reinforcement Learning
Minimal and Clean Reinforcement Learning Examples
Stars: ✭ 2,863 (+3990%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, actor-critic, a3c
Reinforcement learning tutorial with demo
Reinforcement Learning Tutorial with Demo: DP (Policy and Value Iteration), Monte Carlo, TD Learning (SARSA, QLearning), Function Approximation, Policy Gradient, DQN, Imitation, Meta Learning, Papers, Courses, etc..
Stars: ✭ 442 (+531.43%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, actor-critic, a3c
Slm Lab
Modular Deep Reinforcement Learning framework in PyTorch. Companion library of the book "Foundations of Deep Reinforcement Learning".
Stars: ✭ 904 (+1191.43%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, ppo, a3c
Pytorch A2c Ppo Acktr Gail
PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
Stars: ✭ 2,632 (+3660%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, ppo, actor-critic
Pytorch Drl
PyTorch implementations of various Deep Reinforcement Learning (DRL) algorithms for both single agent and multi-agent.
Stars: ✭ 233 (+232.86%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, ppo, actor-critic
Deep-Reinforcement-Learning-With-Python
Master classic RL, deep RL, distributional RL, inverse RL, and more using OpenAI Gym and TensorFlow with extensive Math
Stars: ✭ 222 (+217.14%)
Mutual labels:  deep-reinforcement-learning, a3c, actor-critic, ppo
Deeprl Tensorflow2
🐋 Simple implementations of various popular Deep Reinforcement Learning algorithms using TensorFlow2
Stars: ✭ 319 (+355.71%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, ppo, a3c
Tensorflow Reinforce
Implementations of Reinforcement Learning Models in Tensorflow
Stars: ✭ 480 (+585.71%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, actor-critic
Elegantrl
Lightweight, efficient and stable implementations of deep reinforcement learning algorithms using PyTorch.
Stars: ✭ 575 (+721.43%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, ppo
Autonomous Learning Library
A PyTorch library for building deep reinforcement learning agents.
Stars: ✭ 425 (+507.14%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, ppo
Dissecting Reinforcement Learning
Python code, PDFs and resources for the series of posts on Reinforcement Learning which I published on my personal blog
Stars: ✭ 512 (+631.43%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, actor-critic

PyTorch Actor-Critic deep reinforcement learning algorithms: A2C and PPO

The torch_ac package contains the PyTorch implementation of two Actor-Critic deep reinforcement learning algorithms:

Note: An example of use of this package is given in the rl-starter-files repository. More details below.

Features

  • Recurrent policies
  • Reward shaping
  • Handle observation spaces that are tensors or dict of tensors
  • Handle discrete action spaces
  • Observation preprocessing
  • Multiprocessing
  • CUDA

Installation

pip3 install torch-ac

Note: If you want to modify torch-ac algorithms, you will need to rather install a cloned version, i.e.:

git clone https://github.com/lcswillems/torch-ac.git
cd torch-ac
pip3 install -e .

Package components overview

A brief overview of the components of the package:

  • torch_ac.A2CAlgo and torch_ac.PPOAlgo classes for A2C and PPO algorithms
  • torch_ac.ACModel and torch_ac.RecurrentACModel abstract classes for non-recurrent and recurrent actor-critic models
  • torch_ac.DictList class for making dictionnaries of lists list-indexable and hence batch-friendly

Package components details

Here are detailled the most important components of the package.

torch_ac.A2CAlgo and torch_ac.PPOAlgo have 2 methods:

  • __init__ that may take, among the other parameters:
    • an acmodel actor-critic model, i.e. an instance of a class inheriting from either torch_ac.ACModel or torch_ac.RecurrentACModel.
    • a preprocess_obss function that transforms a list of observations into a list-indexable object X (e.g. a PyTorch tensor). The default preprocess_obss function converts observations into a PyTorch tensor.
    • a reshape_reward function that takes into parameter an observation obs, the action action taken, the reward reward received and the terminal status done and returns a new reward. By default, the reward is not reshaped.
    • a recurrence number to specify over how many timesteps gradient is backpropagated. This number is only taken into account if a recurrent model is used and must divide the num_frames_per_agent parameter and, for PPO, the batch_size parameter.
  • update_parameters that first collects experiences, then update the parameters and finally returns logs.

torch_ac.ACModel has 2 abstract methods:

  • __init__ that takes into parameter an observation_space and an action_space.
  • forward that takes into parameter N preprocessed observations obs and returns a PyTorch distribution dist and a tensor of values value. The tensor of values must be of size N, not N x 1.

torch_ac.RecurrentACModel has 3 abstract methods:

  • __init__ that takes into parameter the same parameters than torch_ac.ACModel.
  • forward that takes into parameter the same parameters than torch_ac.ACModel along with a tensor of N memories memory of size N x M where M is the size of a memory. It returns the same thing than torch_ac.ACModel plus a tensor of N memories memory.
  • memory_size that returns the size M of a memory.

Note: The preprocess_obss function must return a list-indexable object (e.g. a PyTorch tensor). If your observations are dictionnaries, your preprocess_obss function may first convert a list of dictionnaries into a dictionnary of lists and then make it list-indexable using the torch_ac.DictList class as follow:

>>> d = DictList({"a": [[1, 2], [3, 4]], "b": [[5], [6]]})
>>> d.a
[[1, 2], [3, 4]]
>>> d[0]
DictList({"a": [1, 2], "b": [5]})

Note: if you use a RNN, you will need to set batch_first to True.

Examples

Examples of use of the package components are given in the rl-starter-scripts repository.

Example of use of torch_ac.A2CAlgo and torch_ac.PPOAlgo

...

algo = torch_ac.PPOAlgo(envs, acmodel, args.frames_per_proc, args.discount, args.lr, args.gae_lambda,
                        args.entropy_coef, args.value_loss_coef, args.max_grad_norm, args.recurrence,
                        args.optim_eps, args.clip_eps, args.epochs, args.batch_size, preprocess_obss)

...

exps, logs1 = algo.collect_experiences()
logs2 = algo.update_parameters(exps)

More details here.

Example of use of torch_ac.DictList

torch_ac.DictList({
    "image": preprocess_images([obs["image"] for obs in obss], device=device),
    "text": preprocess_texts([obs["mission"] for obs in obss], vocab, device=device)
})

More details here.

Example of implementation of torch_ac.RecurrentACModel

class ACModel(nn.Module, torch_ac.RecurrentACModel):
    ...

    def forward(self, obs, memory):
        ...

        return dist, value, memory

More details here.

Examples of preprocess_obss functions

More details here.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].