
lcswillems / Rl Starter Files

Licence: MIT
RL starter files to immediately train, visualize and evaluate an agent without writing a single line of code

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Rl Starter Files

Slm Lab
Modular Deep Reinforcement Learning framework in PyTorch. Companion library of the book "Foundations of Deep Reinforcement Learning".
Stars: ✭ 904 (+178.15%)
Mutual labels:  ppo, a3c
Deeprl Tensorflow2
🐋 Simple implementations of various popular Deep Reinforcement Learning algorithms using TensorFlow2
Stars: ✭ 319 (-1.85%)
Mutual labels:  ppo, a3c
Reinforcement Learning With Tensorflow
Simple reinforcement learning tutorials (莫烦Python, Chinese AI tutorials)
Stars: ✭ 6,948 (+2037.85%)
Mutual labels:  ppo, a3c
Reinforcement learning
Reinforcement learning tutorials
Stars: ✭ 82 (-74.77%)
Mutual labels:  ppo, a3c
Machin
Reinforcement learning library(framework) designed for PyTorch, implements DQN, DDPG, A2C, PPO, SAC, MADDPG, A3C, APEX, IMPALA ...
Stars: ✭ 145 (-55.38%)
Mutual labels:  ppo, a3c
Deep-Reinforcement-Learning-With-Python
Master classic RL, deep RL, distributional RL, inverse RL, and more using OpenAI Gym and TensorFlow with extensive Math
Stars: ✭ 222 (-31.69%)
Mutual labels:  a3c, ppo
Torch Ac
Recurrent and multi-process PyTorch implementation of deep reinforcement Actor-Critic algorithms A2C and PPO
Stars: ✭ 70 (-78.46%)
Mutual labels:  ppo, a3c
Deep Reinforcement Learning With Pytorch
PyTorch implementation of DQN, AC, ACER, A2C, A3C, PG, DDPG, TRPO, PPO, SAC, TD3 and ....
Stars: ✭ 1,345 (+313.85%)
Mutual labels:  ppo, a3c
Easy Rl
A Chinese-language reinforcement learning tutorial; read it online at https://datawhalechina.github.io/easy-rl/
Stars: ✭ 3,004 (+824.31%)
Mutual labels:  ppo, a3c
Minimalrl
Implementations of basic RL algorithms with minimal lines of codes! (pytorch based)
Stars: ✭ 2,051 (+531.08%)
Mutual labels:  ppo, a3c
Deep-Reinforcement-Learning-Notebooks
This repository contains a series of Google Colab notebooks which I created to help people dive into deep reinforcement learning. These notebooks contain both theory and implementations of different algorithms.
Stars: ✭ 15 (-95.38%)
Mutual labels:  a3c, ppo
pysc2-rl-agents
StarCraft II / PySC2 Deep Reinforcement Learning Agents (A2C)
Stars: ✭ 124 (-61.85%)
Mutual labels:  a3c
ReinforcementLearningZoo.jl
juliareinforcementlearning.org/
Stars: ✭ 46 (-85.85%)
Mutual labels:  ppo
tf-a3c-gpu
Tensorflow implementation of A3C algorithm
Stars: ✭ 49 (-84.92%)
Mutual labels:  a3c
gym-microrts-paper-sb3
RL agent to play μRTS with Stable-Baselines3 and PyTorch
Stars: ✭ 21 (-93.54%)
Mutual labels:  ppo
Deep reinforcement learning course
Implementations from the free course Deep Reinforcement Learning with Tensorflow and PyTorch
Stars: ✭ 3,232 (+894.46%)
Mutual labels:  ppo
ppo-pytorch
Proximal Policy Optimization(PPO) with Intrinsic Curiosity Module(ICM)
Stars: ✭ 83 (-74.46%)
Mutual labels:  ppo
ElegantRL
Scalable and Elastic Deep Reinforcement Learning Using PyTorch. Please star. 🔥
Stars: ✭ 2,074 (+538.15%)
Mutual labels:  ppo
model-free-algorithms
TD3, SAC, IQN, Rainbow, PPO, Ape-X and etc. in TF1.x
Stars: ✭ 56 (-82.77%)
Mutual labels:  ppo
td-reg
TD-Regularized Actor-Critic Methods
Stars: ✭ 28 (-91.38%)
Mutual labels:  ppo

RL Starter Files

RL starter files to immediately train, visualize and evaluate an agent without writing a single line of code.

These files are suited for gym-minigrid environments and torch-ac RL algorithms. They are easy to adapt to other environments and RL algorithms.

Features

  • Script to train, including:
    • Logging to txt, CSV and TensorBoard
    • Save model
    • Stop and restart training
    • Use A2C or PPO algorithms
  • Script to visualize, including:
    • Act by sampling or argmax
    • Save as GIF
  • Script to evaluate, including:
    • Act by sampling or argmax
    • List the worst-performing episodes

Installation

  1. Clone this repository.

  2. Install gym-minigrid environments and torch-ac RL algorithms:

pip3 install -r requirements.txt

Note: If you want to modify the torch-ac algorithms, you will instead need to install a cloned version, i.e.:

git clone https://github.com/lcswillems/torch-ac.git
cd torch-ac
pip3 install -e .
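
To make sure Python now picks up the cloned, editable torch-ac rather than a previously installed release, a quick sanity check (not part of the project) is to print where the package is imported from:

import torch_ac
print(torch_ac.__file__)  # should point inside your cloned torch-ac directory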

Example of use

Train, visualize and evaluate an agent on the MiniGrid-DoorKey-5x5-v0 environment:

  1. Train the agent on the MiniGrid-DoorKey-5x5-v0 environment with the PPO algorithm:
python3 -m scripts.train --algo ppo --env MiniGrid-DoorKey-5x5-v0 --model DoorKey --save-interval 10 --frames 80000

  2. Visualize the agent's behavior:
python3 -m scripts.visualize --env MiniGrid-DoorKey-5x5-v0 --model DoorKey

  3. Evaluate the agent's performance:
python3 -m scripts.evaluate --env MiniGrid-DoorKey-5x5-v0 --model DoorKey

Note: More details on the commands are given below.

Other examples

Handle textual instructions

In the GoToDoor environment, the agent receives an image along with a textual instruction. To handle the latter, add --text to the command:

python3 -m scripts.train --algo ppo --env MiniGrid-GoToDoor-5x5-v0 --model GoToDoor --text --save-interval 10 --frames 1000000
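
Under the hood, --text makes the model encode the instruction with a GRU (see the --text option described further below). The following is only a minimal sketch of what such an encoder could look like; the class and parameter names (InstructionEncoder, vocab_size, ...) are made up and the actual implementation lives in model.py:

import torch.nn as nn

class InstructionEncoder(nn.Module):
    # Illustrative GRU-based encoder for textual instructions (not the model.py code).
    def __init__(self, vocab_size, embedding_dim=32, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.gru = nn.GRU(embedding_dim, hidden_dim, batch_first=True)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) tensor of word indices
        embedded = self.embedding(token_ids)
        _, hidden = self.gru(embedded)
        return hidden[-1]  # (batch, hidden_dim) summary of the whole instruction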

Add memory

In the RedBlueDoors environment, the agent has to open the red door and then the blue one. To solve the task efficiently, it has to remember that it has already opened the red door. To give the agent memory, add --recurrence X to the command:

python3 -m scripts.train --algo ppo --env MiniGrid-RedBlueDoors-6x6-v0 --model RedBlueDoors --recurrence 4 --save-interval 10 --frames 1000000
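
With memory enabled, the model carries a recurrent state from one timestep to the next (an LSTM, as described under the --recurrence option further below). Again, a rough illustrative sketch with invented names, not the code in model.py:

import torch
import torch.nn as nn

class MemoryPolicy(nn.Module):
    # Illustrative actor head with LSTM memory (not the model.py code).
    def __init__(self, obs_dim, num_actions, hidden_dim=128):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden_dim)
        self.memory = nn.LSTMCell(hidden_dim, hidden_dim)
        self.policy = nn.Linear(hidden_dim, num_actions)

    def forward(self, obs, memory):
        # obs: (batch, obs_dim); memory: (h, c) tuple, each of shape (batch, hidden_dim)
        x = torch.relu(self.encoder(obs))
        h, c = self.memory(x, memory)
        logits = self.policy(h)
        return logits, (h, c)  # the caller feeds (h, c) back in at the next timestep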

Files

This package contains:

  • scripts to:
    • train an agent
      in scripts/train.py (more details below)
    • visualize the agent's behavior
      in scripts/visualize.py (more details below)
    • evaluate the agent's performance
      in scripts/evaluate.py (more details below)
  • a default agent model
    in model.py (more details below)
  • utility classes and functions used by the scripts
    in utils

These files are suited for gym-minigrid environments and torch-ac RL algorithms. They are easy to adapt to other environments and RL algorithms by modifying:

  • model.py
  • utils/format.py
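
Adapting to another environment mostly means changing how raw observations are turned into the tensors the model expects. As a purely illustrative example of such a helper (the real logic is in utils/format.py and may differ), a minimal image preprocessor could look like:

import numpy
import torch

def preprocess_images(images, device=None):
    # Illustrative: stack image observations into a float tensor for the model.
    images = numpy.array(images)
    return torch.tensor(images, device=device, dtype=torch.float)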

scripts/train.py

An example of use:

python3 -m scripts.train --algo ppo --env MiniGrid-DoorKey-5x5-v0 --model DoorKey --save-interval 10 --frames 80000

The script loads the model in storage/DoorKey or creates it if it doesn't exist, then trains it with the PPO algorithm on the MiniGrid DoorKey environment, and saves it every 10 updates in storage/DoorKey. It stops after 80 000 frames.

Note: You can define a different storage location in the environment variable PROJECT_STORAGE.
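
For reference, the storage directory could be resolved along the following lines; this is only a sketch of the idea, the actual lookup is implemented in the utils package and the default directory name may differ:

import os

def get_storage_dir(default="storage"):
    # Illustrative: prefer the PROJECT_STORAGE environment variable, else a local default
    return os.environ.get("PROJECT_STORAGE", default)

model_dir = os.path.join(get_storage_dir(), "DoorKey")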

More generally, the script has 2 required arguments:

  • --algo ALGO: name of the RL algorithm used to train
  • --env ENV: name of the environment to train on

and a bunch of optional arguments among which:

  • --recurrence N: gradient will be backpropagated over N timesteps. By default, N = 1. If N > 1, an LSTM is added to the model to give it memory (the truncation idea is sketched after this list).
  • --text: a GRU is added to the model to handle text input.
  • ... (see more using --help)
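
For intuition, backpropagating over N timesteps is commonly implemented by detaching the recurrent state every N steps so that gradients stop flowing further back. The sketch below illustrates that idea using the same (obs, memory) interface as the memory sketch above; it is not necessarily how torch-ac implements it:

def truncated_rollout(policy, obs_sequence, memory, recurrence=4):
    # Illustrative truncated backpropagation: detach the memory every `recurrence` steps.
    outputs = []
    for t, obs in enumerate(obs_sequence):
        if t % recurrence == 0:
            memory = tuple(m.detach() for m in memory)  # cut the gradient path here
        logits, memory = policy(obs, memory)
        outputs.append(logits)
    return outputs, memory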

During training, logs are printed in your terminal (and saved in text and CSV format):

Note: U gives the update number, F the total number of frames, FPS the number of frames per second, D the total duration, rR:μσmM the mean, std, min and max reshaped return per episode, F:μσmM the mean, std, min and max number of frames per episode, H the entropy, V the value, pL the policy loss, vL the value loss and ∇ the gradient norm.
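
Since the logs are also saved in CSV format, they can be inspected offline, for instance with pandas. The path below is an assumption; check your model's storage directory and the CSV header for the actual file and column names:

import pandas

log = pandas.read_csv("storage/DoorKey/log.csv")  # hypothetical path; adjust to your run
print(log.columns.tolist())  # see which metrics are available
print(log.tail())            # last few logged updates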

During training, logs are also plotted in TensorBoard.

scripts/visualize.py

An example of use:

python3 -m scripts.visualize --env MiniGrid-DoorKey-5x5-v0 --model DoorKey

In this use case, the script displays how the model in storage/DoorKey behaves on the MiniGrid DoorKey environment.

More generally, the script has 2 required arguments:

  • --env ENV: name of the environment to act on.
  • --model MODEL: name of the trained model.

and a bunch of optional arguments among which:

  • --argmax: select the action with the highest probability instead of sampling one (see the sketch after this list)
  • ... (see more using --help)
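
The difference between the default behavior (sampling) and --argmax can be illustrated with the action logits produced by the model. This is illustrative code, not what scripts/visualize.py actually does:

from torch.distributions import Categorical

def select_action(logits, argmax=False):
    # Greedy (most probable action) with argmax=True, stochastic sampling otherwise.
    dist = Categorical(logits=logits)
    if argmax:
        return dist.probs.argmax(dim=-1)
    return dist.sample()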

scripts/evaluate.py

An example of use:

python3 -m scripts.evaluate --env MiniGrid-DoorKey-5x5-v0 --model DoorKey

In this use case, the script prints in the terminal the performance of the model in storage/DoorKey over 100 episodes (a rough sketch of such an evaluation loop follows the argument list below).

More generally, the script has 2 required arguments:

  • --env ENV: name of the environment to act on.
  • --model MODEL: name of the trained model.

and a bunch of optional arguments among which:

  • --episodes N: number of episodes of evaluation. By default, N = 100.
  • ... (see more using --help)
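
Conceptually, evaluation amounts to running the trained agent for N episodes, collecting the returns, and reporting statistics along with the worst-performing episodes. A rough sketch of that loop, assuming the classic Gym step API and an act(obs) function for the agent (illustrative only; the real logic is in scripts/evaluate.py):

import numpy

def evaluate(env, act, episodes=100, worst_k=10):
    # Illustrative evaluation loop: `act` maps an observation to an action.
    returns = []
    for _ in range(episodes):
        obs = env.reset()
        done, total = False, 0.0
        while not done:
            obs, reward, done, _ = env.step(act(obs))
            total += reward
        returns.append(total)
    returns = numpy.array(returns)
    print("Return: mean {:.2f}, std {:.2f}, min {:.2f}, max {:.2f}".format(
        returns.mean(), returns.std(), returns.min(), returns.max()))
    worst = returns.argsort()[:worst_k]
    print("Worst episodes (index, return):", [(int(i), float(returns[i])) for i in worst])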

model.py

The default model is defined in model.py. By default, its memory part and its language part are disabled; they can be enabled by setting the use_memory and use_text parameters of the model constructor to True.

This model can be easily adapted to your needs.
