
vwxyzjn / Cleanrl

Licence: MIT
High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Cleanrl

Sumo Rl
A simple interface to instantiate Reinforcement Learning environments with SUMO for Traffic Signal Control. Compatible with Gym Env from OpenAI and MultiAgentEnv from RLlib.
Stars: ✭ 145 (-58.45%)
Mutual labels:  gym, reinforcement-learning
Atari
AI research environment for the Atari 2600 games 🤖.
Stars: ✭ 174 (-50.14%)
Mutual labels:  gym, reinforcement-learning
Rl Baselines3 Zoo
A collection of pre-trained RL agents using Stable Baselines3, training and hyperparameter optimization included.
Stars: ✭ 161 (-53.87%)
Mutual labels:  gym, reinforcement-learning
Torchrl
Pytorch Implementation of Reinforcement Learning Algorithms ( Soft Actor Critic(SAC)/ DDPG / TD3 /DQN / A2C/ PPO / TRPO)
Stars: ✭ 90 (-74.21%)
Mutual labels:  gym, reinforcement-learning
Pytorch Reinforce
PyTorch Implementation of REINFORCE for both discrete & continuous control
Stars: ✭ 212 (-39.26%)
Mutual labels:  gym, reinforcement-learning
Pytorch sac ae
PyTorch implementation of Soft Actor-Critic + Autoencoder(SAC+AE)
Stars: ✭ 94 (-73.07%)
Mutual labels:  gym, reinforcement-learning
Pytorch sac
PyTorch implementation of Soft Actor-Critic (SAC)
Stars: ✭ 174 (-50.14%)
Mutual labels:  gym, reinforcement-learning
Dmc2gym
OpenAI Gym wrapper for the DeepMind Control Suite
Stars: ✭ 75 (-78.51%)
Mutual labels:  gym, reinforcement-learning
Gym Unrealcv
Unreal environments for reinforcement learning
Stars: ✭ 202 (-42.12%)
Mutual labels:  gym, reinforcement-learning
Naf Tensorflow
"Continuous Deep Q-Learning with Model-based Acceleration" in TensorFlow
Stars: ✭ 192 (-44.99%)
Mutual labels:  gym, reinforcement-learning
Stable Baselines3
PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
Stars: ✭ 1,263 (+261.89%)
Mutual labels:  gym, reinforcement-learning
Gym Gazebo2
gym-gazebo2 is a toolkit for developing and comparing reinforcement learning algorithms using ROS 2 and Gazebo
Stars: ✭ 257 (-26.36%)
Mutual labels:  gym, reinforcement-learning
Lf2gym
An OpenAI-gym-like environment for Little Fighter 2
Stars: ✭ 79 (-77.36%)
Mutual labels:  gym, reinforcement-learning
Stable Baselines
Mirror of Stable-Baselines: a fork of OpenAI Baselines, implementations of reinforcement learning algorithms
Stars: ✭ 115 (-67.05%)
Mutual labels:  gym, reinforcement-learning
Rlenv.directory
Explore and find reinforcement learning environments in a list of 150+ open source environments.
Stars: ✭ 79 (-77.36%)
Mutual labels:  gym, reinforcement-learning
A2c
A Clearer and Simpler Synchronous Advantage Actor Critic (A2C) Implementation in TensorFlow
Stars: ✭ 169 (-51.58%)
Mutual labels:  gym, reinforcement-learning
Trading Gym
A Trading environment base on Gym
Stars: ✭ 71 (-79.66%)
Mutual labels:  gym, reinforcement-learning
Muzero General
MuZero
Stars: ✭ 1,187 (+240.11%)
Mutual labels:  gym, reinforcement-learning
Gym Sokoban
Sokoban environment for OpenAI Gym
Stars: ✭ 186 (-46.7%)
Mutual labels:  gym, reinforcement-learning
Ma Gym
A collection of multi agent environments based on OpenAI gym.
Stars: ✭ 226 (-35.24%)
Mutual labels:  gym, reinforcement-learning

CleanRL (Clean Implementation of RL Algorithms)

Mailing List: cleanrl | Meeting Recordings: cleanrl

CleanRL is a Deep Reinforcement Learning library that provides high-quality single-file implementations with research-friendly features. The implementations are clean and simple, yet they can scale to thousands of experiments using AWS Batch. The highlight features of CleanRL are listed below; a short sketch of the scaffolding they share appears just below the list:

  • 📜 Single-file implementation
    • Every detail about an algorithm is put into the algorithm's own file, so it is easier to fully understand an algorithm and do research with it.
  • 📊 Benchmarked Implementation (7+ algorithms and 34+ games at https://benchmark.cleanrl.dev)
  • 📈 Tensorboard Logging
  • 🪛 Local Reproducibility via Seeding
  • 🎮 Videos of Gameplay Capturing
  • 🧫 Experiment Management with Weights and Biases
  • 💸 Cloud Integration with Docker and AWS

Good luck have fun 🚀
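
These features translate into a small amount of scaffolding that every single-file script repeats. The sketch below is illustrative only: the CLI flags (--seed, --gym-id, --total-timesteps) are taken from the Get started section further down, while the variable names, run-directory layout, and the older Gym API are assumptions rather than CleanRL's exact code.

# Illustrative sketch of the scaffolding a CleanRL-style single-file script shares.
# Flag names follow the Get started section; everything else is an assumption.
import argparse
import random

import gym
import numpy as np
import torch
from torch.utils.tensorboard import SummaryWriter

parser = argparse.ArgumentParser()
parser.add_argument("--seed", type=int, default=1)
parser.add_argument("--gym-id", type=str, default="CartPole-v0")
parser.add_argument("--total-timesteps", type=int, default=50000)
args = parser.parse_args()

# local reproducibility via seeding
random.seed(args.seed)
np.random.seed(args.seed)
torch.manual_seed(args.seed)

# TensorBoard logging: `tensorboard --logdir runs` picks these scalars up
writer = SummaryWriter(f"runs/{args.gym_id}__seed{args.seed}")

env = gym.make(args.gym_id)  # older Gym API assumed (reset -> obs, step -> 4-tuple)
env.seed(args.seed)
obs, episode_reward, global_step = env.reset(), 0.0, 0

while global_step < args.total_timesteps:
    action = env.action_space.sample()  # stand-in for the algorithm's policy
    obs, reward, done, info = env.step(action)
    episode_reward += reward
    global_step += 1
    if done:
        writer.add_scalar("charts/episode_reward", episode_reward, global_step)
        obs, episode_reward = env.reset(), 0.0

writer.close()
env.close()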

Algorithms Implemented

Open RL Benchmark

Open RL Benchmark (https://benchmark.cleanrl.dev) is our project to create a comprehensive benchmark of popular DRL algorithms across a variety of games, where everything about the benchmark is open. For each experiment you can check the following information (a sketch of how this logging is typically wired up with wandb follows the list):

  • hyper-parameters (check them at the Overview tab of a run)
  • training metrics, e.g. episode reward and training losses (check them at the Charts tab of a run)
  • videos of the agents playing the game (check them at the Charts tab of a run)
  • system metrics, e.g. CPU and memory utilization (check them at the Systems tab of a run)
  • stdout and stderr of the script (check them at the Logs tab of a run)
  • all dependencies (check requirements.txt at the Files tab of a run)
  • source code (especially helpful since each implementation is a single file, so the Code tab of a run shows exactly the code responsible for it)
  • the exact commands to reproduce a run (check the Overview tab of a run; currently not working because public access is blocked by https://github.com/wandb/client/issues/1177)
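
The items above correspond to standard Weights & Biases features. As a hedged sketch (project and run names here are illustrative, and CleanRL's actual scripts enable this behind the --prod-mode flag shown later), a run is typically set up like this:

# Hedged sketch: configuring wandb so a run captures the items listed above.
import wandb

config = {"seed": 1, "gym_id": "CartPole-v0", "total_timesteps": 50000}  # hyper-parameters
run = wandb.init(
    project="cleanrltest",      # illustrative project name
    config=config,              # hyper-parameters -> Overview tab
    sync_tensorboard=True,      # TensorBoard scalars -> Charts tab (training metrics)
    monitor_gym=True,           # gym Monitor videos -> Charts tab
    save_code=True,             # the single-file script -> Code tab
)
# stdout/stderr (Logs tab), system metrics (Systems tab), and the dependency list
# (requirements.txt under the Files tab) are captured by the wandb client automatically.
run.finish()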

We hope this brings a new level of transparency, openness, and reproducibility. Our plan is to benchmark as many algorithms and games as possible. If you are interested, please join us and contribute more algorithms and games. To get started, check out our contribution guide and our roadmap for the Open RL Benchmark.

We currently support 34+ games, and our implementation performs competitively against published results. See the tables below for selected examples; a note on reading the ± entries follows them.

Environment                   c51_atari_visual.py   dqn_atari_visual.py   ppo_atari_visual.py
BeamRiderNoFrameskip-v4       9128.00 ± 0.00        6156.13 ± 461.47      1881.11 ± 166.89
QbertNoFrameskip-v4           13814.24 ± 3357.99    15241.67 ± 0.00       18755.36 ± 205.36
SpaceInvadersNoFrameskip-v4   2140.00 ± 0.00        1616.11 ± 226.67      871.56 ± 133.44
PongNoFrameskip-v4            16.33 ± 0.00          19.33 ± 0.33          20.89 ± 0.00
BreakoutNoFrameskip-v4        404.11 ± 0.00         354.78 ± 9.22         413.73 ± 15.39

Environment                   ddpg_continuous_action.py   td3_continuous_action.py   ppo_continuous_action.py
Ant-v2                        503.32 ± 18.70              5368.18 ± 771.11           3368.17 ± 759.13
Humanoid-v2                   942.16 ± 436.22             6334.40 ± 140.05           918.19 ± 102.71
Walker2DBulletEnv-v0          708.51 ± 240.64             2168.87 ± 65.78            906.10 ± 51.96
HalfCheetahBulletEnv-v0       2821.87 ± 266.03            2542.99 ± 318.23           2189.66 ± 141.61
HopperBulletEnv-v0            1540.77 ± 821.54            2302.09 ± 24.46            2300.96 ± 47.46
BipedalWalker-v3              140.20 ± 52.05              164.06 ± 147.22            219.96 ± 47.49
LunarLanderContinuous-v2      210.01 ± 0.00               290.73 ± 4.44              161.28 ± 37.48
Pendulum-v0                   -186.83 ± 12.35             -246.53 ± 6.73             -1280.11 ± 39.22
MountainCarContinuous-v0      -0.98 ± 0.02                -1.11 ± 0.10               93.84 ± 0.00
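
A note on reading the tables: each cell is a central value ± a spread. Assuming (this is an assumption, not the benchmark's documented methodology) that a cell aggregates final episodic returns over several seeded runs, the numbers would be produced roughly as follows:

# Assumed aggregation for the "X ± Y" cells above: mean and standard deviation of
# final episodic returns across seeds. The benchmark's exact procedure may differ.
import numpy as np

final_returns = [10.0, 12.0, 14.0]  # hypothetical final returns from three seeds
mean, std = np.mean(final_returns), np.std(final_returns)
print(f"{mean:.2f} ± {std:.2f}")    # prints: 12.00 ± 1.63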

Get started

To run experiments locally, give the following a try:

$ git clone https://github.com/vwxyzjn/cleanrl.git && cd cleanrl
$ pip install -e .
$ cd cleanrl
$ python ppo.py \
    --seed 1 \
    --gym-id CartPole-v0 \
    --total-timesteps 50000
# open another terminal and enter `cd cleanrl/cleanrl`
$ tensorboard --logdir runs

demo.gif

To use the wandb integration, sign up for an account at https://wandb.com and copy the API key. Then run:

$ cd cleanrl
$ pip install wandb
$ wandb login ${WANDB_API_KEY}
$ python ppo.py \
    --seed 1 \
    --gym-id CartPole-v0 \
    --total-timesteps 50000 \
    --prod-mode \
    --wandb-project-name cleanrltest 
# Then go to https://app.wandb.ai/${WANDB_USERNAME}/cleanrltest/

Check out the demo site at https://app.wandb.ai/costa-huang/cleanrltest

demo2.gif

Install optional dependencies

The following instructions assume a Linux environment.

# installing StarCraft II
# enter the password `iagreetotheeula` when prompted
$ rm -fR ~/StarCraftII && \
wget -O ~/StarCraftII.zip http://blzdistsc2-a.akamaihd.net/Linux/SC2.4.10.zip && \
unzip ~/StarCraftII.zip -d ~/ && \
rm ~/StarCraftII.zip
$ mv ~/StarCraftII/Libs/libstdc++.so.6 ~/StarCraftII/libstdc++.so.6


# install microrts
$ pip install gym-microrts
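
Once installed, environments such as gym-microrts are used through the standard Gym interface. The snippet below is a sketch only: the environment id is a placeholder (check the gym-microrts README for the actual registered ids), and importing the package is assumed to register its environments with Gym.

# Sketch of the standard Gym loop for an optional environment.
import gym
import gym_microrts  # assumed to register the MicroRTS environments on import

env = gym.make("Microrts<task>-v0")  # placeholder id; use a real one from the gym-microrts README
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # random policy as a stand-in
    obs, reward, done, info = env.step(action)
env.close()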

Support and get involved

We have a Slack Community for support. Feel free to ask questions. Posting in GitHub Issues and PRs is also welcome.

In addition, we have a monthly development cycle to implement new RL algorithms. Feel free to participate or ask questions there, too. You can sign up for our mailing list at our Google Groups to receive event RSVPs, which contain the Hangouts video call address each week. Our past video recordings are available on YouTube.

Contribution

We have a short contribution guide at https://github.com/vwxyzjn/cleanrl/blob/master/CONTRIBUTING.md. Consider adding new algorithms or testing new games on the Open RL Benchmark (https://benchmark.cleanrl.dev).

Big thanks to all the contributors of CleanRL!

Citing our project

Please consider using the following Bibtex entry:

@misc{cleanrl,
  author = {Shengyi Huang and Rousslan Dossa and Chang Ye},
  title = {CleanRL: High-quality Single-file Implementation of Deep Reinforcement Learning algorithms},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/vwxyzjn/cleanrl/}},
}

References

I have been heavily inspired by many repos and blog posts. Below is an incomplete list of them.

The following ones helped me a lot with the continuous action space handling:
