
werner-duvaud / Muzero General

License: MIT
MuZero

Programming Languages

python
139335 projects - #7 most used programming language
python3
1442 projects

Projects that are alternatives of or similar to Muzero General

Drq
DrQ: Data regularized Q
Stars: ✭ 268 (-77.42%)
Mutual labels:  gym, reinforcement-learning, deep-reinforcement-learning, rl
Rlenv.directory
Explore and find reinforcement learning environments in a list of 150+ open source environments.
Stars: ✭ 79 (-93.34%)
Mutual labels:  gym, reinforcement-learning, deep-reinforcement-learning, rl
Gym Gazebo2
gym-gazebo2 is a toolkit for developing and comparing reinforcement learning algorithms using ROS 2 and Gazebo
Stars: ✭ 257 (-78.35%)
Mutual labels:  gym, reinforcement-learning, deep-reinforcement-learning, rl
Rad
RAD: Reinforcement Learning with Augmented Data
Stars: ✭ 268 (-77.42%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, rl
Atari
AI research environment for the Atari 2600 games 🤖.
Stars: ✭ 174 (-85.34%)
Mutual labels:  gym, reinforcement-learning, rl
Naf Tensorflow
"Continuous Deep Q-Learning with Model-based Acceleration" in TensorFlow
Stars: ✭ 192 (-83.82%)
Mutual labels:  gym, reinforcement-learning, deep-reinforcement-learning
Pytorch Rl
This repository contains model-free deep reinforcement learning algorithms implemented in Pytorch
Stars: ✭ 394 (-66.81%)
Mutual labels:  gym, reinforcement-learning, deep-reinforcement-learning
Drlkit
A High Level Python Deep Reinforcement Learning library. Great for beginners, prototyping and quickly comparing algorithms
Stars: ✭ 29 (-97.56%)
Mutual labels:  gym, reinforcement-learning, deep-reinforcement-learning
Deterministic Gail Pytorch
PyTorch implementation of Deterministic Generative Adversarial Imitation Learning (GAIL) for Off Policy learning
Stars: ✭ 44 (-96.29%)
Mutual labels:  gym, reinforcement-learning, deep-reinforcement-learning
Deepdrive
Deepdrive is a simulator that allows anyone with a PC to push the state-of-the-art in self-driving
Stars: ✭ 628 (-47.09%)
Mutual labels:  gym, reinforcement-learning, deep-reinforcement-learning
Rl Book
Source codes for the book "Reinforcement Learning: Theory and Python Implementation"
Stars: ✭ 464 (-60.91%)
Mutual labels:  gym, reinforcement-learning, deep-reinforcement-learning
Paac.pytorch
Pytorch implementation of the PAAC algorithm presented in Efficient Parallel Methods for Deep Reinforcement Learning https://arxiv.org/abs/1705.04862
Stars: ✭ 22 (-98.15%)
Mutual labels:  gym, reinforcement-learning, deep-reinforcement-learning
Pytorch sac
PyTorch implementation of Soft Actor-Critic (SAC)
Stars: ✭ 174 (-85.34%)
Mutual labels:  gym, reinforcement-learning, deep-reinforcement-learning
Rl Baselines3 Zoo
A collection of pre-trained RL agents using Stable Baselines3, training and hyperparameter optimization included.
Stars: ✭ 161 (-86.44%)
Mutual labels:  gym, reinforcement-learning, rl
Stable Baselines
Mirror of Stable-Baselines: a fork of OpenAI Baselines, implementations of reinforcement learning algorithms
Stars: ✭ 115 (-90.31%)
Mutual labels:  gym, reinforcement-learning, rl
Pytorch sac ae
PyTorch implementation of Soft Actor-Critic + Autoencoder(SAC+AE)
Stars: ✭ 94 (-92.08%)
Mutual labels:  gym, reinforcement-learning, deep-reinforcement-learning
Learning To Communicate Pytorch
Learning to Communicate with Deep Multi-Agent Reinforcement Learning in PyTorch
Stars: ✭ 236 (-80.12%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, rl
Rl algos
Reinforcement Learning Algorithms
Stars: ✭ 14 (-98.82%)
Mutual labels:  gym, reinforcement-learning, deep-reinforcement-learning
Mushroom Rl
Python library for Reinforcement Learning.
Stars: ✭ 442 (-62.76%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, rl
Rl Baselines Zoo
A collection of 100+ pre-trained RL agents using Stable Baselines, training and hyperparameter optimization included.
Stars: ✭ 839 (-29.32%)
Mutual labels:  gym, reinforcement-learning, rl


MuZero General

A commented and documented implementation of MuZero based on the Google DeepMind paper (Nov 2019) and the associated pseudocode. It is designed to be easily adaptable to any game or reinforcement learning environment (such as Gym). You only need to add a game file containing the hyperparameters and the game class. Please refer to the documentation and the example.
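
To illustrate, here is a hedged sketch of the shape a new game file takes. The class and method names mirror the pattern of the included game files, but they are illustrative, not the repository's exact interface:

```python
# Illustrative game file for a trivial one-player counting game:
# the agent starts at position 0 and wins by reaching position +3.
# Method names follow the pattern of the repository's game files
# (see the games folder) but are simplified for illustration.

class MuZeroConfig:
    """A few example hyperparameters for this game."""
    def __init__(self):
        self.observation_shape = (1, 1, 1)  # channels x height x width
        self.action_space = [0, 1]          # 0: step left, 1: step right
        self.max_moves = 10                 # hard cap on episode length
        self.discount = 0.997               # discount factor for returns


class Game:
    """Environment wrapper exposing reset/step/legal_actions."""
    def __init__(self):
        self.position = 0

    def reset(self):
        self.position = 0
        return [[[self.position]]]  # observation, shaped like observation_shape

    def legal_actions(self):
        return [0, 1]

    def step(self, action):
        self.position += 1 if action == 1 else -1
        done = abs(self.position) >= 3
        reward = 1 if self.position >= 3 else 0
        return [[[self.position]]], reward, done
```

A game file like this is then dropped into the games folder alongside the existing examples.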

MuZero is a state-of-the-art RL algorithm for board games (Chess, Go, ...) and Atari games. It is the successor to AlphaZero, but requires no knowledge of the environment's underlying dynamics. MuZero learns a model of the environment and uses an internal representation that contains only the information useful for predicting the reward, value, policy and transitions. MuZero is also closely related to Value Prediction Networks. See How it works.
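
The learned model can be sketched as three functions: a representation function mapping an observation to a latent state, a dynamics function mapping a latent state and an action to the next latent state and a reward, and a prediction function mapping a latent state to a policy and a value. The toy linear stand-ins below (random weights, not trained networks) only show how planning unrolls in latent space without ever calling the real environment:

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=(4, 3))  # representation: observation(3) -> latent(4)
G = rng.normal(size=(5, 6))  # dynamics: latent(4) + action one-hot(2) -> latent(4) + reward(1)
F = rng.normal(size=(3, 4))  # prediction: latent(4) -> policy logits(2) + value(1)

def representation(observation):
    return np.tanh(H @ observation)

def dynamics(latent, action):
    out = G @ np.concatenate([latent, np.eye(2)[action]])
    return np.tanh(out[:4]), out[4]  # next latent state, predicted reward

def prediction(latent):
    out = F @ latent
    logits, value = out[:2], out[2]
    policy = np.exp(logits) / np.exp(logits).sum()
    return policy, value

# Unroll the model a few steps purely in latent space -- no environment calls:
latent = representation(np.array([0.1, -0.2, 0.3]))
for action in [0, 1, 0]:
    policy, value = prediction(latent)
    latent, reward = dynamics(latent, action)
```

In MuZero these three functions are neural networks trained jointly so that the unrolled predictions match observed rewards, search policies and values.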

Features

  • [x] Residual Network and Fully connected network in PyTorch
  • [x] Multi-Threaded/Asynchronous/Cluster with Ray
  • [x] Multi-GPU support for training and self-play
  • [x] TensorBoard real-time monitoring
  • [x] Model weights automatically saved at checkpoints
  • [x] Single and two player mode
  • [x] Commented and documented
  • [x] Easily adaptable for new games
  • [x] Examples of board games, Gym and Atari games (See list of implemented games)
  • [x] Pretrained weights available
  • [ ] Windows support (Experimental / Workaround: Use the notebook in Google Colab)

Further improvements

These improvements are active research; they are personal ideas that go beyond the MuZero paper. We are open to contributions and other ideas.

Demo

All performances are tracked and displayed in real time in TensorBoard:

cartpole training summary

Testing Lunar Lander:

lunarlander training preview

Games already implemented

  • Cartpole (Tested with the fully connected network)
  • Lunar Lander (Tested in deterministic mode with the fully connected network)
  • Gridworld (Tested with the fully connected network)
  • Tic-tac-toe (Tested with the fully connected network and the residual network)
  • Connect4 (Slightly tested with the residual network)
  • Gomoku
  • Twenty-One / Blackjack (Tested with the residual network)
  • Atari Breakout

Tests are run on Ubuntu with 16 GB RAM / Intel i7 / GTX 1050Ti Max-Q. We verify that training progresses and reaches a level showing that the agent has learned, but we do not systematically reach human-level play. For some environments, we observe a regression after a certain amount of training. The provided configurations are certainly not optimal, and we are not focusing on hyperparameter optimization for now. Any help is welcome.

Code structure

code structure

Network summary:

Getting started

Installation

git clone https://github.com/werner-duvaud/muzero-general.git
cd muzero-general

pip install -r requirements.txt

Run

python muzero.py

To visualize the training results, run in a new terminal:

tensorboard --logdir ./results

Config

You can adapt the configurations of each game by editing the MuZeroConfig class of the respective file in the games folder.
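
As a hedged illustration of the kind of settings such a class holds, the attribute names below are examples in the spirit of the game files, not the repository's exact list:

```python
# Illustrative subset of a game's MuZeroConfig; attribute names are
# examples of the kind of hyperparameters exposed, not the exact list.
class MuZeroConfig:
    def __init__(self):
        self.network = "fullyconnected"  # or "resnet"
        self.num_simulations = 50        # MCTS simulations per move
        self.batch_size = 128            # training batch size
        self.lr_init = 0.02              # initial learning rate
        self.lr_decay_rate = 0.9         # exponential decay factor

    def learning_rate(self, training_step):
        """Exponentially decayed learning rate (decays every 1000 steps)."""
        return self.lr_init * self.lr_decay_rate ** (training_step / 1000)
```

Changing these values in the game file is enough; no other code needs to be touched.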

Authors

Please use this bibtex if you want to cite this repository (master branch) in your publications:

@misc{muzero-general,
  author       = {Werner Duvaud and Aurèle Hainaut},
  title        = {MuZero General: Open Reimplementation of MuZero},
  year         = {2019},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/werner-duvaud/muzero-general}},
}

Getting involved

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].