
kaesve / muzero

License: MIT
A clean implementation of MuZero and AlphaZero following the AlphaZero-General framework. Train and pit both algorithms against each other, and investigate the reliability of learned MuZero MDP models.

Programming Languages

  • Jupyter Notebook
  • Python

Projects that are alternatives to or similar to muzero

CRNN.tf2
Convolutional Recurrent Neural Network (CRNN) for end-to-end text recognition in TensorFlow 2
Stars: ✭ 131 (+3.97%)
Mutual labels:  tf2, tensorflow2
pcdarts-tf2
An unofficial TensorFlow 2.0+ implementation of PC-DARTS (Partial Channel Connections for Memory-Efficient Differentiable Architecture Search, ICLR 2020).
Stars: ✭ 25 (-80.16%)
Mutual labels:  tf2, tensorflow2
tf-faster-rcnn
A from-scratch TensorFlow 2 Faster R-CNN implementation supporting batch processing, with MobileNetV2 and VGG16 backbones
Stars: ✭ 88 (-30.16%)
Mutual labels:  tf2, tensorflow2
Alphazero gomoku
An implementation of the AlphaZero algorithm for Gomoku (also called Gobang or Five in a Row)
Stars: ✭ 2,570 (+1939.68%)
Mutual labels:  mcts, alphazero
Awesome-Tensorflow2
Excellent extension packages and projects built on TensorFlow 2
Stars: ✭ 45 (-64.29%)
Mutual labels:  tf2, tensorflow2
transformer-tensorflow2.0
Transformer implemented in TensorFlow 2.0
Stars: ✭ 53 (-57.94%)
Mutual labels:  tf2, tensorflow2
godpaper
🐵 An AI board-game framework with implementations in many programming languages.
Stars: ✭ 40 (-68.25%)
Mutual labels:  deep-reinforcement-learning, mcts
AlphaZero Gobang
A deep learning course project from UCAS
Stars: ✭ 29 (-76.98%)
Mutual labels:  mcts, alphazero
Deep RL with pytorch
A PyTorch tutorial for deep reinforcement learning (DRL)
Stars: ✭ 160 (+26.98%)
Mutual labels:  deep-reinforcement-learning, mcts
FinRL
FinRL: The first open-source project for financial reinforcement learning. Please star. 🔥
Stars: ✭ 3,497 (+2675.4%)
Mutual labels:  deep-reinforcement-learning, tensorflow2
alpha sigma
A PyTorch-based Gomoku game model using AlphaZero-style reinforcement learning and Monte Carlo tree search.
Stars: ✭ 134 (+6.35%)
Mutual labels:  deep-reinforcement-learning, alphazero
manning tf2 in action
The official code repository for "TensorFlow in Action" by Manning.
Stars: ✭ 61 (-51.59%)
Mutual labels:  tf2, tensorflow2
Alpha Zero General
A clean implementation based on AlphaZero for any game in any framework + tutorial + Othello/Gobang/TicTacToe/Connect4 and more
Stars: ✭ 2,617 (+1976.98%)
Mutual labels:  mcts, alphazero
spectral normalization-tf2
🌈 Spectral normalization implemented in TensorFlow 2
Stars: ✭ 36 (-71.43%)
Mutual labels:  tf2, tensorflow2
alpha-zero
AlphaZero implementation for Othello, Connect-Four and Tic-Tac-Toe based on "Mastering the game of Go without human knowledge" and "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm" by DeepMind.
Stars: ✭ 68 (-46.03%)
Mutual labels:  mcts, alphazero
keras efficientnet v2
A self-defined EfficientNetV2 following the official version, including converted ImageNet/21K/21k-ft1k weights.
Stars: ✭ 56 (-55.56%)
Mutual labels:  tf2, tensorflow2
computer-go-dataset
Datasets for computer Go
Stars: ✭ 133 (+5.56%)
Mutual labels:  alphazero, muzero
deep reinforcement learning gallery
Deep reinforcement learning with TensorFlow 2
Stars: ✭ 35 (-72.22%)
Mutual labels:  deep-reinforcement-learning, tensorflow2
Finrl Library
FinRL: Financial Reinforcement Learning Framework. Please star. 🔥
Stars: ✭ 3,037 (+2310.32%)
Mutual labels:  deep-reinforcement-learning, tensorflow2
TF2-GAN
🐳 GANs implemented in TensorFlow 2.x
Stars: ✭ 61 (-51.59%)
Mutual labels:  tf2, tensorflow2

MuZero vs. AlphaZero in TensorFlow

We provide a readable, commented, well-documented, and conceptually simple implementation of the AlphaZero and MuZero algorithms, based on the popular AlphaZero-General implementation. Our implementation extends AlphaZero to work with single-player domains, like its successor MuZero. The codebase provides a modular framework for designing your own AlphaZero and MuZero models and an API to pit the two algorithms against each other. This API also allows MuZero agents to rely more strongly on their learned model during interaction with the environment; the programmer can, e.g., specify the sparsity of observations given to a learned MuZero agent during a trial. Our interface also provides sufficient abstraction to extend the MuZero or AlphaZero algorithm for research purposes.
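
To give a feel for the pitting workflow, here is a minimal sketch of what an arena loop between two trained agents could look like; the agent and environment interfaces below are hypothetical stand-ins, not this repository's actual API (see the wiki for that).

    # Hypothetical sketch of an arena loop between two trained agents;
    # the agent/environment method names are illustrative stand-ins only.
    def pit(env, agent_a, agent_b, num_games=10):
        """Alternate turns between two agents and return agent_a's average score.

        Assumes env.step returns the reward from the perspective of the agent
        that just moved, as is common in AlphaZero-style frameworks.
        """
        score = 0.0
        for game in range(num_games):
            state, done = env.reset(), False
            # Alternate which agent moves first across games.
            mover, waiter = (agent_a, agent_b) if game % 2 == 0 else (agent_b, agent_a)
            while not done:
                action = mover.act(state)               # policy + tree search
                state, reward, done = env.step(action)
                if done:
                    score += reward if mover is agent_a else -reward
                mover, waiter = waiter, mover
        return score / num_games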

Note that we did not perform extensive testing on the board games; we found this to be very time-intensive and difficult to tune. Well-tested environments include the Gym environments CartPole-v1, MountainCar-v0, and Pendulum-v0.

How to run:

To run experiments or train agents, you first need a .json configuration file (see Configurations/ModelConfigs) specifying the agent's parameters. Within this .json file you also need to specify a neural network architecture (see Agents/__init__.py for the existing architectures). Then run Main.py with the following flags to train an agent:

python Main.py train -c my/config/file.json --game gym_Cartpole-v1 --gpu [INT]
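
The exact schema is defined by the files in Configurations/ModelConfigs; purely as an illustration, such a configuration file could be generated programmatically as below. Every key name in this sketch is an assumption, so consult the bundled configs for the real fields.

    import json

    # Illustrative only: these key names are assumptions, not the repo's schema.
    config = {
        "algorithm": "MUZERO",   # or "ALPHAZERO"
        "architecture": "MLP",   # must name an architecture from Agents/__init__.py
        "args": {
            "selfplay_iterations": 100,
            "episodes_per_iteration": 20,
            "mcts_simulations": 50,
            "learning_rate": 1e-3,
        },
    }

    with open("my/config/file.json", "w") as f:
        json.dump(config, f, indent=2)

The generated file is then passed to Main.py via the -c flag, as in the command above.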

See the wiki for a more elaborate overview of the hyperparameters and how to create new agents or games.

Minimal requirements

  • Python 3.7+
  • tensorflow
  • keras (standalone, until TensorFlow 2.3 is available through Anaconda on Windows)
  • tqdm

Tested Versions (Windows and Linux)

  • Python 3.7.9
  • tensorflow 2.1.0
  • keras 2.3.1

Example Results

This codebase was designed for a Master's course at Leiden University. We used the code to create visualizations of the MDP model that MuZero learns; we did this exclusively for MountainCar. The visualization tool can be viewed here: https://kaesve.nl/projects/muzero-model-inspector/#/; an example illustration is shown below. The figure shows the entire state space of MountainCar as embedded by MuZero's encoding network, projected onto the first three principal components of the embedding's neural activation values.

[Figure: MountainCar state space embedded by MuZero's encoder, projected onto its first three principal components]
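
The projection step itself is standard dimensionality reduction. Below is a minimal sketch of how such a figure can be produced, assuming a trained representation network with a Keras-style predict method and scikit-learn for the PCA; the encoder handle is a hypothetical stand-in.

    import numpy as np
    from sklearn.decomposition import PCA

    # A dense grid over MountainCar-v0's state space (position, velocity).
    positions = np.linspace(-1.2, 0.6, 100)
    velocities = np.linspace(-0.07, 0.07, 100)
    grid = np.array([(p, v) for p in positions for v in velocities])

    def project_embedding(encoder, observations):
        """Embed observations with the representation network and project the
        activations onto their first three principal components."""
        latents = encoder.predict(observations)             # shape (N, latent_dim)
        return PCA(n_components=3).fit_transform(latents)   # shape (N, 3)

    # points = project_embedding(encoder, grid)  # then scatter-plot the 3-D points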

We also quantified the efficacy of our MuZero and AlphaZero implementations on the CartPole environment over numerous hyperparameters. Canonical MuZero can be quite unstable depending on the hyperparameters; the figure below shows this through the median and mean training rewards over 8 training runs.

[Figure: median and mean training rewards on CartPole over 8 runs]
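
The aggregation behind this figure is simple: stack one training-reward curve per run and take statistics across runs. A NumPy sketch with placeholder data:

    import numpy as np

    # One training-reward curve per run, shape (num_runs, num_steps);
    # random placeholder data stands in for the actual logged rewards.
    rewards = np.random.rand(8, 500)

    median_curve = np.median(rewards, axis=0)  # robust to occasional diverged runs
    mean_curve = np.mean(rewards, axis=0)      # pulled down by diverged runs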

The figure below illustrates the efficacy of learned models on MountainCar when the MuZero agent receives observations only every n-th environment step, alongside the agent's learning progress with dense observations.

[Figure: MuZero learning progress on MountainCar with observations every n-th step vs. dense observations]
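
Conceptually, these sparse-observation trials re-ground MuZero's latent state on a true observation only every n-th step and let the learned dynamics model carry it in between. A hedged sketch of that loop; the method names on the agent are hypothetical:

    def sparse_trial(env, agent, n):
        """Run one episode where the agent sees a true observation only every
        n-th step; `encode`, `plan`, and `dynamics` are hypothetical names for
        MuZero's representation, search, and dynamics components."""
        obs, done = env.reset(), False
        step, total, latent = 0, 0.0, None
        while not done:
            if step % n == 0:                        # step 0 always observes
                latent = agent.encode(obs)           # re-ground on a real observation
            action = agent.plan(latent)              # MCTS inside the learned MDP
            latent = agent.dynamics(latent, action)  # advance the learned model
            obs, reward, done = env.step(action)     # true environment step
            total += reward
            step += 1
        return total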

We did not test MuZero on board games, as computation time quickly became an issue for us even on smaller board sizes. We did find that AlphaZero could learn good policies on board games, but this depended on the observation encoding: the heuristic encoding used in AlphaZero seemed less effective than the canonicalBoard representation used in AlphaZero-General.
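
For context, the canonicalBoard representation from AlphaZero-General presents the board from the perspective of the player to move; with pieces encoded as +1/-1 this is a plain sign flip. The sketch below mirrors that convention:

    import numpy as np

    def canonical_form(board, player):
        """Board from the current player's perspective: with pieces encoded as
        +1 (player one) and -1 (player two), multiplying by `player` (+1 or -1)
        makes the side to move always appear as +1, as in AlphaZero-General's
        getCanonicalForm."""
        return player * np.asarray(board)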

For more details, see our paper: arXiv:2102.12924.

Our Contributions

There are already a variety of MuZero and AlphaZero implementations available. Our implementation is intended to be both pedagogical and functional, so we focused on documentation, elegance, and clarity of the code. Ours additionally provides functionality for masking observations during trials and for regularizing the transition dynamics when fitting the MDP model. For the same reason, we omitted the parallelization used in the original MuZero paper, though it could be added in the future.

References

  • Schrittwieser, Julian et al. (2020). “Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model”. arXiv:1911.08265 [cs, stat].
  • Silver, David et al. (2018). “A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play”. In: Science 362.6419, pp. 1140–1144. DOI: 10.1126/science.aar6404.