
kaesve / muzero

License: MIT
A clean implementation of MuZero and AlphaZero following the AlphaZero-General framework. Train and pit both algorithms against each other, and investigate the reliability of learned MuZero MDP models.

Programming Languages

  • Jupyter Notebook
  • Python

Projects that are alternatives to or similar to muzero

CRNN.tf2
Convolutional Recurrent Neural Network (CRNN) for end-to-end text recognition in TensorFlow 2
Stars: ✭ 131 (+3.97%)
Mutual labels:  tf2, tensorflow2
pcdarts-tf2
An unofficial TensorFlow 2.0+ implementation of PC-DARTS (Partial Channel Connections for Memory-Efficient Differentiable Architecture Search, ICLR 2020).
Stars: ✭ 25 (-80.16%)
Mutual labels:  tf2, tensorflow2
tf-faster-rcnn
A from-scratch TensorFlow 2 Faster R-CNN implementation supporting batch processing, with MobileNetV2 and VGG16 backbones
Stars: ✭ 88 (-30.16%)
Mutual labels:  tf2, tensorflow2
Alphazero gomoku
An implementation of the AlphaZero algorithm for Gomoku (also called Gobang or Five in a Row)
Stars: ✭ 2,570 (+1939.68%)
Mutual labels:  mcts, alphazero
Awesome-Tensorflow2
Excellent extension packages and projects built on TensorFlow 2
Stars: ✭ 45 (-64.29%)
Mutual labels:  tf2, tensorflow2
transformer-tensorflow2.0
Transformer implemented in TensorFlow 2.0
Stars: ✭ 53 (-57.94%)
Mutual labels:  tf2, tensorflow2
godpaper
🐵 An AI board-game framework with implementations in many programming languages.
Stars: ✭ 40 (-68.25%)
Mutual labels:  deep-reinforcement-learning, mcts
AlphaZero Gobang
A deep learning course project from UCAS
Stars: ✭ 29 (-76.98%)
Mutual labels:  mcts, alphazero
Deep RL with pytorch
A PyTorch tutorial for deep reinforcement learning (DRL)
Stars: ✭ 160 (+26.98%)
Mutual labels:  deep-reinforcement-learning, mcts
FinRL
FinRL: The first open-source project for financial reinforcement learning. Please star. 🔥
Stars: ✭ 3,497 (+2675.4%)
Mutual labels:  deep-reinforcement-learning, tensorflow2
alpha sigma
A PyTorch-based Gomoku game model using AlphaZero-style reinforcement learning and Monte Carlo tree search.
Stars: ✭ 134 (+6.35%)
Mutual labels:  deep-reinforcement-learning, alphazero
manning tf2 in action
The official code repository for "TensorFlow in Action" by Manning.
Stars: ✭ 61 (-51.59%)
Mutual labels:  tf2, tensorflow2
Alpha Zero General
A clean implementation based on AlphaZero for any game in any framework + tutorial + Othello/Gobang/TicTacToe/Connect4 and more
Stars: ✭ 2,617 (+1976.98%)
Mutual labels:  mcts, alphazero
spectral normalization-tf2
🌈 Spectral normalization implemented in TensorFlow 2
Stars: ✭ 36 (-71.43%)
Mutual labels:  tf2, tensorflow2
alpha-zero
AlphaZero implementation for Othello, Connect-Four and Tic-Tac-Toe based on "Mastering the game of Go without human knowledge" and "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm" by DeepMind.
Stars: ✭ 68 (-46.03%)
Mutual labels:  mcts, alphazero
keras efficientnet v2
A self-defined EfficientNetV2 following the official version, including converted ImageNet/21K/21k-ft1k weights.
Stars: ✭ 56 (-55.56%)
Mutual labels:  tf2, tensorflow2
computer-go-dataset
Datasets for computer Go
Stars: ✭ 133 (+5.56%)
Mutual labels:  alphazero, muzero
deep reinforcement learning gallery
Deep reinforcement learning with TensorFlow 2
Stars: ✭ 35 (-72.22%)
Mutual labels:  deep-reinforcement-learning, tensorflow2
Finrl Library
FinRL: Financial Reinforcement Learning Framework. Please star. 🔥
Stars: ✭ 3,037 (+2310.32%)
Mutual labels:  deep-reinforcement-learning, tensorflow2
TF2-GAN
🐳 GANs implemented in TensorFlow 2.x
Stars: ✭ 61 (-51.59%)
Mutual labels:  tf2, tensorflow2

MuZero vs. AlphaZero in TensorFlow

We provide a readable, commented, well-documented, and conceptually simple implementation of the AlphaZero and MuZero algorithms, based on the popular AlphaZero-General implementation. Our implementation extends AlphaZero to work with single-player domains, like its successor MuZero. The codebase provides a modular framework for designing your own AlphaZero and MuZero models and an API to pit the two algorithms against each other. This API also allows MuZero agents to rely more strongly on their learned model during interaction with the environment; the programmer can, e.g., specify the sparsity of observations given to a learned MuZero agent during a trial. Our interface also provides sufficient abstraction to extend the MuZero or AlphaZero algorithm for research purposes.
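
To give a feel for the pitting workflow, here is a minimal sketch of what an arena loop between two trained agents could look like; the agent and environment interfaces below are hypothetical stand-ins, not this repository's actual API (see the wiki for that).

    # Hypothetical sketch of an arena loop between two trained agents;
    # the agent/environment method names are illustrative stand-ins only.
    def pit(env, agent_a, agent_b, num_games=10):
        """Alternate turns between two agents and return agent_a's average score.

        Assumes env.step returns the reward from the perspective of the agent
        that just moved, as is common in AlphaZero-style frameworks.
        """
        score = 0.0
        for game in range(num_games):
            state, done = env.reset(), False
            # Alternate which agent moves first across games.
            mover, waiter = (agent_a, agent_b) if game % 2 == 0 else (agent_b, agent_a)
            while not done:
                action = mover.act(state)               # policy + tree search
                state, reward, done = env.step(action)
                if done:
                    score += reward if mover is agent_a else -reward
                mover, waiter = waiter, mover
        return score / num_games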

Note that we did not perform extensive testing on the board games; we found this to be very time-intensive and difficult to tune. Well-tested environments include the Gym environments CartPole-v1, MountainCar-v0, and Pendulum-v0.

How to run:

To run experiments or train agents, you first need a .json configuration file (see Configurations/ModelConfigs) specifying the agent's parameters. Within this .json file you also need to specify a neural network architecture (see Agents/__init__.py for the existing architectures). Then run Main.py with the following flags to train an agent:

python Main.py train -c my/config/file.json --game gym_Cartpole-v1 --gpu [INT]
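
The exact schema is defined by the files in Configurations/ModelConfigs; purely as an illustration, such a configuration file could be generated programmatically as below. Every key name in this sketch is an assumption, so consult the bundled configs for the real fields.

    import json

    # Illustrative only: these key names are assumptions, not the repo's schema.
    config = {
        "algorithm": "MUZERO",   # or "ALPHAZERO"
        "architecture": "MLP",   # must name an architecture from Agents/__init__.py
        "args": {
            "selfplay_iterations": 100,
            "episodes_per_iteration": 20,
            "mcts_simulations": 50,
            "learning_rate": 1e-3,
        },
    }

    with open("my/config/file.json", "w") as f:
        json.dump(config, f, indent=2)

The generated file is then passed to Main.py via the -c flag, as in the command above.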

See the wiki for a more elaborate overview of the hyperparameters and how to create new agents or games.

Minimal requirements

  • Python 3.7+
  • tensorflow
  • keras (standalone, until TensorFlow 2.3 is available through Anaconda on Windows)
  • tqdm

Tested Versions (Windows and Linux)

  • Python 3.7.9
  • tensorflow 2.1.0
  • keras 2.3.1

Example Results

This codebase was designed for a Master's course at Leiden University. We used the code to create visualizations of the MDP model that MuZero learns; we did this exclusively for MountainCar. The visualization tool can be viewed here: https://kaesve.nl/projects/muzero-model-inspector/#/; an example illustration is shown below. The figure shows the entire state space of MountainCar as embedded by MuZero's encoding network, projected onto the first three principal components of the embedding's neural activation values.

[Figure: MountainCar state space embedded by MuZero's encoder, projected onto its first three principal components]
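
The projection step itself is standard dimensionality reduction. Below is a minimal sketch of how such a figure can be produced, assuming a trained representation network with a Keras-style predict method and scikit-learn for the PCA; the encoder handle is a hypothetical stand-in.

    import numpy as np
    from sklearn.decomposition import PCA

    # A dense grid over MountainCar-v0's state space (position, velocity).
    positions = np.linspace(-1.2, 0.6, 100)
    velocities = np.linspace(-0.07, 0.07, 100)
    grid = np.array([(p, v) for p in positions for v in velocities])

    def project_embedding(encoder, observations):
        """Embed observations with the representation network and project the
        activations onto their first three principal components."""
        latents = encoder.predict(observations)             # shape (N, latent_dim)
        return PCA(n_components=3).fit_transform(latents)   # shape (N, 3)

    # points = project_embedding(encoder, grid)  # then scatter-plot the 3-D points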

We also quantified the efficacy of our MuZero and AlphaZero implementations on the CartPole environment over numerous hyperparameters. Canonical MuZero can be quite unstable depending on the hyperparameters; the figure below shows this through the median and mean training rewards over 8 training runs.

[Figure: median and mean training rewards on CartPole over 8 runs]
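
The aggregation behind this figure is simple: stack one training-reward curve per run and take statistics across runs. A NumPy sketch with placeholder data:

    import numpy as np

    # One training-reward curve per run, shape (num_runs, num_steps);
    # random placeholder data stands in for the actual logged rewards.
    rewards = np.random.rand(8, 500)

    median_curve = np.median(rewards, axis=0)  # robust to occasional diverged runs
    mean_curve = np.mean(rewards, axis=0)      # pulled down by diverged runs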

The figure below illustrates the efficacy of learned models on MountainCar when the MuZero agent receives observations only every n-th environment step, alongside the agent's learning progress with dense observations.

[Figure: MuZero learning progress on MountainCar with observations every n-th step vs. dense observations]
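
Conceptually, these sparse-observation trials re-ground MuZero's latent state on a true observation only every n-th step and let the learned dynamics model carry it in between. A hedged sketch of that loop; the method names on the agent are hypothetical:

    def sparse_trial(env, agent, n):
        """Run one episode where the agent sees a true observation only every
        n-th step; `encode`, `plan`, and `dynamics` are hypothetical names for
        MuZero's representation, search, and dynamics components."""
        obs, done = env.reset(), False
        step, total, latent = 0, 0.0, None
        while not done:
            if step % n == 0:                        # step 0 always observes
                latent = agent.encode(obs)           # re-ground on a real observation
            action = agent.plan(latent)              # MCTS inside the learned MDP
            latent = agent.dynamics(latent, action)  # advance the learned model
            obs, reward, done = env.step(action)     # true environment step
            total += reward
            step += 1
        return total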

We did not test MuZero on board games, as computation time quickly became an issue for us even on smaller board sizes. We did find that AlphaZero could learn good policies on board games, but this depended on the observation encoding: the heuristic encoding used in AlphaZero seemed less effective than the canonicalBoard representation used in AlphaZero-General.
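
For context, the canonicalBoard representation from AlphaZero-General presents the board from the perspective of the player to move; with pieces encoded as +1/-1 this is a plain sign flip. The sketch below mirrors that convention:

    import numpy as np

    def canonical_form(board, player):
        """Board from the current player's perspective: with pieces encoded as
        +1 (player one) and -1 (player two), multiplying by `player` (+1 or -1)
        makes the side to move always appear as +1, as in AlphaZero-General's
        getCanonicalForm."""
        return player * np.asarray(board)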

For more details, see our paper: arXiv:2102.12924.

Our Contributions

There are already a variety of MuZero and AlphaZero implementations available. Our implementation is intended to be both pedagogical and functional, so we focused on documentation, elegance, and clarity of the code. Ours additionally provides functionality for masking observations during trials and for regularizing the transition dynamics when fitting the MDP model. For the same reason, we omitted the parallelization used in the original MuZero paper, though it could be added in the future.

References

  • Schrittwieser, Julian et al. (2020). “Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model”. arXiv:1911.08265 [cs, stat].
  • Silver, David et al. (2018). “A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play”. In: Science 362.6419, pp. 1140–1144. DOI: 10.1126/science.aar6404.