
junxiaosong / AlphaZero_Gomoku

License: MIT
An implementation of the AlphaZero algorithm for Gomoku (also called Gobang or Five in a Row)

Programming Languages

Python
139,335 projects - #7 most used programming language

Projects that are alternatives to or similar to AlphaZero-Gomoku

Alpha Zero General
A clean implementation based on AlphaZero for any game in any framework + tutorial + Othello/Gobang/TicTacToe/Connect4 and more
Stars: ✭ 2,617 (+1.83%)
Mutual labels:  reinforcement-learning, mcts, gomoku, monte-carlo-tree-search, gobang, alphago, alphago-zero, alphazero
alphaFive
An AlphaGo-style version of Gomoku (gobang, five in a row)
Stars: ✭ 51 (-98.02%)
Mutual labels:  gomoku, gobang, alphago, alphago-zero, alphazero
AlphaZero Gobang
A deep learning course project from UCAS
Stars: ✭ 29 (-98.87%)
Mutual labels:  mcts, gomoku, gobang, alphazero
AnimalChess
An Animal Fight Chess (斗兽棋, Dou Shou Qi) game written in Rust.
Stars: ✭ 76 (-97.04%)
Mutual labels:  board-game, monte-carlo-tree-search, alphazero
alpha sigma
A PyTorch-based Gomoku model: AlphaZero-style reinforcement learning with Monte Carlo Tree Search.
Stars: ✭ 134 (-94.79%)
Mutual labels:  gomoku, monte-carlo-tree-search, alphazero
UCThello
UCThello - a board game demonstrator (Othello variant) with computer AI using Monte Carlo Tree Search (MCTS) with UCB (Upper Confidence Bounds) applied to trees (UCT for short)
Stars: ✭ 26 (-98.99%)
Mutual labels:  board-game, mcts, monte-carlo-tree-search
godpaper
🐵 An AI board-game framework with implementations in many programming languages.
Stars: ✭ 40 (-98.44%)
Mutual labels:  board-game, mcts, alphago
Agentnet
Deep Reinforcement Learning library for humans
Stars: ✭ 298 (-88.4%)
Mutual labels:  reinforcement-learning, lasagne, theano
alpha-zero
AlphaZero implementation for Othello, Connect-Four and Tic-Tac-Toe based on "Mastering the game of Go without human knowledge" and "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm" by DeepMind.
Stars: ✭ 68 (-97.35%)
Mutual labels:  mcts, alphago-zero, alphazero
Elf
ELF: a platform for game research with AlphaGoZero/AlphaZero reimplementation
Stars: ✭ 3,240 (+26.07%)
Mutual labels:  reinforcement-learning, rl, alphago-zero
Practical rl
A course in reinforcement learning in the wild
Stars: ✭ 4,741 (+84.47%)
Mutual labels:  reinforcement-learning, lasagne, theano
Deep Learning Python
Intro to Deep Learning, including recurrent, convolutional, and feed-forward neural networks.
Stars: ✭ 94 (-96.34%)
Mutual labels:  lasagne, theano
Repo 2016
R, Python and Mathematica code for machine learning, deep learning, artificial intelligence, NLP and geolocation
Stars: ✭ 103 (-95.99%)
Mutual labels:  lasagne, theano
Aws Robomaker Sample Application Deepracer
Use AWS RoboMaker and demonstrate running a simulation which trains a reinforcement learning (RL) model to drive a car around a track
Stars: ✭ 105 (-95.91%)
Mutual labels:  reinforcement-learning, rl
Stable Baselines
Mirror of Stable-Baselines: a fork of OpenAI Baselines, implementations of reinforcement learning algorithms
Stars: ✭ 115 (-95.53%)
Mutual labels:  reinforcement-learning, rl
Rlenv.directory
Explore and find reinforcement learning environments in a list of 150+ open source environments.
Stars: ✭ 79 (-96.93%)
Mutual labels:  reinforcement-learning, rl
Psgan
Periodic Spatial Generative Adversarial Networks
Stars: ✭ 108 (-95.8%)
Mutual labels:  lasagne, theano
Rl trading
An environment for training high-frequency trading agents with reinforcement learning
Stars: ✭ 205 (-92.02%)
Mutual labels:  reinforcement-learning, rl
Pytorch Rl
Tutorials for reinforcement learning in PyTorch and Gym by implementing a few of the popular algorithms. [IN PROGRESS]
Stars: ✭ 121 (-95.29%)
Mutual labels:  reinforcement-learning, rl
Reinforcement learning
Implementation of selected reinforcement learning algorithms in Tensorflow. A3C, DDPG, REINFORCE, DQN, etc.
Stars: ✭ 132 (-94.86%)
Mutual labels:  reinforcement-learning, rl

AlphaZero-Gomoku

This is an implementation of the AlphaZero algorithm for playing the simple board game Gomoku (also called Gobang or Five in a Row), trained purely through self-play. Gomoku is much simpler than Go or chess, so we can focus on the AlphaZero training scheme and still obtain a reasonably good AI model on a single PC in a few hours.

References:

  1. AlphaZero: Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
  2. AlphaGo Zero: Mastering the game of Go without human knowledge

Update 2018.2.24: supports training with TensorFlow!

Update 2018.1.17: supports training with PyTorch!

Example Games Between Trained Models

  • Each move with 400 MCTS playouts:
    [animated GIF: playout400 — a sample game between trained models]

Requirements

To play with the trained AI models, you only need:

  • Python >= 2.7
  • Numpy >= 1.11

To train the AI model from scratch, you additionally need one of the following:

  • Theano >= 0.7 and Lasagne >= 0.1
    or
  • PyTorch >= 0.2.0
    or
  • TensorFlow

PS: if your Theano version is > 0.7, please follow this issue to install Lasagne;
otherwise, force pip to downgrade Theano to 0.7: pip install --upgrade theano==0.7.0

If you would like to train the model using other DL frameworks, you only need to rewrite policy_value_net.py.
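
For orientation, the wrapper that train.py and the players rely on looks roughly like the skeleton below. The method names are taken from the Theano/Lasagne policy_value_net.py in this repo, so treat it as a sketch of the expected interface rather than an exact contract:

class PolicyValueNet:
    """Skeleton of the network wrapper expected by train.py
    (method names assumed from the Theano/Lasagne version)."""

    def __init__(self, board_width, board_height, model_file=None):
        self.board_width = board_width
        self.board_height = board_height
        # build the framework-specific network and optimizer here

    def policy_value(self, state_batch):
        # return (move probabilities, state values) for a batch of states
        raise NotImplementedError

    def policy_value_fn(self, board):
        # return (move, probability) pairs for the legal moves on `board`
        # plus a scalar value estimate in [-1, 1]; used by the MCTS player
        raise NotImplementedError

    def train_step(self, state_batch, mcts_probs, winner_batch, lr):
        # one gradient step on the combined policy-and-value loss;
        # return (loss, entropy) as Python floats
        raise NotImplementedError

    def save_model(self, model_file):
        # serialize the network parameters to model_file
        raise NotImplementedError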

Getting Started

To play with the provided models, run the following script from the project directory:

python human_play.py  

You may modify human_play.py to try different provided models or the pure MCTS player.
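
The relevant lines look roughly like this (class and parameter names such as MCTSPlayer, MCTS_Pure, c_puct and n_playout are assumed from this repo; adjust to your copy):

# sketch of the player setup in human_play.py (names assumed from this repo)
from mcts_alphaZero import MCTSPlayer            # AlphaZero-style MCTS player
# from mcts_pure import MCTSPlayer as MCTS_Pure  # pure-MCTS baseline

model_file = 'best_policy_8_8_5.model'           # one of the provided models
best_policy = PolicyValueNet(8, 8, model_file)   # network wrapper (see above)

# AlphaZero player guided by the trained policy-value net, 400 playouts/move
mcts_player = MCTSPlayer(best_policy.policy_value_fn, c_puct=5, n_playout=400)

# pure MCTS baseline instead; it needs far more playouts to play comparably
# mcts_player = MCTS_Pure(c_puct=5, n_playout=5000)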

To train the AI model from scratch, with Theano and Lasagne, directly run:

python train.py

With PyTorch or TensorFlow, first modify the file train.py, i.e., comment the line

from policy_value_net import PolicyValueNet  # Theano and Lasagne

and uncomment the line

# from policy_value_net_pytorch import PolicyValueNet  # Pytorch
or
# from policy_value_net_tensorflow import PolicyValueNet # Tensorflow

and then execute: python train.py. (To use the GPU in PyTorch, set use_gpu=True; if your PyTorch version is greater than 0.5, return loss.item(), entropy.item() in the train_step function of policy_value_net_pytorch.py, as sketched below.)
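
A sketch of that change at the end of train_step in policy_value_net_pytorch.py (surrounding code elided; variable names assumed):

def train_step(self, state_batch, mcts_probs, winner_batch, lr):
    ...  # forward pass, loss computation, backward pass, optimizer step
    # PyTorch <= 0.4: scalars were read out of 0-dim tensors as loss.data[0]
    # return loss.data[0], entropy.data[0]
    # PyTorch >= 0.5: use .item() to get plain Python floats
    return loss.item(), entropy.item()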

The models (best_policy.model and current_policy.model) are saved every few updates (every 50 by default).
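
The checkpoint logic in train.py is roughly the following (attribute names such as check_freq, best_win_ratio and policy_evaluate are assumed from this repo):

# inside the training loop of train.py (sketch; names assumed)
if (i + 1) % self.check_freq == 0:        # check_freq defaults to 50
    win_ratio = self.policy_evaluate()    # evaluate vs. a pure-MCTS baseline
    self.policy_value_net.save_model('./current_policy.model')
    if win_ratio > self.best_win_ratio:
        self.best_win_ratio = win_ratio
        # best model so far against the baseline
        self.policy_value_net.save_model('./best_policy.model')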

Note: the 4 provided models were trained with Theano/Lasagne; to use them with PyTorch, please refer to issue 5.

Tips for training:

  1. It is good to start with a 6×6 board and 4 in a row; the relevant settings in train.py are sketched after this list. For this case, we may obtain a reasonably good model within 500~1000 self-play games in about 2 hours.
  2. For the case of an 8×8 board and 5 in a row, it may take 2000~3000 self-play games to get a good model, and this may take about 2 days on a single PC.
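
The board size and win condition are set near the top of train.py; a sketch of the relevant settings (attribute names assumed from this repo's TrainPipeline):

# in train.py (sketch; attribute names assumed from this repo)
self.board_width = 6    # tip 1: start small with a 6x6 board
self.board_height = 6
self.n_in_row = 4       # 4 in a row wins; switch to 8, 8, 5 for tip 2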

Further reading

My article (in Chinese) describing some details of the implementation: https://zhuanlan.zhihu.com/p/32089487
