
farmersrice / saltzero

License: GPL-3.0
Machine learning bot for ultimate tic-tac-toe based on DeepMind's AlphaGo Zero paper. C++ and Python.

Programming Languages

C++
36643 projects - #6 most used programming language
Python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to saltzero

alphaFive
An AlphaGo-style implementation for Gomoku (gobang, five in a row)
Stars: ✭ 51 (+88.89%)
Mutual labels:  alphago-zero, alphazero
alphazero
Board game reinforcement learning using the AlphaZero method, including Makhos (Thai checkers), Reversi, Connect Four, and Tic-tac-toe game rules
Stars: ✭ 24 (-11.11%)
Mutual labels:  alphago-zero, alphazero
Alphazero gomoku
An implementation of the AlphaZero algorithm for Gomoku (also called Gobang or Five in a Row)
Stars: ✭ 2,570 (+9418.52%)
Mutual labels:  alphago-zero, alphazero
Alpha Zero General
A clean implementation based on AlphaZero for any game in any framework + tutorial + Othello/Gobang/TicTacToe/Connect4 and more
Stars: ✭ 2,617 (+9592.59%)
Mutual labels:  alphago-zero, alphazero
alpha-zero
AlphaZero implementation for Othello, Connect-Four and Tic-Tac-Toe based on "Mastering the game of Go without human knowledge" and "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm" by DeepMind.
Stars: ✭ 68 (+151.85%)
Mutual labels:  alphago-zero, alphazero
connect4
Solving board games like Connect4 using Deep Reinforcement Learning
Stars: ✭ 33 (+22.22%)
Mutual labels:  alphago-zero
muzero
A clean implementation of MuZero and AlphaZero following the AlphaZero General framework. Train and pit both algorithms against each other, and investigate the reliability of learned MuZero MDP models.
Stars: ✭ 126 (+366.67%)
Mutual labels:  alphazero
AlphaZero-Renju
No description or website provided.
Stars: ✭ 17 (-37.04%)
Mutual labels:  alphazero
alpha sigma
A PyTorch-based Gomoku game model using AlphaZero-style reinforcement learning and Monte Carlo Tree Search.
Stars: ✭ 134 (+396.3%)
Mutual labels:  alphazero
MyAlphaGoZeroOnConnect4
My Simple Implementation of AlphaGo Zero on Connect4
Stars: ✭ 16 (-40.74%)
Mutual labels:  alphago-zero
Zerofish
An implementation of the AlphaZero algorithm for chess
Stars: ✭ 34 (+25.93%)
Mutual labels:  alphazero
KKAlphaGoZero
An implementation of the AlphaGo Zero paper
Stars: ✭ 35 (+29.63%)
Mutual labels:  alphago-zero
Chess Alpha Zero
Chess reinforcement learning by AlphaGo Zero methods.
Stars: ✭ 1,868 (+6818.52%)
Mutual labels:  alphago-zero
Elf
ELF: a platform for game research with AlphaGoZero/AlphaZero reimplementation
Stars: ✭ 3,240 (+11900%)
Mutual labels:  alphago-zero
connect4-alpha-zero
Connect4 reinforcement learning by AlphaGo Zero methods.
Stars: ✭ 102 (+277.78%)
Mutual labels:  alphago-zero
sai
SAI: a fork of Leela Zero with variable komi.
Stars: ✭ 92 (+240.74%)
Mutual labels:  alphago-zero
AlphaZero Gobang
A major deep learning course project from UCAS
Stars: ✭ 29 (+7.41%)
Mutual labels:  alphazero
Chess-Zero
Chess reinforcement learning by AlphaZero methods.
Stars: ✭ 36 (+33.33%)
Mutual labels:  alphazero
AnimalChess
Animal Fight Chess game (Dou Shou Qi) written in Rust.
Stars: ✭ 76 (+181.48%)
Mutual labels:  alphazero
computer-go-dataset
Datasets for computer Go
Stars: ✭ 133 (+392.59%)
Mutual labels:  alphazero

SaltZero

Machine learning bot for ultimate tic-tac-toe based on DeepMind's AlphaGo Zero / AlphaZero papers. No human knowledge provided.

Ultimate tic-tac-toe is a game involving a 3 by 3 grid, each cell of which is itself a regular tic-tac-toe game, for a total board size of 9 by 9. Read the Wikipedia page for a concise summary of the rules of the game. Please note that some posts on the internet claim that ultimate tic-tac-toe is solved; however, this only applies to a weaker variant which does not forbid playing in won local boards. Real ultimate tic-tac-toe is unsolved.

I want to play right now

Download the weight file from the releases section.

Run HumanVsRobotTest.py to play against the neural network. In this setting the bot always plays the best move. Your move must be an integer from 0 to 80, inclusive. Integers corresponding to board positions are shown in the diagram below:

	[ 0.,  1.,  2.,  9., 10., 11., 18., 19., 20.],
	[ 3.,  4.,  5., 12., 13., 14., 21., 22., 23.],
	[ 6.,  7.,  8., 15., 16., 17., 24., 25., 26.],
	[27., 28., 29., 36., 37., 38., 45., 46., 47.],
	[30., 31., 32., 39., 40., 41., 48., 49., 50.],
	[33., 34., 35., 42., 43., 44., 51., 52., 53.],
	[54., 55., 56., 63., 64., 65., 72., 73., 74.],
	[57., 58., 59., 66., 67., 68., 75., 76., 77.],
	[60., 61., 62., 69., 70., 71., 78., 79., 80.]
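
For reference, the numbering above indexes the nine local boards in row-major order, then the cells within each local board the same way. Below is a minimal Python sketch of that mapping; move_index is a hypothetical helper for illustration, not a function in the repository.

	def move_index(row: int, col: int) -> int:
	    # Convert a 0-based (row, col) on the 9x9 grid to a move in [0, 80].
	    local_board = (row // 3) * 3 + (col // 3)  # which small board, 0..8
	    cell = (row % 3) * 3 + (col % 3)           # cell within that board, 0..8
	    return local_board * 9 + cell

	assert move_index(0, 0) == 0
	assert move_index(0, 3) == 9    # top-middle small board, its top-left cell
	assert move_index(8, 8) == 80   # bottom-right corner of the grid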

Description

For the general idea of the mechanism behind the bot, read the original paper, Mastering the Game of Go without Human Knowledge. Please note that some additional ideas are also taken from the AlphaZero paper, Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm.

The goal is to produce a strong bot with easily understandable code, balancing speed with readability. A variety of other implementations of the AlphaGo Zero / AlphaZero idea exist online for a wide range of games, and they usually fall into two categories: those targeting education and readability, and those targeting pure speed. The former are often written in pure Python and, as a result, are incredibly slow and practically unusable; the latter tend to be extremely large and convoluted and are difficult to understand. This implementation aims to strike a balance, though it leans more toward readability than optimization.

For a speed reference: more than 2500 games an hour can be played on an i7-8550U at 200 visits per move, using a network with 6 shared hidden dense layers of 512 neurons each. Around 10000 games an hour can be generated on a Ryzen 7 3700X / RTX 2070 Super combination using a network with 7 shared hidden residual blocks (each block contains two dense layers of 1024 neurons each). Currently, the network architecture prevents full GPU utilization, but this is expected to be remedied when bigger networks are used.
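
To make that architecture concrete, here is a hypothetical Keras sketch of one such dense residual block and the surrounding network. The input encoding, activation choices, and output heads are assumptions for illustration, not taken from the SaltZero source.

	import tensorflow as tf
	from tensorflow.keras import layers

	def dense_residual_block(x, width=1024):
	    # Two dense layers of `width` neurons each, with a skip connection.
	    y = layers.Dense(width, activation="relu")(x)
	    y = layers.Dense(width)(y)
	    return layers.Activation("relu")(layers.Add()([x, y]))

	# Assumed input: a flattened board encoding (the real feature size may differ).
	inp = layers.Input(shape=(81 * 2,))
	h = layers.Dense(1024, activation="relu")(inp)
	for _ in range(7):  # "7 shared hidden residual blocks"
	    h = dense_residual_block(h)
	policy = layers.Dense(81, activation="softmax", name="policy")(h)
	value = layers.Dense(1, activation="tanh", name="value")(h)
	model = tf.keras.Model(inp, [policy, value])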

Here's a strength graph (note that network number 0 is our reference network, which was trained before switching to resnet, so there is a sudden drop and ascension; network 1 is the first residual network).

Strength graph

Strength

Stronger than the second-highest-ranked bot on CodinGame (see the leaderboard here).

Unfortunately, CodinGame does not allow the use of neural networks, much less the interfacing between C++ and Python used in SaltZero, so SaltZero cannot compete on their site. The author of the second-best bot, Yibo Huang (NinjaDoggy), generously provided his code for comparison purposes.

After 200 games at a fixed 400 ms per move, SaltZero v0.0.3 beat NinjaDoggy's bot 113 to 87, counting draws as half a point; detailed results: 65 wins, 96 draws, 39 losses.

If you would like to compare the strength of your bot against SaltZero, please see the readme in cpp_impl for details on how to do so. If you have done any testing on your own, it would be greatly appreciated if you would send the results to me (ansonhu at mit.edu).

Features

  • Concise C++ implementation for a speed advantage over Python

  • Additional Python implementation for readability

    • The C++ implementation should be considered the most up-to-date; the Python implementation may be a few updates behind.
  • All machine learning is conducted in Python, which interfaces with C++ through the Python C API

    • Allows for neural networks and other machine learning models to be coded in Python while still putting the grunt work of game processing on C++. In the Python implementation, Python's slowness is the main bottleneck (ML takes < 1% of time spent), while in the C++ implementation, the neural network and Python/C++ communication are the main bottlenecks.

    • This C++/Python hybrid system is > 15 times faster than the pure Python implementation.

  • Incredibly strong pre-trained network (potentially superhuman) available

  • Automatic deletion of games to save space (every 1000 games takes 200 MB)

  • Batched gameplay for speed

    • Since batched queries to the neural network are often much faster than individual queries, both on CPU and GPU, multiple games are run in a pseudo-parallel fashion. In short, N games (in the code, N = 1000) are played simultaneously. Whenever a new visit in the game tree needs to occur, all games are queried for the positions to be evaluated, and these are sent to the neural network in one batch. This provides a huge speed advantage (a sketch of this loop follows the list below).
  • Easy to adapt to new games
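
Here is an illustrative Python sketch of the batching loop described above; Game, select_leaf, encode, and expand_and_backup are hypothetical stand-ins, not names from the SaltZero code.

	import numpy as np

	N = 1000  # number of games played in pseudo-parallel

	def batched_step(games, model):
	    # Advance every unfinished game by one tree visit using one NN call.
	    active = [g for g in games if not g.finished()]
	    # Each game walks its search tree to the leaf position it needs evaluated.
	    leaves = [g.select_leaf() for g in active]
	    # One batched query is far faster than len(active) separate calls.
	    policies, values = model.predict(np.stack([l.encode() for l in leaves]))
	    # Scatter results back so each game can expand the leaf and back up.
	    for g, leaf, p, v in zip(active, leaves, policies, values):
	        g.expand_and_backup(leaf, p, v)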

Other usage notes

Compile compare_networks.cpp, generate_games.cpp, and whole_pipeline.cpp, then run the whole_pipeline executable to perform self-play/training/evaluation.

Note that you will need to produce executables with the same names as the .cpp files for the program to work properly. This is because the C++ code uses system calls to those executables in order to release GPU memory properly (Keras/TensorFlow do not provide a clean way to do this; see the comments in whole_pipeline.cpp for more details).

arbiter.cpp and bot.cpp help compare SaltZero to other ultimate tic-tac-toe bots. Please see the readme in cpp_impl for more details.

Compatibility notes

If you use an operating system other than Windows, you will have to change a few strings in whole_pipeline.cpp to make the appropriate system calls. If your system is not little-endian or not 64-bit, you may have to experiment with NetworkWrapper.cpp and network_wrapper.py (though it might also work out of the box; this is untested).
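
As a tiny illustration of why byte order matters when C++ and Python exchange binary data (the actual wire format used by NetworkWrapper.cpp and network_wrapper.py is not reproduced here):

	import struct

	payload = struct.pack("<f", 1.5)        # little-endian 32-bit float
	print(struct.unpack("<f", payload)[0])  # 1.5
	print(struct.unpack(">f", payload)[0])  # garbage on an endianness mismatch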

You will need GCC to compile, since __gnu_pbds is used for its faster hash table. You can drop in unordered_map or map for compatibility with other compilers if you like.

Differences

This section lists some differences between this implementation and the original paper, as well as other implementations online. For anything not mentioned here, you can generally assume that the implementation follows the original paper.

  • Evaluator uses >= 52.5% gating at 400 games; in other words, a new net must win at least 210 out of 400 games against an old net to pass

  • Uses Dirichlet noise as in the original paper, but with alpha = 0.3 to compensate for the smaller number of possible moves (this appears optimal based on the AlphaZero paper); a sketch of this noise mixing and the gating rule above follows this list

  • Current neural network does not use convolutions

    • This is because the game rules clearly delineate the differences between small games and the big game. If a convolution overlaps multiple small games, its data will not be very useful, because cells next to each other are not inherently linked unless they are in the same small board (unlike in Go, where stones in adjacent cells become unified groups or snuff out liberties). Therefore only densely connected layers are used.
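
A minimal numpy sketch of the gating rule and root-noise mixing described above; the mixing fraction epsilon = 0.25 is assumed from the AlphaZero paper and is not stated in this readme.

	import numpy as np

	def passes_gating(new_net_wins, games=400, threshold=0.525):
	    # 0.525 * 400 = 210 wins required for the new net to be accepted.
	    return new_net_wins >= threshold * games

	def add_root_noise(priors, alpha=0.3, epsilon=0.25):
	    # Mix Dirichlet noise into the root move priors, AlphaZero-style.
	    noise = np.random.dirichlet([alpha] * len(priors))
	    return (1.0 - epsilon) * np.asarray(priors) + epsilon * noise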

Other notes

The latest network plays at superhuman strength. If you can beat it without computer assistance, please let me know.

The network understands high-level concepts, such as the idea that the optimal move might not be the one that wins the local board. From my handful of games against it, I can say that the network is extremely effective at trapping its opponent in positions where it seems that almost any move will send the bot to a won local board and thus allow it to play anywhere.

There are a variety of ad-hoc changes I've made to the learning rate and the networks; there's a small text-only readme in the models folder detailing these changes.

Thanks

Huge thanks to Yibo Huang (NinjaDoggy) for providing the source code of his bot for strength comparisons.
