ar-nowaczynski / uttt

License: Apache-2.0
AlphaZero-like AI solution for playing Ultimate Tic-Tac-Toe in the browser

Programming Languages

  • JavaScript
  • Python
  • C++
  • Less
  • HTML
  • Shell

Projects that are alternatives of or similar to uttt

vs-mlrt
Efficient ML Filter Runtimes for VapourSynth (with built-in support for waifu2x, DPIR, RealESRGANv2, and Real-CUGAN)
Stars: ✭ 34 (+21.43%)
Mutual labels:  onnxruntime
alpha-zero
AlphaZero implementation for Othello, Connect-Four and Tic-Tac-Toe based on "Mastering the game of Go without human knowledge" and "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm" by DeepMind.
Stars: ✭ 68 (+142.86%)
Mutual labels:  alpha-zero
graphsignal
Graphsignal Python agent
Stars: ✭ 158 (+464.29%)
Mutual labels:  onnxruntime
lite.ai.toolkit
🛠 A lite C++ toolkit of awesome AI models with ONNXRuntime, NCNN, MNN and TNN support: YOLOX, YOLOP, MODNet, YOLOR, NanoDet, SCRFD and more (CPU/GPU).
Stars: ✭ 1,354 (+4735.71%)
Mutual labels:  onnxruntime
alphastone
Using self-play, MCTS, and a deep neural network to create a Hearthstone AI player
Stars: ✭ 24 (-14.29%)
Mutual labels:  alpha-zero
Alpha Zero General
A clean implementation based on AlphaZero for any game in any framework + tutorial + Othello/Gobang/TicTacToe/Connect4 and more
Stars: ✭ 2,617 (+9246.43%)
Mutual labels:  alpha-zero
vs-realesrgan
Real-ESRGAN function for VapourSynth
Stars: ✭ 27 (-3.57%)
Mutual labels:  onnxruntime
AlphaGo.jl
AlphaGo Zero implementation using Flux.jl
Stars: ✭ 73 (+160.71%)
Mutual labels:  alpha-zero
AlphaZero-Renju
No description or website provided.
Stars: ✭ 17 (-39.29%)
Mutual labels:  alpha-zero
ultimate-ttt
Play "Ultimate Tic-Tac-Toe" in the browser 🚀
Stars: ✭ 20 (-28.57%)
Mutual labels:  ultimate-tic-tac-toe
ONNX-HITNET-Stereo-Depth-estimation
Python scripts for performing stereo depth estimation using the HITNET model in ONNX.
Stars: ✭ 21 (-25%)
Mutual labels:  onnxruntime
ultimate-tictactoe
An implementation of ultimate tictactoe in Elm
Stars: ✭ 15 (-46.43%)
Mutual labels:  ultimate-tic-tac-toe
ONNX-Runtime-with-TensorRT-and-OpenVINO
Docker scripts for building ONNX Runtime with TensorRT and OpenVINO in manylinux environment
Stars: ✭ 15 (-46.43%)
Mutual labels:  onnxruntime
djl
An Engine-Agnostic Deep Learning Framework in Java
Stars: ✭ 3,080 (+10900%)
Mutual labels:  onnxruntime
optimum
🏎️ Accelerate training and inference of 🤗 Transformers with easy to use hardware optimization tools
Stars: ✭ 567 (+1925%)
Mutual labels:  onnxruntime
Elf
ELF: a platform for game research with AlphaGoZero/AlphaZero reimplementation
Stars: ✭ 3,240 (+11471.43%)
Mutual labels:  alpha-zero
YOLO-Streaming
Push-pull streaming and Web display of YOLO series
Stars: ✭ 56 (+100%)
Mutual labels:  onnxruntime
fastT5
⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x.
Stars: ✭ 421 (+1403.57%)
Mutual labels:  onnxruntime
ultimate-ttt-rl
Reinforcement Learning based Ultimate Tic Tac Toe player
Stars: ✭ 20 (-28.57%)
Mutual labels:  ultimate-tic-tac-toe

uttt.ai

AlphaZero-like AI solution for playing Ultimate Tic-Tac-Toe in the browser.

uttt.ai preview

Introduction

This project is a loose adaptation of the original AlphaZero published by DeepMind. It follows the key ideas behind AlphaZero, such as generating training data from self-play and using a single neural network to guide the Monte-Carlo Tree Search (MCTS) algorithm. However, the actual implementation of these ideas differs due to limits on the available computing power, specifically:

  • AI self-play training must fit on a personal computer within a reasonable time (several weeks)
  • AI inference must run in the browser on the client-side hardware within a reasonable time (a few seconds)
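
At the heart of the search is the AlphaZero-style PUCT rule: each simulation walks down the tree by picking the child that balances the network's policy prior against the observed action values. The snippet below is a minimal, generic sketch of that rule; the `Node` fields and the `C_PUCT` value are illustrative assumptions, not code from this repository (see utttpy for the actual implementation):

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    prior: float             # P(s, a): policy prior from the network
    visit_count: int = 0     # N(s, a)
    value_sum: float = 0.0   # W(s, a); Q(s, a) = W(s, a) / N(s, a)
    children: list = field(default_factory=list)

C_PUCT = 1.5  # exploration constant; this value is a hypothetical choice

def select_child(node: Node) -> Node:
    """Return the child maximizing Q(s, a) + U(s, a), i.e. the PUCT rule."""
    sqrt_total = math.sqrt(sum(c.visit_count for c in node.children))

    def puct_score(child: Node) -> float:
        q = child.value_sum / child.visit_count if child.visit_count else 0.0
        u = C_PUCT * child.prior * sqrt_total / (1 + child.visit_count)
        return q + u

    return max(node.children, key=puct_score)
```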

You can play Ultimate Tic-Tac-Toe with the AI on the official project website: https://uttt.ai.

Overview

The project overview in chronological order:

overview

Differences from the original AlphaZero

  • Much smaller Policy-Value Network (PVN) architecture designed specifically for playing Ultimate Tic-Tac-Toe in the browser with only 5 million parameters (20 MB): utttpy/selfplay/policy_value_network.py.
  • Total separation of self-play data generation process from the Policy-Value Network training (offline RL).
  • More MCTS simulations per position for training (self-play data quality over quantity).
  • The initial self-play dataset was generated from pure MCTS simulations (random playouts are faster and better than random Policy-Value Network predictions).
  • Search simulations are synchronous, single-threaded and sequential.
  • Enabled data augmentation by flipping the board during the Policy-Value Network training.
  • Value target for the MSE loss function is defined as the root's mean state value rather than the game outcome.
  • Masked KL divergence loss for the policy head instead of Cross-Entropy loss (both loss choices are sketched right after this list).
  • Auxiliary policy head loss for predicting action values next to action logits.
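
A minimal PyTorch sketch of how these two loss choices could look, assuming a flat 81-way action space; the tensor names and shapes are assumptions for illustration, not the repository's actual code:

```python
import torch
import torch.nn.functional as F

def masked_policy_kl(logits: torch.Tensor,
                     target_probs: torch.Tensor,
                     legal_mask: torch.Tensor) -> torch.Tensor:
    """KL(target || prediction) computed over legal moves only.

    logits:       (B, 81) raw policy-head outputs
    target_probs: (B, 81) normalized MCTS visit counts, zero on illegal moves
    legal_mask:   (B, 81) boolean mask, True for legal moves
    """
    # Exclude illegal moves from the softmax by masking their logits to -inf.
    log_pred = F.log_softmax(logits.masked_fill(~legal_mask, float("-inf")), dim=-1)
    # Convention 0 * log(0) = 0: zero-probability targets contribute nothing.
    log_target = torch.where(target_probs > 0,
                             target_probs.clamp_min(1e-12).log(),
                             torch.zeros_like(target_probs))
    # Zero the -inf log-predictions on illegal moves before multiplying,
    # since their target probability is zero anyway (0 * -inf would be NaN).
    log_pred = log_pred.masked_fill(~legal_mask, 0.0)
    return (target_probs * (log_target - log_pred)).sum(dim=-1).mean()

def value_loss(value_pred: torch.Tensor, root_mean_value: torch.Tensor) -> torch.Tensor:
    # MSE against the root's mean state value from search, not the game outcome.
    return F.mse_loss(value_pred, root_mean_value)
```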

Evaluation

The following evaluation procedure was designed and run to assess the overall AI performance in Ultimate Tic-Tac-Toe.

Evaluation setup

Overall AI performance is assessed by playing tournaments between 3 AIs:

  1. MCTS - Monte-Carlo Tree Search with random playouts
  2. NMCTS1 - (Neural) Monte-Carlo Tree Search with Policy-Value Network guidance after training on the stage1-mcts dataset
  3. NMCTS2 - (Neural) Monte-Carlo Tree Search with Policy-Value Network guidance after retraining on the stage2-nmcts dataset

And 3 tournaments:

  1. NMCTS1 vs MCTS
  2. NMCTS2 vs MCTS
  3. NMCTS2 vs NMCTS1

Each AI is represented in two versions: quick and strong (to get a more nuanced comparison).

AI version           num simulations   inference time
MCTS 1M (quick)            1,000,000            5.0 s
MCTS 10M (strong)         10,000,000           51.4 s
NMCTS 1k (quick)               1,000            4.4 s
NMCTS 10k (strong)            10,000           17.8 s

Inference times were measured on an Intel i7-10700K and an NVIDIA GeForce RTX 2080 Ti using single-threaded C++ implementations of MCTS and NMCTS (NMCTS uses the GPU to run Policy-Value Network inference).

Each tournament consists of 4 matches (all combinations of quick/strong vs quick/strong):
evaluation tournament template
Each match consists of 100 games initialized from 50 unique positions; each position is played twice, with the AIs swapping sides for the second playthrough.
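
A short sketch of how such a match schedule can be generated (names like `play_game` and `eval_positions` are placeholders, not code from this repository):

```python
def match_schedule(positions, ai_a, ai_b):
    """Yield the 2 * len(positions) games of one match: every starting
    position is played twice, with the AIs swapping sides in between."""
    for position in positions:
        yield position, ai_a, ai_b   # ai_a plays as X, ai_b as O
        yield position, ai_b, ai_a   # sides swapped for the rematch

# Hypothetical usage: 50 starting positions -> 100 games per match.
# for position, player_x, player_o in match_schedule(eval_positions, nmcts, mcts):
#     result = play_game(position, player_x, player_o)
```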

Initial evaluation positions are defined here: utttpy/selfplay/evaluation_uttt_states.py.

Evaluation results

evaluation results

Evaluation results aggregated

evaluation results aggregated

Takeaways:

  • Training on MCTS evaluations (the stage1-mcts dataset) was successful: NMCTS1 defeats MCTS.
  • Retraining on self-play data (the stage2-nmcts dataset) was successful: NMCTS2 consistently improves upon NMCTS1.
  • NMCTS2 beats the MCTS baseline even with a 4-orders-of-magnitude difference in the number of simulations (1k vs 10M).
  • A next stage (NMCTS3) would be needed to see whether, and by how much, NMCTS2 can be improved upon.

NMCTS2 is the AI deployed at https://uttt.ai.

Datasets

There are 2 datasets:

  • stage1-mcts: 8 million evaluated positions generated by Monte-Carlo Tree Search self-play, used to train the Policy-Value Network from scratch.
  • stage2-nmcts: 8 million evaluated positions generated by Neural Monte-Carlo Tree Search self-play, used to retrain the Policy-Value Network.

Both datasets are available for download here.

Read more about datasets here: datasets/README.md.

Training

Training artifacts are available for download here.

Requirements

This project was developed using:

  • Ubuntu 20.04
  • Python 3.8
  • 2x NVIDIA GeForce RTX 2080 Ti (CUDA 11)
  • Intel i7-10700K (8 cores x 3.80GHz)
  • PyTorch >= 1.8
  • Node.js 14.18.2
  • npm 8.3.0
  • React 17.0.2
  • g++ 9.3.0

License

This project is licensed under the Apache License 2.0.
