shayakbanerjee / ultimate-ttt-rl

License: MIT


The Ultimate Tic Tac Toe Player Bot - with Reinforcement Learning

Reinforcement Learning based Ultimate Tic Tac Toe player

[Image: an ultimate tic tac toe board]

Background

For more details on the game of Ultimate Tic Tac Toe and why I started this project, refer to my blog article.

This project provides an existing infrastructure for the game of Ultimate Tic Tac Toe on which others can test their learning algorithms. It includes two implemented reinforcement learning algorithms, a reinforcement learning bot (which can use any learning algorithm of your choice), and a random bot (which picks moves at random); these are useful for benchmarking against one another.

Credit to this blog post for helping me understand the rules of the game with a lot of whiteboard drawings.

Board

To instantiate and play a game of ultimate tic tac toe:

from ultimateboard import UTTTBoard
from board import GridStates

b = UTTTBoard()
# makeMove(player, board_location, cell_location)
b.makeMove(GridStates.PLAYER_X, (1,1), (1,1))
# getNextBoardLocation() returns the tile the next player must play in,
# as dictated by the previous move
b.makeMove(GridStates.PLAYER_O, b.getNextBoardLocation(), (1, 2))
b.makeMove(GridStates.PLAYER_X, b.getNextBoardLocation(), (1, 1))

The co-ordinate system is shown in the image below, and is the same for the master board as well as any tile within it:

[Image: co-ordinate system]

E.g. the co-ordinates (1,1), (1,1) in the first move above represent the center square of the center tile.
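As a quick sanity check of this addressing scheme, all 81 cells can be enumerated as (tile, cell) pairs of (row, column) co-ordinates. This is plain Python, independent of the project code:

```python
# Every cell of the ultimate board is addressed by a tile (row, col)
# plus a cell (row, col), each indexed 0..2 -- so ((1,1), (1,1)) is the
# center cell of the center tile, matching the first move above.
coordinates = [((tr, tc), (cr, cc))
               for tr in range(3) for tc in range(3)
               for cr in range(3) for cc in range(3)]
print(len(coordinates))  # 81 cells in total
```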

To view the state of the board at any given time (you'll get a console output):

b.printBoard()

Players

There are two implemented bots for playing the game:

  1. RandomUTTTPlayer, which makes moves at random
  2. RLUTTTPlayer, which makes moves based on a user-supplied learning algorithm

To play the game with one or a combination of these bots, use the SingleGame class, e.g. with two random players:

from game import SingleGame
from ultimateplayer import RandomUTTTPlayer
from ultimateboard import UTTTBoard, UTTTBoardDecision

player1, player2 = RandomUTTTPlayer(), RandomUTTTPlayer()
game = SingleGame(player1, player2, UTTTBoard, UTTTBoardDecision)
result = game.playAGame()

When using the RL player, it must be initialized with a learning algorithm of your choice. Two sample learning algorithms are already provided: TableLearning and NNUltimateLearning.

from game import SingleGame
from learning import TableLearning
from ultimateplayer import RandomUTTTPlayer, RLUTTTPlayer
from ultimateboard import UTTTBoard, UTTTBoardDecision

player1, player2 = RLUTTTPlayer(TableLearning(UTTTBoardDecision)), RandomUTTTPlayer() 
game = SingleGame(player1, player2, UTTTBoard, UTTTBoardDecision)
result = game.playAGame()

Learning Algorithm

The reinforcement learning (RL) player uses a learning algorithm to improve its chances of winning as it plays a number of games and learns about different positions. The learning algorithm is the key piece to the puzzle for making the RL bot improve its chances of winning over time. There is a generic template provided for the learning algorithm:

class GenericLearning(object):
    def getBoardStateValue(self, player, board, boardState):
        # Return the perceived `value` of a given board state
        raise NotImplementedError

    def learnFromMove(self, player, board, prevBoardState):
        # Learn from the previous board state and the current state of the board
        raise NotImplementedError
        
    def resetForNewGame(self):
        # Optional to implement. Reinitialize some form of state for each new game played
        pass
        
    def gameOver(self):
        # Optional to implement. When a game is completed, run some sort of learning, e.g. train a neural network
        pass

Any learning model must inherit from this class and implement the methods above. For examples, see TableLearning for a lookup-table-based solution and NNUltimateLearning for a neural-network-based solution. Every board state is an 81-character string representing a row-wise raster scan of the entire 9x9 board; you can map it to numeric entries as necessary.
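One possible numeric mapping of that 81-character string is sketched below. The character set is an assumption: the actual symbols used for X, O, and empty cells are defined by the GridStates class.

```python
# Sketch: encode the 81-character raster-scan board state numerically.
# The characters 'X', 'O', ' ' are assumptions; check GridStates for
# the symbols your board actually emits.
CHAR_TO_VALUE = {'X': 1.0, 'O': -1.0, ' ': 0.0}

def encode_board_state(board_state):
    """Map the 81-character board state string to a length-81 numeric list."""
    return [CHAR_TO_VALUE.get(ch, 0.0) for ch in board_state]

empty_board = ' ' * 81
print(len(encode_board_state(empty_board)))  # one value per cell
```

A vector like this can feed a lookup-table key, or (as a numpy array) the input layer of a neural network.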

Using your own learning algorithm

Simply implement your learning model, e.g. MyLearningModel, by inheriting from GenericLearning, then instantiate the provided reinforcement learning bot with an instance of this model:

from ultimateboard import UTTTBoardDecision
from learning import GenericLearning
import random
from ultimateplayer import RLUTTTPlayer

class MyLearningModel(GenericLearning):
    def getBoardStateValue(self, player, board, boardState):
        # Your implementation here
        value = random.uniform(0, 1)  # As an example (and a very poor one)
        return value  # Must be a numeric value

    def learnFromMove(self, player, board, prevBoardState):
        # Your implementation here - learn some value for prevBoardState
        pass

learningModel = MyLearningModel(UTTTBoardDecision)
learningPlayer = RLUTTTPlayer(learningModel)

Sequence of games

More often than not you will want to play a sequence of games and observe the learning over time. Code samples for that have been provided and use the GameSequence class:

from ultimateplayer import RLUTTTPlayer, RandomUTTTPlayer
from learning import TableLearning
from game import GameSequence
from ultimateboard import UTTTBoard, UTTTBoardDecision

# The RL player must be initialized with a learning algorithm
learningPlayer = RLUTTTPlayer(TableLearning(UTTTBoardDecision))
randomPlayer = RandomUTTTPlayer()
results = []
numberOfSetsOfGames = 40
for i in range(numberOfSetsOfGames):
    games = GameSequence(100, learningPlayer, randomPlayer, BoardClass=UTTTBoard, BoardDecisionClass=UTTTBoardDecision)
    results.append(games.playGamesAndGetWinPercent())
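To observe the trend across sets, the collected results can be smoothed with a moving average. This sketch assumes playGamesAndGetWinPercent() returns a single win percentage per set; the numbers below are illustrative only, not real measurements.

```python
# Sketch: smooth a series of per-set win percentages with a moving
# average to make the learning trend easier to see.
def moving_average(values, window=5):
    """Average each consecutive length-`window` slice of `values`."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

results = [40.0, 42.0, 47.0, 55.0, 58.0, 60.0]  # illustrative only
print(moving_average(results, window=3))
```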

Prerequisites

You will need numpy installed to work with this code. If you use the neural-network-based learner in the provided examples, you will also need keras, which in turn requires one of TensorFlow, Theano, or CNTK as its backend.
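A typical setup with pip might look like the following, assuming you pick TensorFlow as the Keras backend (Theano or CNTK would work as well):

```shell
# Core dependency
pip install numpy
# Only needed for the neural-network learner (NNUltimateLearning)
pip install keras tensorflow
```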
