All Projects → OMerkel → UCThello

OMerkel / UCThello

Licence: other
UCThello - a board game demonstrator (Othello variant) with computer AI using Monte Carlo Tree Search (MCTS) with UCB (Upper Confidence Bounds) applied to trees (UCT in short)

Programming Languages

javascript
184084 projects - #8 most used programming language
HTML
75241 projects
CSS
56736 projects

Projects that are alternatives of or similar to UCThello

Alpha Zero General
A clean implementation based on AlphaZero for any game in any framework + tutorial + Othello/Gobang/TicTacToe/Connect4 and more
Stars: ✭ 2,617 (+9965.38%)
Mutual labels:  mcts, othello, monte-carlo-tree-search
Alphazero gomoku
An implementation of the AlphaZero algorithm for Gomoku (also called Gobang or Five in a Row)
Stars: ✭ 2,570 (+9784.62%)
Mutual labels:  board-game, mcts, monte-carlo-tree-search
Deep RL with pytorch
A pytorch tutorial for DRL(Deep Reinforcement Learning)
Stars: ✭ 160 (+515.38%)
Mutual labels:  mcts, uct
alpha-zero
AlphaZero implementation for Othello, Connect-Four and Tic-Tac-Toe based on "Mastering the game of Go without human knowledge" and "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm" by DeepMind.
Stars: ✭ 68 (+161.54%)
Mutual labels:  mcts, othello
ionic4-phaser3-template
Ionic 4 and phaser 3 template
Stars: ✭ 19 (-26.92%)
Mutual labels:  mobile-app, mobile-game
MCTS-agent-python
Monte Carlo Tree Search (MCTS) is a method for finding optimal decisions in a given domain by taking random samples in the decision space and building a search tree accordingly. It has already had a profound impact on Artificial Intelligence (AI) approaches for domains that can be represented as trees of sequential decisions, particularly games …
Stars: ✭ 22 (-15.38%)
Mutual labels:  mcts, monte-carlo-tree-search
quoridor-ai
Quoridor AI based on Monte Carlo tree search
Stars: ✭ 23 (-11.54%)
Mutual labels:  mcts, monte-carlo-tree-search
GalaxyBreak
Galaxy Break is a minimalistic endless video game for mobile published for Android platform.
Stars: ✭ 21 (-19.23%)
Mutual labels:  mobile-app, mobile-game
QuizChallenge
Multiplayer Online Quiz
Stars: ✭ 19 (-26.92%)
Mutual labels:  mobile-app, mobile-game
AnimalChess
Animal Fight Chess Game(斗兽棋) written in rust.
Stars: ✭ 76 (+192.31%)
Mutual labels:  board-game, monte-carlo-tree-search
captAR
Augmented Reality Geolocation Capture-the-Flag Mobile Game Capstone Project
Stars: ✭ 24 (-7.69%)
Mutual labels:  mobile-app, mobile-game
Hypernets
A General Automated Machine Learning framework to simplify the development of End-to-end AutoML toolkits in specific domains.
Stars: ✭ 221 (+750%)
Mutual labels:  mcts, monte-carlo-tree-search
godpaper
🐵 An AI chess-board-game framework(by many programming languages) implementations.
Stars: ✭ 40 (+53.85%)
Mutual labels:  board-game, mcts
covid19cuba-app
Mobile application of Covid19 Cuba Data project implemented with Flutter
Stars: ✭ 41 (+57.69%)
Mutual labels:  mobile-app
react-native-movies-app
Movies catalog app written in react native and use of themoviedb api
Stars: ✭ 88 (+238.46%)
Mutual labels:  mobile-app
mangosta-android
MongooseIM client for Android
Stars: ✭ 31 (+19.23%)
Mutual labels:  mobile-app
nomdebebe
A simple, private tool to help pick a baby name.
Stars: ✭ 254 (+876.92%)
Mutual labels:  mobile-app
react-native-football
React Native Premier League Football App ⚽ 👟🏆🏅
Stars: ✭ 61 (+134.62%)
Mutual labels:  mobile-app
Android-daily-read-tips
log for articles and info in android for every developer
Stars: ✭ 13 (-50%)
Mutual labels:  mobile-app
react-native-single-select
Customizable & Easy to Use Single Select Library for React Native
Stars: ✭ 74 (+184.62%)
Mutual labels:  mobile-app

UCThello icon UCThello

UCThello - a board game demonstrator with computer AI using Monte-Carlo Tree Search (MCTS) with UCB (Upper Confidence Bounds) applied to trees (UCT in short)

Keywords, Categories Monte-Carlo Tree Search (MCTS), Upper Confidence Bounds (UCB), UCB applied to trees (UCT), AI, 2-player board game, deterministic game with perfect information, JavaScript, ECMAScript, W3C WebWorker

Abstract

UCThello is a board game using Monte-Carlo Tree Search (MCTS) with UCB (Upper Confidence Bounds) applied to trees (UCT in short) for the computer player AI. The board game used for demonstration purposes of the UCT algorithm is close to a game named Othello depending on selected options. In fact it can be played depending on your configuration following the official tournament rules of the WOF - World Othello Federation - if intended [WOF14]. Other rule settings to play variants are available, too. Per design decision the playing strength is limited for pleasure and fun level. Thus it is kept at a moderate to quite strong level on purpose due to the target environment, device platform, and audience expectations. This is done e.g. by limitation of the maximum AI response time and using a single execution thread for AI only plus just a second independent execution thread for a responsive user interface to avoid battery drains if full CPU and GPU core support would be implemented leading to bad user experience. Other possible but at least currently postponed improvements could be done by simple usage of a well-known and available game opening book. Although such simple modifications could improve the playing strength these features are not implemented in the current version yet.

Othello is a derivative of the board game Reversi which can be played by UCThello as well. Reversi is claimed to be invented by either Lewis Waterman or John W. Mollett. Predecessor of Reversi created by Mollett is The game of Annexation, also called Annex back in 19th century.

Monte-Carlo Tree Search

The Monte-Carlo Tree Search (MCTS in short) represents an algorithms used to build a Search Tree interatively by successively adding nodes according to traversing of nodes and simulations in the problem domain. If the problem domain is a game then the nodes can represent moves according to the game rules. Traversing nodes follows a Selection Strategy. Simulations are often called playouts, too. The different nodes inside the simulated paths get statistics reflecting ratios of win and loss related to total amount of simulations. Assumption is that with higher total amount of simulations the confidence in the statistics gets high enough and allows to select quality nodes or moves. Such that the idea is to retrieve the acceptable next node or move with optimal ratio then.

The iterative MCTS algorithm is modelled to perform four main states typically called

  • Selection,
  • Expansion,
  • Simulation, and
  • Backpropagation. See [Cha10] & [CBSS08]

In UCThello the related code fragment for this loop is close to

Uct.prototype.getActionInfo = function ( board, maxIterations, verbose ) {
  var root = new UctNode(null, board, null);
  for(var iterations=0; iterations<maxIterations; iterations+=1) {
    var node = root;
    var variantBoard = board.copy();
    /* Selection */
    ...
    /* Expansion */
    ...
    /* Simulation */
    ...
    /* Backpropagation */
    ...
  }
  return { action : root.mostVisitedChild().action };
};

On a given board the most visited and therefore best information describing an action according to the rules performed by the current player shall be determined.

Selection

First step or state in an MCTS algorithm iteration is the Selection. Objective of the Selection is to retrieve a path beginning at the root node towards a selected leaf node from the search tree. The Search Tree stays fixed inside the Selection state. It grows in a later state of the algorithm by appending more nodes on each iteration of the MCTS. Only exception is when a selected path has a final leaf node that is a terminal node. A terminal node simply is a move representation of an end of game situation according to the rules. The root node represents the current game or problem domain situation. To traverse the search tree from the root node towards the leaf nodes simply means to follow a possible predicted variant of game play.

The objective of the Selection Strategy is to branch the intended search path in a balance of information exploration and exploitation. If a branch is selected following a search path branch already examined previously this is seen as an exploit. An exploit shall confirm the quality of an already examined node in terms of gaining higher statistical confidence. Higher statistical confidence does mean to have more reliable estimates. Exploration is performed by creating new nodes in later MCTS steps or alternatively search path branch selection of relatively rare traversed nodes. Nodes traversed in a low amount simply reflects a low reliability or statistical confidence. The border between exploit and explore is often seen as being soft and fluent.

Thus a selection of a child node to traverse next at each level of the already build search tree path is usually based on a quality value of the visited nodes in earlier iterations. An optimal Selection Strategy to best support the objective is unknown. One statistical approach called Upper Confidence Bounds (UCB) algorithm uses a logarithm based formula on collected quality values correlated to the nodes on the search path if applied to MCTS. The combination of MCTS and UCB called UCT (short for UCB applied to trees) is credited to [KS06]. Other approaches or additional supporting ideas for a Selection Strategy are presented and discussed e.g. in [CSUB06].

Besides the Selection Strategy in search path branch Selection an additional aspect is seen. To avoid a risk that any high quality node is unvisited that is located near the rood node already. To reach such a design goal a possible solution is to favor traversing any unexplored child node over following explored siblings. Widening the search tree is then favored over deepening. Critics could be that randomness of Monte-Carlo methods is reduced if applied.

In UCThello the select child step implements the UCT algorithm. The UCB related code is part of the UctNode.prototype.selectChild function.

Additionally UCThello implements to favor early Selection of a traversed node on any unexplored (or unexamined) child existing. Such an unexplored (or unexamined) child is preferred over continuing traversing any explored node.

var node = root;
var variantBoard = board.copy();
/* Selection */
while (node.unexamined.length == 0 && node.children.length > 0) {
  node = node.selectChild();
  variantBoard.doAction(node.action);
}

Expansion

The objective of the Expansion step is to add a new unexplored child of the node determined by the previous Selection.

If the node determined by the Selection is an inner node instead of a leaf node then this node has a combination of explored and unexplored children. Either way an unexplored child shall be added for the coming Simulation state. Only exception is that a leaf node has been reached representing a terminal node. In such a case no Expansion and Simulation is needed since a terminal node means that a end of game is implied at that node on the search path.

Sometimes you will find implementations where multiple Expansions take place on the Selection node. This simply means a set of child nodes is added at once then.

In UCThello exactly one node will be added unless a terminal node is reached and the list of remaining unexplored child nodes is determined before. To avoid any preferred order when getting a node from the set of remaining nodes or when a dependency from any parameter or state exists the returned node is selected randomly.

/* Selection */
...
/* Expansion */
if (node.unexamined.length > 0) {
  var j = Math.floor(Math.random() * node.unexamined.length);
  variantBoard.doAction(node.unexamined[j]);
  node = node.addChild(variantBoard, j);
}

Terminal nodes do not have any child nodes. So it is sufficient to check for the unexamined.length in case a terminal node has been selected.

Simulation

Now the objective of a Simulation is to playout a possible scenario starting from the newly expanded search tree leaf node. Simulation is performed until end of game is reached.

Mind the playout does not modify the expanded search tree leaf node. The fixed leaf node - respectively the correlated game state - is used as the base for the simulation only.

On each simulation step a player's action valid by the rules is performed on the created variant board. The variant board is used as a complete copy of the current board and game state. This is to avoid changes to the board and game state while following the full search path and simulation steps.

Instead of doing just a single playout alternatively several playouts could be started from the selected and expanded search tree leaf node. Idea behind this would be to save the run time needed for a possible choice of the same selection path in later iterations.

In UCThello a single playout is performed per iteration. The number of MCTS algorithm iterations equals the number of simulations then.

var variantBoard = board.copy();
/* Selection */
...
/* Expansion */
...
/* Simulation */
var actions = variantBoard.getActions();
while(actions.length > 0) {
  variantBoard.doAction(actions[Math.floor(Math.random() * actions.length)]);
  ...
  actions = variantBoard.getActions();
}

Backpropagation

Objective of the Backpropagation is to update the statistics of all nodes along the search tree path in reverse order until the root node is reached. The Simulation did not perform any changes on the search tree path. Since the search tree path is unchanged this means the eventually played or predicted result on the playout can be used to update statistics starting at the search tree path leaf node via the parent nodes until the root node is reached.

var node = root;
var variantBoard = board.copy();
/* Selection */
...
/* Expansion */
...
/* Simulation */
...
/* Backpropagation */
var result = variantBoard.getResult();
while(node) {
  node.update(result);
  node = node.parentNode;
}

In UCThello the UCT AI player does not maximize for the amount of discs of own color on board. Instead it analyzes the end of game situation just for any result being a win. The call variantBoard.getResult() returns an array of length two. The two values returned stand for the game result of the players in terms of win or loss. The winning player gets a full point while his opponent scores zero points. Meaning the result is either [ 1, 0 ] or [ 0, 1 ]. A draw or stalemate situation is represented as an array [ 0.5, 0.5 ]. Meaning a draw is better than a loss but shall be interpreted as half a win for both players.

The statistics for a node is updated by node.update(result). Mind the search tree node is representing a move of the active player according to the rules. The update picks the active player's end of game result from the result array and adds it to a statistics value representing the total amount of wins found traversing the search tree node over all MCTS iterations. Additionally the amount of visits for the node is increased.

References

3rd Party Libraries

Links

Othello Organizations

Mind that UCThello follows (most) official tournament rules of the listed organizations depending on your selected options. Still UCThello is independent development from any work of these organizations.

Contributors / Authors

Oliver Merkel,
Creative Commons License
This image is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Oliver Merkel, Creative Commons License, This image is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

All logos, brands and trademarks mentioned belong to their respective owners.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].