
gorisanson / quoridor-ai

License: MIT
Quoridor AI based on Monte Carlo tree search

Programming Languages

JavaScript
184,084 projects - #8 most used programming language
HTML
75,241 projects
CSS
56,736 projects

Projects that are alternatives to or similar to quoridor-ai

MCTS-agent-python
Monte Carlo Tree Search (MCTS) is a method for finding optimal decisions in a given domain by taking random samples in the decision space and building a search tree accordingly. It has already had a profound impact on Artificial Intelligence (AI) approaches for domains that can be represented as trees of sequential decisions, particularly games …
Stars: ✭ 22 (-4.35%)
Mutual labels:  mcts, monte-carlo-tree-search
UCThello
UCThello - a board game demonstrator (Othello variant) with computer AI using Monte Carlo Tree Search (MCTS) with UCB (Upper Confidence Bounds) applied to trees (UCT in short)
Stars: ✭ 26 (+13.04%)
Mutual labels:  mcts, monte-carlo-tree-search
Hypernets
A General Automated Machine Learning framework to simplify the development of End-to-end AutoML toolkits in specific domains.
Stars: ✭ 221 (+860.87%)
Mutual labels:  mcts, monte-carlo-tree-search
Alphazero gomoku
An implementation of the AlphaZero algorithm for Gomoku (also called Gobang or Five in a Row)
Stars: ✭ 2,570 (+11073.91%)
Mutual labels:  mcts, monte-carlo-tree-search
Alpha Zero General
A clean implementation based on AlphaZero for any game in any framework + tutorial + Othello/Gobang/TicTacToe/Connect4 and more
Stars: ✭ 2,617 (+11278.26%)
Mutual labels:  mcts, monte-carlo-tree-search
alpha-zero
AlphaZero implementation for Othello, Connect-Four and Tic-Tac-Toe based on "Mastering the game of Go without human knowledge" and "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm" by DeepMind.
Stars: ✭ 68 (+195.65%)
Mutual labels:  mcts
AlphaZero Gobang
Deep Learning big homework of UCAS
Stars: ✭ 29 (+26.09%)
Mutual labels:  mcts
Deep RL with pytorch
A PyTorch tutorial for DRL (Deep Reinforcement Learning)
Stars: ✭ 160 (+595.65%)
Mutual labels:  mcts
deep-active-inference-mc
Deep active inference agents using Monte-Carlo methods
Stars: ✭ 41 (+78.26%)
Mutual labels:  monte-carlo-tree-search
AnimalChess
Animal Fight Chess Game (斗兽棋) written in Rust.
Stars: ✭ 76 (+230.43%)
Mutual labels:  monte-carlo-tree-search
connect4
Solving board games like Connect4 using Deep Reinforcement Learning
Stars: ✭ 33 (+43.48%)
Mutual labels:  monte-carlo-tree-search
godpaper
🐵 An AI chess-board-game framework implemented in many programming languages.
Stars: ✭ 40 (+73.91%)
Mutual labels:  mcts
muzero
A clean implementation of MuZero and AlphaZero following the AlphaZero General framework. Train and Pit both algorithms against each other, and investigate reliability of learned MuZero MDP models.
Stars: ✭ 126 (+447.83%)
Mutual labels:  mcts
ludorum.js
A board game framework, focused not on graphics or user interfaces, but on artificial players design, implementation and testing.
Stars: ✭ 13 (-43.48%)
Mutual labels:  monte-carlo-tree-search
alpha sigma
A PyTorch-based Gomoku game model using AlphaZero-style reinforcement learning and Monte Carlo Tree Search.
Stars: ✭ 134 (+482.61%)
Mutual labels:  monte-carlo-tree-search
pyDLGO
A deep-learning-based GTP Go engine, with a KGS setup guide and algorithm tutorials.
Stars: ✭ 33 (+43.48%)
Mutual labels:  mcts
jupiter
A Monte-Carlo based AI to beat 2048
Stars: ✭ 47 (+104.35%)
Mutual labels:  mcts
alphastone
Using self-play, MCTS, and a deep neural network to create a Hearthstone AI player
Stars: ✭ 24 (+4.35%)
Mutual labels:  monte-carlo-tree-search
MyAlphaGoZeroOnConnect4
My Simple Implementation of AlphaGo Zero on Connect4
Stars: ✭ 16 (-30.43%)
Mutual labels:  monte-carlo-tree-search

Quoridor AI based on Monte Carlo Tree Search

Quoridor is an abstract strategy game played on a board of 81 (9x9) square spaces, where the objective is to get your pawn to the opposite side of the board.
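
For reference, a Quoridor position can be captured by a small state object like the sketch below; the field names are illustrative, not the repository's actual data structures:

```javascript
// Illustrative Quoridor state: two pawns on a 9x9 board, each player starting
// with 10 walls; walls sit on an 8x8 grid of anchor points between squares.
const initialState = {
  pawns: [
    { row: 8, col: 4, wallsLeft: 10 }, // light pawn: moves first, aims for row 0
    { row: 0, col: 4, wallsLeft: 10 }, // dark pawn: aims for row 8
  ],
  horizontalWalls: Array.from({ length: 8 }, () => new Array(8).fill(false)),
  verticalWalls: Array.from({ length: 8 }, () => new Array(8).fill(false)),
  turn: 0, // index into pawns: whose move it is
};
```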

This AI agent for playing Quoridor is based on Monte Carlo tree search (MCTS). Pure MCTS resulted in poor performance; it improved significantly after I applied some heuristics to the selection, expansion, and simulation phases of the tree search (and also to post-processing after the search). Some of them are described in the "Some Heuristics Included" section below. For all of the heuristics and their implementation details, refer to the comments in the source code (search for the word "heuristic").
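
For orientation, here is a minimal, generic MCTS sketch in JavaScript. The `game` interface and all identifier names are illustrative assumptions, not the repository's actual API; the real agent additionally applies the heuristics described below in the selection, expansion, and simulation steps.

```javascript
// Generic MCTS sketch (illustrative; not the repository's actual code).
// Assumed game interface:
//   game.legalMoves(state)      -> array of moves
//   game.applyMove(state, move) -> new state
//   game.winner(state)          -> winning player id, or null if not over
//   game.currentPlayer(state)   -> id of the player to move
function mctsChooseMove(game, rootState, numRollouts, uctConst) {
  const makeNode = (state, move, parent) => ({
    state, move, parent, children: [],
    untried: game.winner(state) === null ? game.legalMoves(state) : [],
    visits: 0, wins: 0,
  });

  // UCT score of a child from its parent's point of view.
  const uct = (child, parent) =>
    child.visits === 0
      ? Infinity // always try unvisited children first
      : child.wins / child.visits +
        uctConst * Math.sqrt(Math.log(parent.visits) / child.visits);

  const root = makeNode(rootState, null, null);

  for (let i = 0; i < numRollouts; i++) {
    // 1. Selection: descend by UCT while fully expanded and non-terminal.
    let node = root;
    while (node.untried.length === 0 && node.children.length > 0) {
      node = node.children.reduce((a, b) => (uct(a, node) >= uct(b, node) ? a : b));
    }

    // 2. Expansion: add one child for a randomly chosen untried move.
    if (node.untried.length > 0) {
      const k = Math.floor(Math.random() * node.untried.length);
      const move = node.untried.splice(k, 1)[0];
      const child = makeNode(game.applyMove(node.state, move), move, node);
      node.children.push(child);
      node = child;
    }

    // 3. Simulation: play out the game with random moves.
    let state = node.state;
    while (game.winner(state) === null) {
      const moves = game.legalMoves(state);
      state = game.applyMove(state, moves[Math.floor(Math.random() * moves.length)]);
    }
    const winner = game.winner(state);

    // 4. Backpropagation: a node's wins are counted for the player who
    //    made the move leading into that node.
    for (let n = node; n !== null; n = n.parent) {
      n.visits += 1;
      if (n.parent !== null && game.currentPlayer(n.parent.state) === winner) {
        n.wins += 1;
      }
    }
  }

  // Play the most-visited move at the root.
  return root.children.reduce((a, b) => (a.visits >= b.visits ? a : b)).move;
}
```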

You can play against this AI on the website (or web app) https://gorisanson.github.io/quoridor-ai/.


The number of rollouts per move for each AI level on the website is as follows.

| Level | Rollouts per Move |
| --- | --- |
| Novice | 2,500 |
| Average | 7,500 |
| Good | 20,000 |
| Strong | 60,000 |
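
In code, this mapping amounts to a small configuration object (an illustrative sketch; the website's source may organize it differently):

```javascript
// Rollouts per move for each AI level (values from the table above;
// the identifier name is illustrative).
const ROLLOUTS_PER_MOVE = {
  novice: 2500,
  average: 7500,
  good: 20000,
  strong: 60000,
};
```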

Some Heuristics Included in the latest version (v0.3)

  • The branching factor of Quoridor is high because of the many possible wall placements. (There are 64 (8x8) possible wall positions when no walls are on the board.) So I used a heuristic that considers only "probable" next walls, to lower the branching factor. I used this heuristic in the selection/expansion phase and the rollout phase of MCTS. The "probable" next walls are the following.

    1. Walls near the pawns (to disturb the opponent's pawn or support the player's own pawn)
    2. Walls near already placed walls
    3. Leftmost or rightmost horizontal walls
  • In the rollout phase of MCTS, move the pawn along one of the shortest paths with a certain probability A. With the remaining probability 1 - A, place a ("probable" next) wall at random if the player has walls left; otherwise, move the pawn backwards, as a penalty for having no walls left. (I set A = 0.7; see the sketch after this list.)

  • In the selection/expansion phase of MCTS, if the opponent has no walls left, the player only moves the pawn along one of the shortest paths or places walls only to interrupt the opponent's path (not to support the player's own pawn). This heuristic lowers the branching factor in the endgame.

  • Some common openings are included. This heuristic is only effective for fewer than 6 plies.
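
As promised above, here is a sketch of the heuristic rollout policy. All of the helper functions (`shortestPathMove`, `wallsLeft`, `probableWalls`, `backwardPawnMove`) are assumed names for illustration, not the repository's actual functions:

```javascript
// Heuristic rollout policy sketch (illustrative names, not the repo's API).
const A = 0.7; // probability of following a shortest path

function rolloutMove(game, state) {
  const player = game.currentPlayer(state);
  if (Math.random() < A) {
    // Move the pawn along one of the shortest paths to the goal.
    return game.shortestPathMove(state, player);
  }
  if (game.wallsLeft(state, player) > 0) {
    // Place a random "probable" wall: near the pawns, near existing walls,
    // or a leftmost/rightmost horizontal wall.
    const walls = game.probableWalls(state, player);
    return walls[Math.floor(Math.random() * walls.length)];
  }
  // No walls left: move backwards to penalize having exhausted all walls.
  return game.backwardPawnMove(state, player);
}
```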

Previous Works

There is some previous work done by others. (Where an agent was available to play against, I matched my agent against it for comparison; the results are in the "Results" section below.)

  • Victor Massagué Respall, Joseph Alexander Brown, and Hamna Aslam. Monte Carlo Tree Search for Quoridor. 2018. - This paper led me to apply MCTS to my agent. Although I couldn't find the source code of the agent described in the paper, knowing in advance that the approach was viable was very encouraging.

  • Daniel Borowski's Quoridor AI - It was hard for me to figure out his AI's algorithm from the source code, but I found a comment of his on Reddit that says: "I 'sort of' implemented minimax with a depth of ~2 (hehe)."

  • Martijn van Steenbergen's SmartBrain 4 - This agent uses negamax of depth 4 with some heuristics. (His implementation also includes SmartBrain 1, SmartBrain 2, and SmartBrain 3, which use negamax of depth 1, 2, and 3 respectively; SmartBrain 4 is, unsurprisingly, the strongest. A generic negamax sketch follows this list.)
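
For readers unfamiliar with negamax: it is the symmetric formulation of minimax for zero-sum games, where one player's best score is the negation of the other's. Below is a generic fixed-depth sketch, reusing the hypothetical `game` interface from the MCTS sketch above plus an assumed `shortestPathLength` helper; it is not Martijn's actual code:

```javascript
// Generic fixed-depth negamax sketch (illustrative, not Martijn's code).
function negamax(game, state, depth) {
  if (depth === 0 || game.winner(state) !== null) {
    return evaluate(game, state);
  }
  let best = -Infinity;
  for (const move of game.legalMoves(state)) {
    // The opponent's best score, negated, is our score for this move.
    best = Math.max(best, -negamax(game, game.applyMove(state, move), depth - 1));
  }
  return best;
}

// A simple Quoridor-style evaluation from the player-to-move's perspective:
// how much shorter my remaining path is than the opponent's.
// game.shortestPathLength is an assumed helper.
function evaluate(game, state) {
  const me = game.currentPlayer(state);
  return game.shortestPathLength(state, 1 - me) - game.shortestPathLength(state, me);
}
```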

Results

The following tables compare my 60k agent (Strong) to other agents. The agents from "2.5k agent (Novice)" to "180k agent" are all my own AI agents, the only difference between them being the number of rollouts per move. The 60k agent played half of its games as the light-colored pawn and the other half as the dark-colored pawn (the light-colored pawn moves first). Against Daniel's Quoridor AI, however, the games were played as the light-colored pawn only, since his AI only takes the dark-colored pawn. Against Daniel's Quoridor AI and Martijn's SmartBrain 4, the matches were carried out manually: I relayed each move from my 60k agent to the opponent program, and vice versa.

uctConst is the exploration parameter used in UCT (Upper Confidence Bound 1 applied to trees): a child node's UCT score is wins/visits + uctConst * sqrt(ln(parentVisits)/visits), so larger values favor exploration over exploitation.

Result of 60k v0.2 with uctConst = 0.5

| Opponent | Games played (as light / as dark for 60k) | Wins as light for 60k | Wins as dark for 60k | Win percentage for 60k | Lower 95% confidence bound | Upper 95% confidence bound |
| --- | --- | --- | --- | --- | --- | --- |
| 2.5k agent (Novice) v0.2 | 100 (50/50) | 46 | 38 | 84% | 75.3% | 90.6% |
| 7.5k agent (Average) v0.2 | 100 (50/50) | 39 | 36 | 75% | 65.3% | 83.1% |
| 20k agent (Good) v0.2 | 100 (50/50) | 31 | 33 | 64% | 53.8% | 73.4% |
| 120k agent v0.2 | 100 (50/50) | 26 | 25 | 51% | 40.8% | 61.1% |
| 180k agent v0.2 | 100 (50/50) | 23 | 20 | 43% | 33.1% | 53.3% |
| Daniel's Quoridor AI | 20 (20/0) | 15 | 0 | 75% | 50.9% | 91.3% |
| Martijn's SmartBrain 4 | 10 (5/5) | 4 | 4 | 80% | 44.4% | 97.5% |

When I played against them myself, Martijn's SmartBrain 4 felt stronger than Daniel's Quoridor AI. Interestingly, though, the 60k agent seems to play better against Martijn's SmartBrain 4. In some matches, the play of Daniel's Quoridor AI somehow made the 60k agent exhaust its walls too quickly and lose the game.

Martijn's implementation of Quoridor allows the diagonal jump even if there is no wall or board edge behind the pawn being jumped. (The official Quoridor rules allow the diagonal jump only if there is a wall or the board edge directly behind the opponent's pawn.) In one match against Martijn's SmartBrain 4, the 60k agent won even though SmartBrain 4 played this out-of-rule diagonal jump twice; that match is included in the statistics of the table above. In another match, SmartBrain 4 played the out-of-rule diagonal jump when neither player had any walls left and the outcome hinged on whether the jump was accepted, so I nullified that match.
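
As an aside, the 95% bounds in these tables can be approximated from the binomial distribution. The sketch below uses the Wilson score interval; since the exact method used for the tables isn't stated, the numbers may differ slightly:

```javascript
// 95% Wilson score interval for `wins` out of `n` games (z = 1.96).
// An approximation: it will not exactly reproduce bounds computed with an
// exact binomial (Clopper-Pearson) method.
function wilsonInterval(wins, n, z = 1.96) {
  const p = wins / n;
  const z2 = z * z;
  const denom = 1 + z2 / n;
  const center = (p + z2 / (2 * n)) / denom;
  const half = (z * Math.sqrt((p * (1 - p)) / n + z2 / (4 * n * n))) / denom;
  return [center - half, center + half];
}

// Example: 84 wins out of 100 games -> roughly [0.756, 0.899].
console.log(wilsonInterval(84, 100));
```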

Result of 60k v0.3 with uctConst = 0.2

| Opponent | Games played (as light / as dark for 60k) | Wins as light for 60k | Wins as dark for 60k | Win percentage for 60k | Lower 95% confidence bound | Upper 95% confidence bound |
| --- | --- | --- | --- | --- | --- | --- |
| 2.5k agent (Novice) v0.3 | 100 (50/50) | 48 | 50 | 98% | 93.0% | 99.8% |
| 7.5k agent (Average) v0.3 | 100 (50/50) | 43 | 39 | 82% | 73.1% | 89.0% |
| 20k agent (Good) v0.3 | 100 (50/50) | 32 | 26 | 58% | 47.7% | 67.8% |
| 60k agent (Strong) v0.2 with uctConst = 0.5 | 100 (50/50) | 37 | 30 | 67% | 56.9% | 76.1% |
| Daniel's Quoridor AI | 20 (20/0) | 18 | 0 | 90% | 68.3% | 98.8% |
| Martijn's SmartBrain 4 | 4 (2/2) | 2 | 2 | 100% | 39.8% | 100% |

The 60k v0.3 agent won 67 out of 100 games against the 60k v0.2 agent.

In one match against Martijn's SmartBrain 4, the 60k agent won even though SmartBrain 4 played the out-of-rule diagonal jump once; that match is included in the statistics of the table above.

References

  • Victor Massagué Respall, Joseph Alexander Brown, and Hamna Aslam. Monte Carlo Tree Search for Quoridor. 2018.

  • Levente Kocsis and Csaba Szepesvári. Bandit Based Monte-Carlo Planning. 2006.

  • Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. Finite-time Analysis of the Multiarmed Bandit Problem. 2002.
