
kirarpit / connect4

License: MIT License
Solving board games like Connect4 using Deep Reinforcement Learning

Programming Languages

python
139,335 projects - #7 most used programming language

Projects that are alternatives of or similar to connect4

MyAlphaGoZeroOnConnect4
My Simple Implementation of AlphaGo Zero on Connect4
Stars: ✭ 16 (-51.52%)
Mutual labels:  residual-networks, monte-carlo-tree-search, alphago-zero, connect4-game
Paddle-RLBooks
Paddle-RLBooks is a reinforcement learning code study guide based on pure PaddlePaddle.
Stars: ✭ 113 (+242.42%)
Mutual labels:  policy-gradient, double-dqn, dueling-dqn
Reinforcement Learning With Tensorflow
Simple reinforcement learning tutorials; Chinese AI lessons by 莫烦Python (Morvan Zhou)
Stars: ✭ 6,948 (+20954.55%)
Mutual labels:  policy-gradient, double-dqn, dueling-dqn
Alpha Zero General
A clean implementation based on AlphaZero for any game in any framework + tutorial + Othello/Gobang/TicTacToe/Connect4 and more
Stars: ✭ 2,617 (+7830.3%)
Mutual labels:  monte-carlo-tree-search, alphago-zero
distributed rl
Pytorch implementation of distributed deep reinforcement learning
Stars: ✭ 66 (+100%)
Mutual labels:  double-dqn, dueling-dqn
playing-mario-with-deep-reinforcement-learning
An implementation of (Double/Dueling) Deep-Q Learning to play Super Mario Bros.
Stars: ✭ 55 (+66.67%)
Mutual labels:  double-dqn, dueling-dqn
Reinforcement Learning
Deep Reinforcement Learning Algorithms implemented with Tensorflow 2.3
Stars: ✭ 61 (+84.85%)
Mutual labels:  policy-gradient, reinforcement-learning-algorithms
Alphazero gomoku
An implementation of the AlphaZero algorithm for Gomoku (also called Gobang or Five in a Row)
Stars: ✭ 2,570 (+7687.88%)
Mutual labels:  monte-carlo-tree-search, alphago-zero
Tianshou
An elegant PyTorch deep reinforcement learning library.
Stars: ✭ 4,109 (+12351.52%)
Mutual labels:  policy-gradient, double-dqn
Deep-rl-mxnet
Mxnet implementation of Deep Reinforcement Learning papers, such as DQN, PG, DDPG, PPO
Stars: ✭ 26 (-21.21%)
Mutual labels:  policy-gradient, reinforcement-learning-algorithms
yarll
Combining deep learning and reinforcement learning.
Stars: ✭ 84 (+154.55%)
Mutual labels:  policy-gradient, reinforcement-learning-algorithms
Deep-Reinforcement-Learning-With-Python
Master classic RL, deep RL, distributional RL, inverse RL, and more using OpenAI Gym and TensorFlow with extensive Math
Stars: ✭ 222 (+572.73%)
Mutual labels:  policy-gradient, double-dqn
Master-Thesis
Deep Reinforcement Learning in Autonomous Driving: the A3C algorithm used to make a car learn to drive in TORCS; Python 3.5, Tensorflow, tensorboard, numpy, gym-torcs, ubuntu, latex
Stars: ✭ 33 (+0%)
Mutual labels:  reinforcement-learning-algorithms
alpha sigma
A pytorch based Gomoku game model. Alpha Zero algorithm based reinforcement Learning and Monte Carlo Tree Search model.
Stars: ✭ 134 (+306.06%)
Mutual labels:  monte-carlo-tree-search
deeprl-continuous-control
Learning Continuous Control in Deep Reinforcement Learning
Stars: ✭ 14 (-57.58%)
Mutual labels:  reinforcement-learning-algorithms
alphastone
Using self-play, MCTS, and a deep neural network to create a hearthstone ai player
Stars: ✭ 24 (-27.27%)
Mutual labels:  monte-carlo-tree-search
pytorch-rl
Pytorch Implementation of RL algorithms
Stars: ✭ 15 (-54.55%)
Mutual labels:  reinforcement-learning-algorithms
bandits
Comparison of bandit algorithms from the Reinforcement Learning bible.
Stars: ✭ 16 (-51.52%)
Mutual labels:  reinforcement-learning-algorithms
alphazero
Board Game Reinforcement Learning using AlphaZero method. including Makhos (Thai Checkers), Reversi, Connect Four, Tic-tac-toe game rules
Stars: ✭ 24 (-27.27%)
Mutual labels:  alphago-zero
DRL in CV
A course on Deep Reinforcement Learning in Computer Vision.
Stars: ✭ 59 (+78.79%)
Mutual labels:  policy-gradient

Solving board games like Connect4 using RL

This repository contains implementations of multiple Reinforcement Learning (RL) algorithms in which an Artificial Neural Network (ANN) is trained to play board games like Connect4 and TicTacToe. Another purpose of this repository is to build a comprehensive intuition of how different RL algorithms work, their pros and cons, and the ways in which they differ from each other. To support experimentation via tweaking hyper-parameters, a general RL framework is implemented with highly customisable modules.

How it works

  • First, a game object is created with arguments like board_size.
  • Next, two player objects are created, which will play against each other. Every player object has a default brain, which can be overridden by creating a custom brain object and assigning it to the player. The brain object is where the algorithms reside.
  • Next, an environment object is created, and the two players along with the game object are passed to it as arguments.
  • Finally, the environment is run: it runs the game, takes actions from the players, and sends back the next state of the game and the rewards for their actions. A minimal sketch of this flow is shown below.
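
The following is a minimal, hypothetical sketch of that wiring. The class names and method signatures (Game, Brain, Player, Environment, reset, step, etc.) are illustrative assumptions that mirror the description above, not the repository's exact API.

    # Hypothetical sketch of the described flow; names and signatures
    # are illustrative assumptions, not the repository's exact API.
    import random

    class Game:
        """Minimal stand-in for a board game that ends after a fixed number of turns."""
        def __init__(self, board_size=(4, 5)):
            self.board_size = board_size
            self.turns_left = 0

        def reset(self):
            self.turns_left = 10
            return "initial state"

        def is_over(self):
            return self.turns_left == 0

        def step(self, action):
            self.turns_left -= 1
            return "next state", random.choice([0, 0, 1])  # (state, reward)

    class Brain:
        """Default brain; this is where the RL algorithms would live."""
        def select_action(self, state):
            return random.randrange(5)

    class Player:
        def __init__(self, brain=None):
            self.brain = brain or Brain()  # default brain, can be overridden
            self.history = []

        def act(self, state):
            return self.brain.select_action(state)

        def observe(self, state, reward):
            self.history.append((state, reward))

        def train(self):
            self.history.clear()  # a real brain would update its weights here

    class Environment:
        def __init__(self, game, players):
            self.game = game
            self.players = players

        def run(self, episodes=1):
            for _ in range(episodes):
                state = self.game.reset()
                while not self.game.is_over():
                    for player in self.players:
                        action = player.act(state)              # take an action from the player
                        state, reward = self.game.step(action)  # advance the game
                        player.observe(state, reward)           # send back state and reward
                        if self.game.is_over():
                            break
                for player in self.players:
                    player.train()

    # Wiring as described: a game, two players, and an environment that runs them.
    game = Game(board_size=(4, 5))
    env = Environment(game, [Player(), Player(brain=Brain())])
    env.run(episodes=10)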

How to run

  • Clone this repository.
  • Change directory to the 'src' folder.
  • Add the src folder to the PYTHONPATH environment variable:
    export PYTHONPATH=/path/to/src/folder:$PYTHONPATH
  • Run the examples. For instance, to train a DDQN player to learn TicTacToe while playing against a Minimax player, run the command below.
    python3 examples/T3-DDQNvsMinimax.py

Implemented Algorithms

Two main classes of Reinforcement Learning algorithms are implemented, i.e., Q-Network and Policy Gradient, plus the self-learning algorithm described in AlphaGo Zero[1].

  • Deep Q-Network (DQN)[2] with Prioritised Experience Replay, reward and error clipping, and a separate target network, aka Double DQN (see the sketch after this list). By changing the neural net architecture, a Duelling DQN[3] can also be constructed. For more information on DQNs, I suggest this amazing series of blog posts; a few code snippets were taken directly from there.
  • Policy Gradient (PG) and Asynchronous Advantage Actor-Critic[4] (A3C).
  • Asynchronous DQN.
  • Other possible variations[5] can also be implemented by tweaking the code a bit.
  • The AlphaGo Zero self-play learning algorithm, implemented down to every detail described in the methods section of the paper.
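
As a hedged illustration of the Double DQN idea (not the repository's code), the sketch below computes learning targets where the online network selects the greedy next action and the separate target network evaluates it, which reduces the overestimation bias of vanilla DQN.

    # Illustrative Double DQN target computation; not the repository's code.
    import numpy as np

    def double_dqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.99):
        # rewards: (N,), next_q_*: (N, n_actions), dones: (N,) booleans.
        # The online net picks the greedy action; the target net evaluates it.
        best_actions = np.argmax(next_q_online, axis=1)
        evaluated = next_q_target[np.arange(len(rewards)), best_actions]
        return rewards + gamma * evaluated * (1.0 - dones.astype(float))

    # Tiny batch of 2 transitions with 3 actions each.
    rewards = np.array([1.0, 0.0])
    next_q_online = np.array([[0.1, 0.5, 0.2], [0.3, 0.2, 0.4]])
    next_q_target = np.array([[0.0, 0.4, 0.1], [0.2, 0.1, 0.6]])
    dones = np.array([False, True])
    print(double_dqn_targets(rewards, next_q_online, next_q_target, dones))
    # -> [1.396 0.   ]  (terminal transition bootstraps nothing)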

Features

  • A model can be trained by playing against a MiniMax player using any of the implemented RL algorithms.
  • A model can be trained by self-play using the AlphaZero or DQN algorithms.
  • Convolutional and residual layers can simply be toggled with a flag.
  • Other similar board games, like Gomoku, can easily be implemented by extending the base class "Game"; a sketch follows this list.
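
A hedged sketch of what such an extension might look like. The abstract method names (reset, legal_moves, step, is_over) are assumptions for illustration, not the repository's actual "Game" interface.

    # Hypothetical Game subclass; method names are illustrative assumptions.
    from abc import ABC, abstractmethod

    class Game(ABC):
        @abstractmethod
        def reset(self): ...
        @abstractmethod
        def legal_moves(self): ...
        @abstractmethod
        def step(self, action): ...
        @abstractmethod
        def is_over(self): ...

    class Gomoku(Game):
        def __init__(self, board_size=15, win_length=5):
            self.n, self.k = board_size, win_length
            self.reset()

        def reset(self):
            self.board = [[0] * self.n for _ in range(self.n)]
            self.player = 1
            return self.board

        def legal_moves(self):
            return [(r, c) for r in range(self.n) for c in range(self.n)
                    if self.board[r][c] == 0]

        def step(self, action):
            r, c = action
            self.board[r][c] = self.player
            self.player = -self.player
            return self.board, 0  # reward only at game end (sparse)

        def is_over(self):
            return not self.legal_moves()  # win detection omitted for brevity

    g = Gomoku(board_size=9)
    g.step((4, 4))  # place the first stone in the centre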

Players

Every player must implement the 'act', 'observe' and 'train' methods of the abstract base class "Player", as these methods are called by the environment (see the sketch after this list).

  • DQN
  • PG
  • AlphaZero
  • MiniMax players for Connect4 and TicTacToe with customisable board size. For a Connect4 MiniMax agent on a non-standard board size, one must compile one of these repositories [1], [2] from source, run it in the background, and query it live. Feel free to raise an issue in case you need help with that. For the regular board size, the code hits an API server running the code of the first repository mentioned here.
  • Human player for playing against trained models.
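
A minimal sketch of that abstract interface. The three method names come from the text above, while the exact signatures are assumptions for illustration.

    # Illustrative Player interface; signatures are assumptions.
    from abc import ABC, abstractmethod

    class Player(ABC):
        @abstractmethod
        def act(self, state):
            """Return an action for the current game state."""

        @abstractmethod
        def observe(self, state, reward):
            """Receive the next state and the reward for the last action."""

        @abstractmethod
        def train(self):
            """Update the player's brain from observed experience."""

    class HumanPlayer(Player):
        """Example concrete player: asks a human for a move."""
        def act(self, state):
            return int(input("Enter column: "))

        def observe(self, state, reward):
            pass  # nothing to learn

        def train(self):
            pass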

Results

  • Below are some charts showing total wins for player 1 (p1), player 2 (p2), and draws per 100 games on the Y-axis vs. the total number of games played on the X-axis.
  • Post training, all the models shown below for TicTacToe and for Connect4 on a smaller 4x5 board were able to defeat a corresponding perfect-playing Minimax player (given a 5% chance of making a random move each turn) in 95% of the games.
  • For Connect4 on the regular board size, i.e., 6x7, these algorithms take a significant amount of time to train, and parallel processing is not very helpful in Python. A model trained for 5 days with the AlphaZero algorithm produced a strong amateur player.
Chart captions:
  • AlphaZero Player self-learning TicTacToe
  • Async DQN Player vs Minimax on TicTacToe
  • Double DQN Player vs Minimax on TicTacToe
  • Policy Gradient Player vs Minimax on TicTacToe
  • Async DQN Player vs Minimax on Connect4 (board size 4x5)
  • Double DQN Player vs Minimax on Connect4 (board size 4x5)
  • Policy Gradient Player vs Minimax on Connect4 (board size 4x5)
  • AlphaZero Player self-learning Connect4 (board size 6x7)

Observations

  • Exploration plays a crucial role in learning. I highly recommend reading more on incentivising exploration[6]; due to time constraints I was not able to implement it, but it seems promising.
  • Getting rewards from time to time is crucial for learning. Sparse rewards in board games like these, where the model receives a reward only at the end of the game, make learning a lot more difficult and time-consuming. One remedy is the N-step return (N_STEP); check the PG and Q players for more info, and see the sketch after this list.
  • Convolutional networks perform better because they exploit the spatial structure of the board and hence can identify patterns easily.
  • DQN can learn from others' experiences as well, since it can learn offline, whereas PG only learns online. However, PG gives better results since it optimises the policy directly.
  • Asynchronous DQN is the most data-efficient, since a lot more training happens with every step.
  • Asynchronous algorithms work a lot better when workers have different exploration rates, since this de-correlates the agents' data and explores the environment faster.
  • AlphaZero is a superior algorithm and generates models that outperform those produced by the other algorithms mentioned. However, it is less general and depends on the perfect-information assumption.
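
A hedged sketch of the N-step return remedy mentioned above (illustrative, not the repository's exact code): instead of propagating the terminal reward one step at a time, each transition is credited with the discounted sum of the next N rewards.

    # Illustrative N-step return computation; not the repository's exact code.
    def n_step_returns(rewards, gamma=0.99, n=4):
        # For each step t, sum the discounted rewards over the next n steps.
        # With sparse rewards (all zero until the end), this propagates the
        # final reward n steps back instead of one. Bootstrapping with a
        # value estimate at step t+n is omitted for brevity.
        returns = []
        for t in range(len(rewards)):
            window = rewards[t:t + n]
            returns.append(sum(r * gamma ** i for i, r in enumerate(window)))
        return returns

    # Sparse-reward episode: reward only at the end of the game.
    print(n_step_returns([0, 0, 0, 0, 0, 1.0], gamma=0.9, n=3))
    # -> [0, 0, 0, 0.81, 0.9, 1.0]  (the win signal reaches 3 steps back)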

Conclusion

This turned out to be a good summer project that gave me a lot of insight into how deep RL algorithms work. I hope that, by taking advantage of this repository, newcomers won't have to spend too much time understanding, implementing and debugging these algorithms. Feel free to fork the repository, create an issue if there is something you don't understand, or make a pull request!

References

1: "Mastering the game of Go without human knowledge".
2: "Playing Atari with Deep Reinforcement Learning".
3: "Dueling Network Architectures for Deep Reinforcement Learning".
4: "Asynchronous Methods for Deep Reinforcement Learning".
5: "Rainbow: Combining Improvements in Deep Reinforcement Learning".
6: "Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models".
