hrpan / Tetris_mcts

MCTS project for Tetris


Learning to play Tetris with Monte Carlo Tree Search and Temporal Difference Learning

My personal project for the love of Tetris.

(Warning: the code is a hot mess riddled with inconsistent styles and unclear naming; read it at your own risk.)

(Warning 2: I have been experimenting with a lot of things lately, so some features may not work properly. If you intend to reproduce my earlier results, make sure you check out my previous commits.)

Introduction

This project started out as an exercise in applying Deep Q-Learning to Tetris, one of my favourite puzzle games of all time. However, I soon realized that it was almost impossible to train an agent to perform anywhere near human level, possibly due to the sparsity and long-term dependency of the rewards in Tetris (imagine how many actions you need to perform to clear even one line!). It was also around that time that AlphaGo beat Lee Sedol in dominant fashion, which reignited my hope for a better agent. I also believed that a model-based approach should improve significantly on model-free approaches (Q-learning, policy gradients, etc.). So here it is: an MCTS-TD agent inspired by AlphaGo, specializing in the game of Tetris.

How is this related to AlphaGo/Zero?

At its core, AlphaGo searches the game tree using the Upper Confidence Bound applied to Trees (UCT). Unlike vanilla MCTS, which has to simulate entire games to estimate the value of the current state, AlphaGo uses a neural network to infer both the value (winning probability) and the policy (likely next moves) of the current state, and from these it computes the upper confidence bound for each move. In my agent, I use exponential moving averages and variances, initialized from the neural network's outputs, to calculate an upper confidence bound based on the central limit theorem, which I believe is more appropriate for single-player games with unbounded rewards. Another difference is that AlphaGo uses the final score of each game as the training target, while this agent uses a bootstrapped target, hence Temporal Difference Learning.
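
To make the difference concrete, here is a minimal sketch of the kind of Gaussian-style upper confidence bound, exponential moving average/variance update, and bootstrapped target described above. All names, the exploration constant, and the smoothing factor are hypothetical, and the actual code in this repository may differ:

    import math

    # Illustrative sketch only -- not the repository's actual implementation.
    def ucb_score(mean, variance, visits, exploration=1.0):
        """Upper confidence bound motivated by the central limit theorem:
        mean return plus a multiple of the standard error."""
        if visits == 0:
            return float("inf")
        return mean + exploration * math.sqrt(variance / visits)

    def ema_update(mean, variance, observed_return, alpha=0.05):
        """Exponential moving average / variance update for one simulated return."""
        delta = observed_return - mean
        mean = mean + alpha * delta
        variance = (1.0 - alpha) * (variance + alpha * delta * delta)
        return mean, variance

    # A bootstrapped (TD) training target, as opposed to AlphaGo's final-score target:
    def td_target(reward, next_state_value, discount=1.0):
        return reward + discount * next_state_value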

How is this different from other Tetris Bots?

Most of the superhuman Tetris bots seen on YouTube and elsewhere use handcrafted heuristics (number of holes, height of each column, smoothness of the surface, etc.) to model the reward. Using heuristics substantially simplifies the problem, since the rewards become much denser (you get a reward for every piece you drop or every action you take) and are highly correlated with the final score. However, such handcrafted rewards can bias your agent toward the targets you set (e.g. minimizing holes in the board or the height of the columns) instead of the true objective (clearing lines). Furthermore, such heuristics do not generalize beyond Tetris, meaning that you have to handcraft new rewards for every game you want your bot to play. This agent differs from those bots in that it can be applied to any environment satisfying certain requirements.
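
For comparison, a typical handcrafted heuristic of the kind those bots use might look like the sketch below. The board representation and feature weights here are made up purely for illustration; nothing like this is used by the agent in this project, which learns from line clears alone:

    import numpy as np

    # Hypothetical heuristic reward, NOT used by this agent.
    def heuristic_score(board):
        """board: 2D array, row 0 at the top, nonzero entries are filled cells."""
        filled = board != 0
        rows, _ = filled.shape
        # Column heights: rows from the highest filled cell down to the floor.
        heights = np.where(filled.any(axis=0), rows - filled.argmax(axis=0), 0)
        # Holes: empty cells with at least one filled cell somewhere above them.
        holes = int((~filled & (np.cumsum(filled, axis=0) > 0)).sum())
        # Bumpiness: total height difference between adjacent columns.
        bumpiness = int(np.abs(np.diff(heights)).sum())
        # Arbitrary example weights.
        return -0.5 * float(heights.sum()) - 1.0 * holes - 0.2 * bumpiness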

Prerequisites

See requirements.txt

You'll also need the Tetris environment from here, and the Python/C++ binding library pybind11.

Training your agent

You can now train your agent with a single command:

python play.py --agent_type ValueSimLP --online --ngames 1000 --mcts_sims 100

Results

Version 1.0

See the agent in action here!

The agent was trained with 300 simulations per move (SPM), and each training iteration consisted of 100 games. In each iteration, a benchmark game with 1500 SPM was also played to assess the agent's performance.

Average and standard deviation of line clears from training/benchmark games. (Left: training; Right: benchmark.)

Version 2.0

See the evolution of the agent here!

The reward function was replaced by the one used in the official Tetris guideline. Training games were played with 500 SPM, and benchmark games were played with 1000 SPM. Instead of training only on the states actually visited, states traversed during MCTS were also used as training data.

The agent achieved 5678 line clears with 1000 SPM after 750 training episodes. The run had to be terminated because I ran out of RAM.

Average and standard deviation of scores and line clears at each iteration. (Left: training; Right: benchmark).

Version 3.0

???

Further Readings

  1. Reinforcement Learning: An Introduction (Must read for anyone interested in reinforcement learning)
  2. Playing Atari with Deep Reinforcement Learning (The first deep reinforcement learning paper)
  3. A Survey of Monte Carlo Tree Search Methods (Great review of MCTS)
  4. Transpositions and Move Groups in Monte Carlo Tree Search (An important branch reduction technique for MCTS)
  5. Bandit Algorithms (Contains almost everything you need to know about bandit-like algorithms)
  6. Deep Learning (Great introductory book on deep learning)
  7. Mastering the game of Go without human knowledge (Original AlphaGo Zero paper)
  8. Deep Reinforcement Learning: Pong from Pixels (Great introductory article on deep reinforcement learning)
  9. The Game of Tetris in Machine Learning (Great introduction to the history of Tetris in machine learning)
  10. Playing Tetris with Deep Reinforcement Learning (A short report on DQN applied to Tetris)