
nuno-faria / Tetris Ai

A deep reinforcement learning bot that plays tetris

Programming Languages

python

Projects that are alternatives of or similar to Tetris Ai

king-pong
Deep Reinforcement Learning Pong Agent, King Pong, he's the best
Stars: ✭ 23 (-78.9%)
Mutual labels:  deep-reinforcement-learning, q-learning
Explorer
Explorer is a PyTorch reinforcement learning framework for exploring new ideas.
Stars: ✭ 54 (-50.46%)
Mutual labels:  deep-reinforcement-learning, q-learning
Deep-Reinforcement-Learning-With-Python
Master classic RL, deep RL, distributional RL, inverse RL, and more using OpenAI Gym and TensorFlow with extensive Math
Stars: ✭ 222 (+103.67%)
Mutual labels:  deep-reinforcement-learning, q-learning
2048 Deep Reinforcement Learning
Trained A Convolutional Neural Network To Play 2048 using Deep-Reinforcement Learning
Stars: ✭ 169 (+55.05%)
Mutual labels:  deep-reinforcement-learning, q-learning
Dissecting Reinforcement Learning
Python code, PDFs and resources for the series of posts on Reinforcement Learning which I published on my personal blog
Stars: ✭ 512 (+369.72%)
Mutual labels:  deep-reinforcement-learning, q-learning
Deep Rl Trading
playing idealized trading games with deep reinforcement learning
Stars: ✭ 228 (+109.17%)
Mutual labels:  deep-reinforcement-learning, q-learning
RL
Reinforcement Learning Demos
Stars: ✭ 66 (-39.45%)
Mutual labels:  deep-reinforcement-learning, q-learning
DRL in CV
A course on Deep Reinforcement Learning in Computer Vision.
Stars: ✭ 59 (-45.87%)
Mutual labels:  deep-reinforcement-learning, q-learning
Deer
DEEp Reinforcement learning framework
Stars: ✭ 455 (+317.43%)
Mutual labels:  deep-reinforcement-learning, q-learning
Reinforcement learning tutorial with demo
Reinforcement Learning Tutorial with Demo: DP (Policy and Value Iteration), Monte Carlo, TD Learning (SARSA, Q-Learning), Function Approximation, Policy Gradient, DQN, Imitation, Meta Learning, Papers, Courses, etc.
Stars: ✭ 442 (+305.5%)
Mutual labels:  deep-reinforcement-learning, q-learning
Accel Brain Code
The purpose of this repository is to provide prototypes, as case studies in the context of proof of concept (PoC) and research and development (R&D), that I have written about on my website. The main research topics are auto-encoders in relation to representation learning, statistical machine learning for energy-based models, generative adversarial networks (GANs), deep reinforcement learning such as Deep Q-Networks, semi-supervised learning, and neural network language models for natural language processing.
Stars: ✭ 166 (+52.29%)
Mutual labels:  deep-reinforcement-learning, q-learning
Async Deeprl
Playing Atari games with TensorFlow implementation of Asynchronous Deep Q-Learning
Stars: ✭ 44 (-59.63%)
Mutual labels:  deep-reinforcement-learning, q-learning
Deep Qlearning Agent For Traffic Signal Control
A framework where a deep Q-Learning Reinforcement Learning agent tries to choose the correct traffic light phase at an intersection to maximize traffic efficiency.
Stars: ✭ 136 (+24.77%)
Mutual labels:  deep-reinforcement-learning, q-learning
Learningx
Deep & Classical Reinforcement Learning + Machine Learning Examples in Python
Stars: ✭ 241 (+121.1%)
Mutual labels:  deep-reinforcement-learning, q-learning
Tf Rex
Play Google Chrome's T-rex game with TensorFlow
Stars: ✭ 345 (+216.51%)
Mutual labels:  deep-reinforcement-learning, q-learning
Hands On Reinforcement Learning With Python
Master Reinforcement and Deep Reinforcement Learning using OpenAI Gym and TensorFlow
Stars: ✭ 640 (+487.16%)
Mutual labels:  deep-reinforcement-learning, q-learning
Easy Rl
A Chinese-language reinforcement learning tutorial; read online at: https://datawhalechina.github.io/easy-rl/
Stars: ✭ 3,004 (+2655.96%)
Mutual labels:  deep-reinforcement-learning, q-learning
Drl Rec
Deep reinforcement learning for recommendation system
Stars: ✭ 92 (-15.6%)
Mutual labels:  deep-reinforcement-learning
Torchrl
Highly Modular and Scalable Reinforcement Learning
Stars: ✭ 102 (-6.42%)
Mutual labels:  deep-reinforcement-learning
Cs234 Reinforcement Learning Winter 2019
My Solutions of Assignments of CS234: Reinforcement Learning Winter 2019
Stars: ✭ 93 (-14.68%)
Mutual labels:  deep-reinforcement-learning

tetris-ai

A bot that plays tetris using deep reinforcement learning.

Demo

First 10000 points, after some training.


How does it work

Reinforcement Learning

At first, the agent plays random moves, saving each state and the reward received in a limited queue (the replay memory). At the end of each episode (game), the agent trains the neural network on a random sample drawn from the replay memory. As more and more games are played, the agent becomes smarter, achieving higher and higher scores.
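
A minimal sketch of such a replay memory; the class and method names here are hypothetical, not necessarily those used in this project:

```python
import random
from collections import deque

class ReplayMemory:
    def __init__(self, max_size=20000):
        # Bounded queue: the oldest transitions are dropped when full
        self.memory = deque(maxlen=max_size)

    def add(self, state, next_state, reward, done):
        self.memory.append((state, next_state, reward, done))

    def sample(self, batch_size):
        # Random sample used to train the network at the end of an episode
        return random.sample(self.memory, min(batch_size, len(self.memory)))
```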

Since a reinforcement learning agent tends to stick with the first good 'path' it discovers, an exploration variable (which decreases over time) was also introduced, so that the agent sometimes picks a random action instead of the one it currently considers best. This way, it can discover new 'paths' and achieve higher scores.
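
An illustrative epsilon-greedy selection with linear decay (the names are assumptions for this sketch; the actual schedule is described under Internal Structure below):

```python
import random

def choose_action(actions, best_action, episode, total_episodes,
                  epsilon_start=1.0, end_ratio=0.75):
    # Epsilon decays linearly from 1 to 0 over the first 75% of episodes
    end_episode = total_episodes * end_ratio
    epsilon = max(0.0, epsilon_start * (1 - episode / end_episode))
    if random.random() < epsilon:
        return random.choice(actions)  # explore: random action
    return best_action                 # exploit: current best action
```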

Training

Training is based on the Q-Learning algorithm. Instead of training the network only on the current state and the reward obtained, Q-Learning also considers the transition from the current state to the next one, estimating the best possible score achievable from each state when future rewards are taken into account; in other words, the algorithm is not greedy. This allows the agent to make moves that yield no immediate reward in order to earn a bigger one later (e.g., waiting to clear multiple lines at once instead of a single one).

The neural network is updated with the following target (for a play with reward reward that moves from state to next_state, where Q_next_state is the expected value of next_state, obtained from the network's own prediction):

    if not terminal state (last round):
        Q_state = reward + discount × Q_next_state
    else:
        Q_state = reward
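
The same target, as a small Python helper (hypothetical name, using the default discount of 0.95 described later):

```python
def q_target(reward, q_next_state, done, discount=0.95):
    # Terminal transitions carry no future reward to propagate
    if done:
        return reward
    return reward + discount * q_next_state
```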

Best Action

Most deep Q-Learning approaches output a vector of values for a given state. Each position of the vector maps to an action (e.g., left, right, ...), and the action at the position with the highest value is selected.

However, the strategy implemented here is slightly different. For each round of Tetris, the resulting states of all possible moves are collected. Each state is fed to the neural network, which predicts its expected score. The move whose resulting state gets the highest predicted value is then played.
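
A minimal sketch of this selection strategy, assuming next_states maps each possible move to the state vector it produces (names are hypothetical):

```python
import numpy as np

def best_move(model, next_states):
    # next_states: dict mapping a move to the state vector it produces
    moves = list(next_states.keys())
    states = np.array([next_states[m] for m in moves])
    values = model.predict(states).flatten()  # one predicted score per state
    return moves[int(np.argmax(values))]
```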

Game State

Several attributes were considered for training the network. Since there were many, after several tests the conclusion was that only the first four listed below were necessary for training (a sketch of how the board-derived properties can be computed follows the list):

  • Number of lines cleared
  • Number of holes
  • Bumpiness (sum of the difference between heights of adjacent pairs of columns)
  • Total Height
  • Max height
  • Min height
  • Max bumpiness
  • Next piece
  • Current piece
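
As referenced above, a minimal sketch of the board-derived properties (holes, bumpiness, total height), assuming the board is a grid of rows where non-zero cells are filled; the number of lines cleared comes from the move itself, so it is not computed here:

```python
def board_properties(board):
    rows, cols = len(board), len(board[0])
    heights = []
    holes = 0
    for x in range(cols):
        # Column height: distance from the first filled cell down to the floor
        y = 0
        while y < rows and board[y][x] == 0:
            y += 1
        heights.append(rows - y)
        # Holes: empty cells below the first filled cell of the column
        holes += sum(1 for yy in range(y + 1, rows) if board[yy][x] == 0)
    # Bumpiness: sum of height differences between adjacent columns
    bumpiness = sum(abs(heights[i] - heights[i + 1]) for i in range(cols - 1))
    return holes, bumpiness, sum(heights)
```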

Game Score

Each block placed yields 1 point. When clearing lines, the score awarded is number_lines_cleared^2 × board_width. Losing a game subtracts 1 point.
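
As a small sketch of this scoring rule (board_width is 10 for a standard Tetris board; whether the final losing move also earns its placement point is an assumption of this sketch):

```python
def round_reward(lines_cleared, game_over, board_width=10):
    if game_over:
        return -1  # losing a game subtracts 1 point
    # 1 point per placed block, plus the line-clear bonus
    return 1 + (lines_cleared ** 2) * board_width
```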

Implementation

All the code was implemented in Python. The neural network uses the Keras framework with TensorFlow as the backend.

Internal Structure

The agent is built around a deep neural network with a configurable number of layers, neurons per layer, activation functions, loss function, optimizer, etc. By default, the network has 2 hidden layers (32 neurons each); ReLU activations for the hidden layers and a linear activation for the output layer; mean squared error as the loss function; Adam as the optimizer; epsilon (exploration) starting at 1 and decaying to 0 when 75% of the episodes have been played; and a discount of 0.95 (the weight given to future rewards relative to immediate ones).
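
A sketch of this default configuration in Keras (the input size of 4 matches the four state attributes used for training; everything else follows the defaults listed above):

```python
from keras.models import Sequential
from keras.layers import Dense

def build_model(state_size=4):
    model = Sequential([
        Dense(32, activation='relu', input_dim=state_size),  # hidden layer 1
        Dense(32, activation='relu'),                        # hidden layer 2
        Dense(1, activation='linear'),                       # predicted score
    ])
    model.compile(loss='mse', optimizer='adam')
    return model
```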

Training

For training, the replay queue had a maximum size of 20000, with a random sample of 512 transitions selected for training at the end of each episode, using 1 epoch.
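
Tying the pieces together, a sketch of the end-of-episode training step under these settings (assuming the (state, next_state, reward, done) transition layout from the replay-memory sketch above):

```python
import random
import numpy as np

def train_on_episode_end(model, memory, batch_size=512, discount=0.95):
    batch = random.sample(memory, min(batch_size, len(memory)))
    # Predict the value of every next state in a single pass
    next_states = np.array([t[1] for t in batch])
    q_next = model.predict(next_states).flatten()
    x = np.array([t[0] for t in batch])
    y = np.array([r if done else r + discount * qn
                  for (_, _, r, done), qn in zip(batch, q_next)])
    model.fit(x, y, batch_size=len(x), epochs=1, verbose=0)
```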

Requirements

  • Tensorflow (tensorflow-gpu==1.14.0, CPU version can be used too)
  • Tensorboard (tensorboard==1.14.0)
  • Keras (Keras==2.2.4)
  • Opencv-python (opencv-python==4.1.0.25)
  • Numpy (numpy==1.16.4)
  • Pillow (Pillow==5.4.1)
  • Tqdm (tqdm==4.31.1)

Results

Training ran for 2000 episodes, with epsilon decaying until episode 1500. Around episode 1460 the agent's games started lasting too long, so training had to be terminated. Here is a chart with the maximum score every 50 episodes, up to episode 1450:

results

Note: Decreasing the epsilon_end_episode could make the agent achieve better results in a smaller number of episodes.

Useful Links

Deep Q Learning

Tetris
