Yaoshicn / DeepLearningFlappyFrog

License: GPL-3.0
Flappy Frog hack using Deep Reinforcement Learning (Deep Q-learning). Excessive "moha" (toad worship) is not advisable (暴力膜蛤不可取).

Programming Languages

python

Projects that are alternatives of or similar to DeepLearningFlappyFrog

FlappyFrog
A Flappy Frog clone using python-pygame.
Stars: ✭ 12 (-25%)
Mutual labels:  moha, zhangzhe
imitation learning
PyTorch implementation of some reinforcement learning algorithms: A2C, PPO, Behavioral Cloning from Observation (BCO), GAIL.
Stars: ✭ 93 (+481.25%)
Mutual labels:  deep-reinforcement-learning
pytorch-noreward-rl
pytorch implementation of Curiosity-driven Exploration by Self-supervised Prediction
Stars: ✭ 79 (+393.75%)
Mutual labels:  deep-reinforcement-learning
deep-rts
A Real-Time-Strategy game for Deep Learning research
Stars: ✭ 152 (+850%)
Mutual labels:  deep-reinforcement-learning
Master-Thesis
Deep Reinforcement Learning in Autonomous Driving: the A3C algorithm used to make a car learn to drive in TORCS; Python 3.5, Tensorflow, tensorboard, numpy, gym-torcs, ubuntu, latex
Stars: ✭ 33 (+106.25%)
Mutual labels:  deep-reinforcement-learning
FinRL Podracer
Cloud-native Financial Reinforcement Learning
Stars: ✭ 179 (+1018.75%)
Mutual labels:  deep-reinforcement-learning
Ramudroid
Ramudroid, autonomous solar-powered robot to clean roads, realtime object detection and webrtc based streaming
Stars: ✭ 22 (+37.5%)
Mutual labels:  deep-reinforcement-learning
minerva
An out-of-the-box GUI tool for offline deep reinforcement learning
Stars: ✭ 80 (+400%)
Mutual labels:  deep-reinforcement-learning
drift drl
High-speed Autonomous Drifting with Deep Reinforcement Learning
Stars: ✭ 82 (+412.5%)
Mutual labels:  deep-reinforcement-learning
TF RL
Eagerly Experimentable!!!
Stars: ✭ 22 (+37.5%)
Mutual labels:  deep-reinforcement-learning
mmn
Moore Machine Networks (MMN): Learning Finite-State Representations of Recurrent Policy Networks
Stars: ✭ 39 (+143.75%)
Mutual labels:  deep-reinforcement-learning
Deep-Reinforcement-Learning-for-Automated-Stock-Trading-Ensemble-Strategy-ICAIF-2020
Live Trading. Please star.
Stars: ✭ 1,251 (+7718.75%)
Mutual labels:  deep-reinforcement-learning
LWDRLC
Lightweight deep RL library for continuous control.
Stars: ✭ 14 (-12.5%)
Mutual labels:  deep-reinforcement-learning
motion-planner-reinforcement-learning
End to end motion planner using Deep Deterministic Policy Gradient (DDPG) in gazebo
Stars: ✭ 99 (+518.75%)
Mutual labels:  deep-reinforcement-learning
Meta-Learning-for-StarCraft-II-Minigames
We reproduced DeepMind's results and implemented a meta-learning (MLSH) agent which can generalize across minigames.
Stars: ✭ 26 (+62.5%)
Mutual labels:  deep-reinforcement-learning
deeprl-continuous-control
Learning Continuous Control in Deep Reinforcement Learning
Stars: ✭ 14 (-12.5%)
Mutual labels:  deep-reinforcement-learning
decentralized-rl
Decentralized Reinforcement Learning: Global Decision-Making via Local Economic Transactions (ICML 2020)
Stars: ✭ 40 (+150%)
Mutual labels:  deep-reinforcement-learning
chi
A high-level framework for advanced deep learning with TensorFlow
Stars: ✭ 55 (+243.75%)
Mutual labels:  deep-reinforcement-learning
alpha sigma
A pytorch based Gomoku game model. Alpha Zero algorithm based reinforcement Learning and Monte Carlo Tree Search model.
Stars: ✭ 134 (+737.5%)
Mutual labels:  deep-reinforcement-learning
AI
Using deep reinforcement learning to solve visual tracking and visual navigation problems
Stars: ✭ 16 (+0%)
Mutual labels:  deep-reinforcement-learning

Using Deep Q-Network to Learn How To Play Flappy Frog

This is a simple reproduction of the Deep Learning Flappy Bird project. For further information, please click here.

Excessive "moha" (toad worship) is not recommended (暴力膜蛤不可取).

Overview

This project follows the Deep Q-Learning algorithm described in Playing Atari with Deep Reinforcement Learning [2] and shows that this learning algorithm can be further generalized to the notorious Flappy Frog.

Installation Dependencies:

  • Python 2.7 or 3
  • TensorFlow 0.7
  • pygame
  • OpenCV-Python

How to Run?

git clone https://github.com/Yaoshicn/DeepLearningFlappyFrog.git
cd DeepLearningFlappyFrog
python deep_q_network.py

What is Deep Q-Network?

For those who are interested in deep reinforcement learning, I highly recommend reading the following post:

Demystifying Deep Reinforcement Learning

Deep Q-Network Algorithm

The pseudo-code for the Deep Q Learning algorithm, as given in [1], can be found below:

Initialize replay memory D to size N
Initialize action-value function Q with random weights
for episode = 1, M do
    Initialize state s_1
    for t = 1, T do
        With probability ϵ select a random action a_t,
        otherwise select a_t = argmax_a Q(s_t, a; θ)
        Execute action a_t in the emulator and observe r_t and s_(t+1)
        Store transition (s_t, a_t, r_t, s_(t+1)) in D
        Sample a minibatch of transitions (s_j, a_j, r_j, s_(j+1)) from D
        Set y_j :=
            r_j                                      for terminal s_(j+1)
            r_j + γ * max_(a') Q(s_(j+1), a'; θ)     for non-terminal s_(j+1)
        Perform a gradient step on (y_j - Q(s_j, a_j; θ))^2 with respect to θ
    end for
end for
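
To make the target computation in this pseudo-code concrete, here is a minimal NumPy sketch. It is an illustration only, not code from deep_q_network.py; the function name compute_targets and the gamma value of 0.99 are my assumptions.

import numpy as np

def compute_targets(rewards, q_next, terminal, gamma=0.99):
    # rewards  : (batch,) array of r_j
    # q_next   : (batch, num_actions) Q-values the network predicts for s_(j+1)
    # terminal : (batch,) boolean array, True where s_(j+1) ended the episode
    # y_j = r_j for terminal transitions,
    # y_j = r_j + gamma * max_a' Q(s_(j+1), a') otherwise.
    return rewards + gamma * np.max(q_next, axis=1) * (~terminal)

# Example: two transitions, the second of which is terminal.
y = compute_targets(np.array([0.1, -1.0]),
                    np.array([[0.5, 0.2], [0.3, 0.4]]),
                    np.array([False, True]))
print(y)  # [ 0.595 -1.   ]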

Experiments

Environment

Since the deep Q-network is trained on the raw pixel values observed from the game screen at each time step, [3] finds that removing the background that appears in the original game makes it converge faster.

Network Architecture

According to [1], I first preprocessed the game screens with the following steps (a minimal code sketch follows the list):

  1. Convert image to grayscale
  2. Resize image to 80x80
  3. Stack the last 4 frames to produce an 80x80x4 input array for the network
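
A minimal sketch of this preprocessing pipeline using OpenCV and NumPy is shown below. The binarization step and the helper names are assumptions for illustration and may not match exactly what deep_q_network.py does.

import cv2
import numpy as np

def preprocess_frame(frame):
    # Resize to 80x80 and convert to grayscale.
    gray = cv2.cvtColor(cv2.resize(frame, (80, 80)), cv2.COLOR_BGR2GRAY)
    # Binarize so the frog and pipes stand out against the removed background.
    _, binary = cv2.threshold(gray, 1, 255, cv2.THRESH_BINARY)
    return binary

def initial_state(frame):
    # Stack the same frame 4 times to build the first 80x80x4 input.
    f = preprocess_frame(frame)
    return np.stack([f] * 4, axis=2)

def next_state(state, new_frame):
    # Append the newest frame and drop the oldest one.
    f = preprocess_frame(new_frame)[:, :, np.newaxis]
    return np.append(state[:, :, 1:], f, axis=2)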

The architecture of the network is shown in the figure below. The first layer convolves the input image with an 8x8x4x32 kernel at a stride size of 4. The output is then put through a 2x2 max pooling layer. The second layer convolves with a 4x4x32x64 kernel at a stride of 2. We then max pool again. The third layer convolves with a 3x3x64x64 kernel at a stride of 1. We then max pool one more time. The last hidden layer consists of 256 fully connected ReLU nodes.

The final output layer has the same dimensionality as the number of valid actions that can be performed in the game, where the 0th index always corresponds to doing nothing. The values at this output layer represent the Q function for each valid action given the input state. At each time step, the network performs whichever action corresponds to the highest Q value, using an ϵ-greedy policy.
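
A sketch of this architecture in TensorFlow 1.x-style graph code is given below. The SAME padding, the helper names, and the exact weight initialization are my assumptions; the network in deep_q_network.py may differ in detail.

import tensorflow as tf

def weight(shape):
    return tf.Variable(tf.truncated_normal(shape, stddev=0.01))

def bias(shape):
    return tf.Variable(tf.constant(0.01, shape=shape))

def conv(x, w, stride):
    return tf.nn.conv2d(x, w, strides=[1, stride, stride, 1], padding="SAME")

def max_pool(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")

def build_network(num_actions=2):
    # 80x80x4 stack of preprocessed frames.
    s = tf.placeholder(tf.float32, [None, 80, 80, 4])

    h1 = max_pool(tf.nn.relu(conv(s, weight([8, 8, 4, 32]), 4) + bias([32])))    # 8x8x4x32, stride 4
    h2 = max_pool(tf.nn.relu(conv(h1, weight([4, 4, 32, 64]), 2) + bias([64])))  # 4x4x32x64, stride 2
    h3 = max_pool(tf.nn.relu(conv(h2, weight([3, 3, 64, 64]), 1) + bias([64])))  # 3x3x64x64, stride 1

    # With SAME padding the feature map is 2x2x64 after the three pooling layers.
    flat = tf.reshape(h3, [-1, 2 * 2 * 64])
    h4 = tf.nn.relu(tf.matmul(flat, weight([2 * 2 * 64, 256])) + bias([256]))

    # One Q-value per valid action; index 0 is "do nothing".
    return s, tf.matmul(h4, weight([256, num_actions])) + bias([num_actions])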

Training

First, I initialize all weight matrices randomly using a normal distribution with a standard deviation of 0.01, then set the replay memory to a maximum size of 50,000 experiences.

I start training by choosing actions uniformly at random for the first 10,000 time steps, without updating the network weights. This allows the system to populate the replay memory before training begins.

Note that unlike [1], which initializes ϵ = 1, I linearly anneal ϵ from 0.1 to 0.0001 over the course of the next 3,000,000 frames. The reason I set it this way is that the agent can choose an action every 0.03s (FPS = 30) in our game, so a high ϵ makes it flap too much, keeps it at the top of the game screen, and eventually bumps it into a pipe in a clumsy way. This makes the Q function converge relatively slowly, since it only starts to explore other situations once ϵ is low. In other games, however, initializing ϵ to 1 is more reasonable.
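
For reference, the linear annealing schedule described above can be written as the small helper below; epsilon_at is a hypothetical name, and the constants mirror the values given in the FAQ section of this README.

OBSERVE = 10000         # random-action steps before training starts
EXPLORE = 3000000       # frames over which epsilon is annealed
INITIAL_EPSILON = 0.1
FINAL_EPSILON = 0.0001

def epsilon_at(t):
    # Linearly anneal epsilon from INITIAL_EPSILON to FINAL_EPSILON after the
    # initial observation phase, then keep it at FINAL_EPSILON.
    if t < OBSERVE:
        return INITIAL_EPSILON
    frames = min(t - OBSERVE, EXPLORE)
    return INITIAL_EPSILON - (INITIAL_EPSILON - FINAL_EPSILON) * float(frames) / EXPLORE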

During training, at each time step the network samples minibatches of size 32 from the replay memory and performs a gradient step on the loss function described above, using the Adam optimizer with a learning rate of 0.000001. After annealing finishes, the network continues to train indefinitely with ϵ fixed at 0.0001.
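
A hedged sketch of the corresponding replay memory, loss, and optimizer setup is shown below (again TensorFlow 1.x style, continuing the hypothetical build_network above; feeding precomputed targets y_j through a placeholder is an assumption on my part, not necessarily how deep_q_network.py is written).

import random
from collections import deque

import tensorflow as tf

def build_train_step(q_values, num_actions=2, learning_rate=1e-6):
    # One-hot encoding of the chosen action a_j and the precomputed target y_j.
    a = tf.placeholder(tf.float32, [None, num_actions])
    y = tf.placeholder(tf.float32, [None])

    # Q(s_j, a_j): pick out the Q-value of the action that was actually taken.
    q_taken = tf.reduce_sum(q_values * a, axis=1)
    loss = tf.reduce_mean(tf.square(y - q_taken))
    train_step = tf.train.AdamOptimizer(learning_rate).minimize(loss)
    return a, y, train_step

# Replay memory capped at 50,000 transitions; minibatches of 32.
replay_memory = deque(maxlen=50000)
BATCH_SIZE = 32

def sample_minibatch():
    return random.sample(replay_memory, BATCH_SIZE)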

FAQ

Checkpoint not found

Change the first line of saved_networks/checkpoint to:

model_checkpoint_path: "saved_networks/frog-dqn-3050000"

How to reproduce?

  1. Comment out these lines

  2. Modify deep_q_network.py's parameters as follows:

OBSERVE = 10000
EXPLORE = 3000000
FINAL_EPSILON = 0.0001
INITIAL_EPSILON = 0.1

References

[1] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. Human-level Control through Deep Reinforcement Learning. Nature, 518:529–533, 2015.

[2] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing Atari with Deep Reinforcement Learning. NIPS Deep Learning Workshop, 2013.

[3] Kevin Chen. Deep Reinforcement Learning for Flappy Bird. Report | YouTube result.

Disclaimer

This work is heavily based on the following repos:

  1. sourabhv/FlapPyBird
  2. asrivat1/DeepLearningVideoGames
  3. yenchenlin1994/DeepLearningFlappyBird