
bhaktipriya / Atari

1-step Q Learning from the paper "Asynchronous Methods for Deep Reinforcement Learning"

Projects that are alternatives to or similar to Atari

2048 Deep Reinforcement Learning
Trained A Convolutional Neural Network To Play 2048 using Deep-Reinforcement Learning
Stars: ✭ 169 (+1308.33%)
Mutual labels:  jupyter-notebook, deep-q-network
Rl Course Experiments
Stars: ✭ 73 (+508.33%)
Mutual labels:  jupyter-notebook, deep-q-network
Hands On Reinforcement Learning With Python
Master Reinforcement and Deep Reinforcement Learning using OpenAI Gym and TensorFlow
Stars: ✭ 640 (+5233.33%)
Mutual labels:  jupyter-notebook, deep-q-network
Gym trading
Stars: ✭ 87 (+625%)
Mutual labels:  jupyter-notebook, deep-q-network
Rad
RAD: Reinforcement Learning with Augmented Data
Stars: ✭ 268 (+2133.33%)
Mutual labels:  jupyter-notebook, deep-q-network
Deep reinforcement learning course
Implementations from the free course Deep Reinforcement Learning with Tensorflow and PyTorch
Stars: ✭ 3,232 (+26833.33%)
Mutual labels:  jupyter-notebook, deep-q-network
Deeprl Tutorials
Contains high quality implementations of Deep Reinforcement Learning algorithms written in PyTorch
Stars: ✭ 748 (+6133.33%)
Mutual labels:  jupyter-notebook, deep-q-network
Tutorials
Landlab tutorials
Stars: ✭ 11 (-8.33%)
Mutual labels:  jupyter-notebook
Av got deeplearning
AV datahack on deep learning CV
Stars: ✭ 12 (+0%)
Mutual labels:  jupyter-notebook
Julia stats
Learning Julia
Stars: ✭ 11 (-8.33%)
Mutual labels:  jupyter-notebook
Pdlsr
Pandas-aware non-linear least squares regression using Lmfit
Stars: ✭ 11 (-8.33%)
Mutual labels:  jupyter-notebook
D Script
Writer Identification of Handwritten Documents
Stars: ✭ 11 (-8.33%)
Mutual labels:  jupyter-notebook
Nltweets
"Our corpus is tweets."
Stars: ✭ 12 (+0%)
Mutual labels:  jupyter-notebook
Pandastalks
Stars: ✭ 11 (-8.33%)
Mutual labels:  jupyter-notebook
Coursera Ml Andrewng
Uses NumPy, SciPy, and TensorFlow to implement basic ML models and learning algorithms
Stars: ✭ 869 (+7141.67%)
Mutual labels:  jupyter-notebook
Open data science east 2016
Stars: ✭ 11 (-8.33%)
Mutual labels:  jupyter-notebook
Neural Image Captioning
Implementation of Neural Image Captioning model using Keras with Theano backend
Stars: ✭ 12 (+0%)
Mutual labels:  jupyter-notebook
Show Attend And Tell
TensorFlow Implementation of "Show, Attend and Tell"
Stars: ✭ 869 (+7141.67%)
Mutual labels:  jupyter-notebook
Cvpr2015
Stars: ✭ 867 (+7125%)
Mutual labels:  jupyter-notebook
Osmnx Examples
Usage examples, demos, and tutorials for OSMnx.
Stars: ✭ 863 (+7091.67%)
Mutual labels:  jupyter-notebook

Deep Atari

1-step Q Learning from the paper "Asynchronous Methods for Deep Reinforcement Learning". Atari games are among the coolest games out there and have gained widespread mainstream popularity; Breakout is one of my personal favorites, and Pong, the first game ever developed by Atari Inc., was also one of the most influential video games ever created. In 2013, DeepMind released its paper "Playing Atari with Deep Reinforcement Learning", which has become a landmark in the deep RL literature. My project implements 1-step Q Learning from this paper.

Environment:

Here, I've used the Atari environments from OpenAI Gym, a toolkit for developing and comparing RL algorithms. Switching between games is as simple as changing the value of a string variable, as sketched below.
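A minimal sketch of how such an environment might be created and stepped through, assuming the classic Gym API from around the time this project was written (reset returns an observation, step returns a 4-tuple); the id "Breakout-v0" is only an example and can be swapped for any other Atari game:

```python
import gym

# Changing the game is a one-line change: swap the id string
# ("Breakout-v0", "Pong-v0", ...) for a different Atari environment.
env = gym.make("Breakout-v0")

observation = env.reset()
for _ in range(100):
    action = env.action_space.sample()   # random action, just to exercise the API
    observation, reward, done, info = env.step(action)
    if done:
        observation = env.reset()
env.close()
```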

Q learning:

In Q-learning we define a function Q(s, a) representing the maximum discounted future reward when we perform action a in state s and continue optimally from that point on. We can think of Q(s, a) as the best possible score at the end of the game after performing action a in state s. It is called the Q-function because it represents the "quality" of a certain action in a given state.
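In standard notation (the discount factor γ and the next state s' are not named in the text above, so the symbols here are the usual ones), this definition and the Bellman recursion that Q-learning uses to update the estimates are:

$$Q(s, a) = \max_{\pi}\, \mathbb{E}\big[\, r_t + \gamma\, r_{t+1} + \gamma^2 r_{t+2} + \dots \mid s_t = s,\ a_t = a \,\big]$$

$$Q(s, a) = r + \gamma \max_{a'} Q(s', a')$$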

Deep Q network:

We use a CNN that takes in the state s and predicts the Q-values for all possible actions from that state. The architecture that DeepMind used is a classical convolutional neural network with three convolutional layers followed by two fully connected layers. There are no pooling layers, since pooling buys us translation invariance, which is not something we want when training bots to play games: the exact positions of objects on the screen matter. The input to the network is four 84×84 grayscale game screens; we use the 4 most recent screens as the environment state. The outputs of the network are the Q-values for each possible action. This is a regression task, since Q-values can be any real values, and the loss function of this network is a simple squared error loss.
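A rough sketch of that architecture in Keras (an assumption on my part; the project may build the graph differently, and the filter counts below follow the later Nature-paper variant of DQN with 32/64/64 convolutional filters and a 512-unit dense layer):

```python
from keras.models import Sequential
from keras.layers import Conv2D, Dense, Flatten

NUM_ACTIONS = 4  # illustrative; in practice taken from env.action_space.n

# Three conv layers + two dense layers, no pooling: pooling would discard
# the spatial information the agent needs to choose actions.
model = Sequential([
    Conv2D(32, (8, 8), strides=4, activation="relu", input_shape=(84, 84, 4)),
    Conv2D(64, (4, 4), strides=2, activation="relu"),
    Conv2D(64, (3, 3), strides=1, activation="relu"),
    Flatten(),
    Dense(512, activation="relu"),
    Dense(NUM_ACTIONS, activation="linear"),  # one Q-value per action (regression)
])
model.compile(optimizer="rmsprop", loss="mse")  # simple squared-error loss
```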

Training the network:

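The network is trained to regress towards the 1-step Q-learning target. In the usual formulation (θ denotes the network parameters; implementations, including the asynchronous-methods paper, typically compute the target with a separate, periodically updated copy of these parameters), a transition (s, a, r, s') gives the target and squared-error loss:

$$y = \begin{cases} r & \text{if } s' \text{ is terminal}\\ r + \gamma \max_{a'} Q(s', a'; \theta) & \text{otherwise} \end{cases}$$

$$L(\theta) = \big(y - Q(s, a; \theta)\big)^2$$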

Experience Replay:

During gameplay, all experiences <s, a, r, s'> are stored in a replay memory. When training the network, random minibatches are sampled from this memory instead of the most recent transitions; this breaks the correlation between consecutive samples and makes the training task resemble usual supervised learning.
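A minimal replay memory could look like the following (a sketch only; the capacity and batch size are illustrative, not necessarily the values this project uses):

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-size buffer of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity=100000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop out automatically

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Uniform random minibatch: decorrelates consecutive frames before training.
        return random.sample(self.buffer, batch_size)
```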


Exploitation vs Exploration:

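The usual way DQN-style agents balance these two is an ε-greedy policy: act randomly with probability ε and greedily with respect to the predicted Q-values otherwise, annealing ε from 1.0 towards a small value as training progresses. A sketch (the schedule in the comment is a common choice, not necessarily the one used here):

```python
import random

def epsilon_greedy_action(q_values, epsilon):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))                            # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])           # exploit

# A typical schedule anneals epsilon from 1.0 down to ~0.1 over the first
# million frames, so the agent explores heavily early on and exploits later.
```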

Results:

https://www.youtube.com/watch?v=0KRVL-VkMGw

https://www.youtube.com/watch?v=0-ATaiFjzi8

https://www.youtube.com/watch?v=48HNdmfGEjE
