
mynkpl1998 / Recurrent-Deep-Q-Learning

Licence: other
Solving POMDPs using recurrent networks

Programming Languages

  • Jupyter Notebook
  • Python

Projects that are alternatives of or similar to Recurrent-Deep-Q-Learning

agentmodels.org
Modeling agents with probabilistic programs
Stars: ✭ 66 (+26.92%)
Mutual labels:  mdp, reinforcement-learning-algorithms, pomdp
POMDP
Implementing an RL algorithm based upon a partially observable Markov decision process.
Stars: ✭ 31 (-40.38%)
Mutual labels:  reinforcement-learning-algorithms, pomdp
king-pong
Deep Reinforcement Learning Pong Agent, King Pong, he's the best
Stars: ✭ 23 (-55.77%)
Mutual labels:  dqn, reinforcement-learning-algorithms
Deep Reinforcement Learning
Repo for the Deep Reinforcement Learning Nanodegree program
Stars: ✭ 4,012 (+7615.38%)
Mutual labels:  dqn, reinforcement-learning-algorithms
pytorch-rl
Pytorch Implementation of RL algorithms
Stars: ✭ 15 (-71.15%)
Mutual labels:  dqn, reinforcement-learning-algorithms
UAV-DDPG
Code for paper "Computation Offloading Optimization for UAV-assisted Mobile Edge Computing: A Deep Deterministic Policy Gradient Approach"
Stars: ✭ 133 (+155.77%)
Mutual labels:  dqn, reinforcement-learning-algorithms
xingtian
xingtian is a componentized library for the development and verification of reinforcement learning algorithms
Stars: ✭ 229 (+340.38%)
Mutual labels:  dqn, reinforcement-learning-algorithms
Deep-rl-mxnet
Mxnet implementation of Deep Reinforcement Learning papers, such as DQN, PG, DDPG, PPO
Stars: ✭ 26 (-50%)
Mutual labels:  dqn, reinforcement-learning-algorithms
SS-Replan
Online Replanning in Belief Space for Partially Observable Task and Motion Problems
Stars: ✭ 43 (-17.31%)
Mutual labels:  mdp, pomdp
PyPOMDP
Python implementation of POMDP framework and PBVI & POMCP algorithms.
Stars: ✭ 60 (+15.38%)
Mutual labels:  reinforcement-learning-algorithms, pomdp
Upside-Down-Reinforcement-Learning
Upside-Down Reinforcement Learning (⅂ꓤ) implementation in PyTorch. Based on the paper published by Jürgen Schmidhuber.
Stars: ✭ 64 (+23.08%)
Mutual labels:  reinforcement-learning-algorithms
TimeSeriesPrediction
Time Series Prediction, Stateful LSTM; time series prediction (shampoo sales / stock price trends) with stateful recurrent neural networks
Stars: ✭ 34 (-34.62%)
Mutual labels:  lstm-neural-networks
TF2-RL
Reinforcement learning algorithms implemented for Tensorflow 2.0+ [DQN, DDPG, AE-DDPG, SAC, PPO, Primal-Dual DDPG]
Stars: ✭ 160 (+207.69%)
Mutual labels:  dqn
Autonomous-Drifting
Autonomous Drifting using Reinforcement Learning
Stars: ✭ 83 (+59.62%)
Mutual labels:  dqn
object-tracking
Multiple Object Tracking System in Keras + (Detection Network - YOLO)
Stars: ✭ 89 (+71.15%)
Mutual labels:  lstm-neural-networks
ReinforcementLearningZoo.jl
juliareinforcementlearning.org/
Stars: ✭ 46 (-11.54%)
Mutual labels:  dqn
NTUA-slp-nlp
💻Speech and Natural Language Processing (SLP & NLP) Lab Assignments for ECE NTUA
Stars: ✭ 19 (-63.46%)
Mutual labels:  lstm-neural-networks
A-Deep-Learning-Based-Illegal-Insider-Trading-Detection-and-Prediction-Technique-in-Stock-Market
Illegal insider trading of stocks is based on releasing non-public information (e.g., new product launch, quarterly financial report, acquisition or merger plan) before the information is made public. Detecting illegal insider trading is difficult due to the complex, nonlinear, and non-stationary nature of the stock market. In this work, we pres…
Stars: ✭ 66 (+26.92%)
Mutual labels:  lstm-neural-networks
playing-mario-with-deep-reinforcement-learning
An implementation of (Double/Dueling) Deep-Q Learning to play Super Mario Bros.
Stars: ✭ 55 (+5.77%)
Mutual labels:  dqn
maze solver
This project solves self-made maze in a variety of ways: A-star, Q-learning and Deep Q-network.
Stars: ✭ 24 (-53.85%)
Mutual labels:  dqn

Recurrent-Deep-Q-Learning

Introduction

A Partially Observable Markov Decision Process (POMDP) is a generalization of the Markov Decision Process in which the agent cannot directly observe the underlying state; only an observation is available. Earlier methods suggest maintaining a belief (a probability mass function) over all possible states, which encodes the probability of being in each state. This quickly limits the size of the problems to which the method can be applied. The paper Playing Atari with Deep Reinforcement Learning instead used the last 4 observations as input to the learning algorithm, which can be seen as a 4th-order Markov decision process. Many papers suggest that better performance can be obtained by using more than the last 4 frames, but this is expensive from a computation and storage (experience replay) point of view. Recurrent networks can instead be used to summarize what the agent has seen in past observations. In this project, I investigated this using a simple partially observable environment and found that a single recurrent layer was able to achieve much better performance than using the last k frames.
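As a rough illustration of the recurrent idea (not the exact architecture used in this repository), the sketch below shows a Q-network in which an LSTM layer summarizes the history of observations instead of stacking the last k frames. The class name, layer sizes, and observation encoding are assumptions for the sketch only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecurrentQNetwork(nn.Module):
    """Illustrative DRQN-style network: observation encoder + LSTM + Q-value head."""
    def __init__(self, obs_dim, hidden_dim, n_actions):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden_dim)     # encode one flattened observation
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.q_head = nn.Linear(hidden_dim, n_actions)    # one Q-value per action

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, seq_len, obs_dim) -- a sequence of observations
        x = F.relu(self.encoder(obs_seq))
        x, hidden = self.lstm(x, hidden)                  # hidden state carries the history
        q_values = self.q_head(x)                         # (batch, seq_len, n_actions)
        return q_values, hidden
```

When acting, the hidden state returned at one step is fed back in at the next step, so the agent conditions on everything it has seen so far rather than on a fixed window of frames.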

Environment

The environment is a 9x9 grid. Red blocks represent the agent's location in the grid, green blocks are the goal points for the agent, and blue blocks are the cells the agent needs to avoid. A reward of +1 is given to the agent when it eats a green block, a reward of -1 when it eats a blue block, and all other movements result in zero reward. The observation is the RGB image of the cells neighbouring the agent. The figure below describes the observation.

Figure: the underlying MDP (full grid) and the observation given to the agent.
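A minimal sketch of the reward and observation logic described above, assuming the grid is stored as a 2-D array of cell codes and the observation is the 3x3 block of cells centred on the agent; the names and cell codes below are hypothetical, not taken from the repository.

```python
import numpy as np

EMPTY, GREEN, BLUE = 0, 1, 2  # hypothetical cell codes

def step_reward(grid, agent_pos):
    """Reward for the cell the agent moved onto: +1 for green, -1 for blue, 0 otherwise."""
    cell = grid[agent_pos]
    if cell == GREEN:
        return +1.0
    if cell == BLUE:
        return -1.0
    return 0.0

def local_observation(rgb_image, agent_row, agent_col, cell_px):
    """Crop the rendered RGB image to the neighbouring cells of the agent (partial observation).

    Border handling is omitted for brevity in this sketch.
    """
    r0, c0 = (agent_row - 1) * cell_px, (agent_col - 1) * cell_px
    return rgb_image[r0:r0 + 3 * cell_px, c0:c0 + 3 * cell_px, :]
```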

Algorithm
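The algorithm is a recurrent variant of deep Q-learning. As a hedged sketch (not the repository's exact implementation), a single update over one sampled episode could look like the following, where `q_net` and `target_net` are instances of a recurrent Q-network such as the one sketched in the Introduction, and the episode tensors are assumed to be pre-batched.

```python
import torch
import torch.nn.functional as F

def drqn_update(q_net, target_net, episode, optimizer, gamma=0.99):
    """One illustrative Q-learning update over a single sampled episode (batch of 1 sequence)."""
    # obs/next_obs: (1, T, obs_dim); actions: LongTensor (1, T); rewards/dones: FloatTensor (1, T)
    obs, actions, rewards, next_obs, dones = episode
    q_values, _ = q_net(obs)                                            # (1, T, n_actions)
    q_taken = q_values.gather(2, actions.unsqueeze(-1)).squeeze(-1)     # Q(s_t, a_t)
    with torch.no_grad():                                               # targets are not backpropagated
        next_q, _ = target_net(next_obs)                                # (1, T, n_actions)
        target = rewards + gamma * next_q.max(dim=2)[0] * (1.0 - dones)
    loss = F.mse_loss(q_taken, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```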

How to Run?

I ran the experiment for the following cases; the corresponding code/Jupyter notebook files are linked to each experiment.

  • MDP Case - The underlying state was fully visible; the whole grid was given as input to the agent.
  • Single Observation - The most recent observation was used as input to the agent.
  • Last Two Observations - The last two observations were used as input to the agent, to encode temporal information across observations.
  • LSTM Case - An LSTM layer was used to carry temporal information across observations (see the input-construction sketch after this list).
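As a hedged illustration of how the network input differs across these cases (the exact preprocessing in the notebooks may differ), the snippet below assembles the input for the single-observation, two-observation, and LSTM settings; `obs_history` is an assumed list of flattened observations, newest last.

```python
import torch

def build_input(obs_history, mode):
    """Assemble the network input from past observations (illustrative only)."""
    if mode == "single":     # most recent observation only
        return torch.as_tensor(obs_history[-1]).float().unsqueeze(0)
    if mode == "stack2":     # last two observations concatenated into one input
        stacked = torch.cat([torch.as_tensor(o).float() for o in obs_history[-2:]])
        return stacked.unsqueeze(0)
    if mode == "lstm":       # full sequence; the LSTM summarizes it internally
        seq = torch.stack([torch.as_tensor(o).float() for o in obs_history])
        return seq.unsqueeze(0)            # (1, seq_len, obs_dim)
    raise ValueError("unknown mode: " + mode)
```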

Learned Policies

Figure: learned policies for the Fully Observable, Single Observation, and LSTM cases.

Results

The figure below compares the performance of the different cases. The MDP case is the best we can do, since the underlying state is fully visible to the agent; the challenge is to perform as well as possible given only an observation. The graph clearly shows that the LSTM consistently performed better, as its total reward per episode was much higher than when using only the last k frames.

References

  • Mnih et al., "Playing Atari with Deep Reinforcement Learning", arXiv:1312.5602, 2013.

Requirements

  • Python >= 3.5
  • PyTorch >= 0.4.1