
mynkpl1998 / Recurrent-Deep-Q-Learning

Licence: other
Solving POMDPs using recurrent networks

Programming Languages

  • Jupyter Notebook
  • Python

Projects that are alternatives of or similar to Recurrent-Deep-Q-Learning

agentmodels.org
Modeling agents with probabilistic programs
Stars: ✭ 66 (+26.92%)
Mutual labels:  mdp, reinforcement-learning-algorithms, pomdp
POMDP
Implementing an RL algorithm based upon a partially observable Markov decision process.
Stars: ✭ 31 (-40.38%)
Mutual labels:  reinforcement-learning-algorithms, pomdp
king-pong
Deep Reinforcement Learning Pong Agent, King Pong, he's the best
Stars: ✭ 23 (-55.77%)
Mutual labels:  dqn, reinforcement-learning-algorithms
Deep Reinforcement Learning
Repo for the Deep Reinforcement Learning Nanodegree program
Stars: ✭ 4,012 (+7615.38%)
Mutual labels:  dqn, reinforcement-learning-algorithms
pytorch-rl
Pytorch Implementation of RL algorithms
Stars: ✭ 15 (-71.15%)
Mutual labels:  dqn, reinforcement-learning-algorithms
UAV-DDPG
Code for paper "Computation Offloading Optimization for UAV-assisted Mobile Edge Computing: A Deep Deterministic Policy Gradient Approach"
Stars: ✭ 133 (+155.77%)
Mutual labels:  dqn, reinforcement-learning-algorithms
xingtian
xingtian is a componentized library for the development and verification of reinforcement learning algorithms
Stars: ✭ 229 (+340.38%)
Mutual labels:  dqn, reinforcement-learning-algorithms
Deep-rl-mxnet
Mxnet implementation of Deep Reinforcement Learning papers, such as DQN, PG, DDPG, PPO
Stars: ✭ 26 (-50%)
Mutual labels:  dqn, reinforcement-learning-algorithms
SS-Replan
Online Replanning in Belief Space for Partially Observable Task and Motion Problems
Stars: ✭ 43 (-17.31%)
Mutual labels:  mdp, pomdp
PyPOMDP
Python implementation of POMDP framework and PBVI & POMCP algorithms.
Stars: ✭ 60 (+15.38%)
Mutual labels:  reinforcement-learning-algorithms, pomdp
Upside-Down-Reinforcement-Learning
Upside-Down Reinforcement Learning (⅂ꓤ) implementation in PyTorch. Based on the paper published by Jürgen Schmidhuber.
Stars: ✭ 64 (+23.08%)
Mutual labels:  reinforcement-learning-algorithms
TimeSeriesPrediction
Time Series Prediction, Stateful LSTM; time series prediction (shampoo sales / stock price trends) with stateful recurrent neural networks
Stars: ✭ 34 (-34.62%)
Mutual labels:  lstm-neural-networks
TF2-RL
Reinforcement learning algorithms implemented for Tensorflow 2.0+ [DQN, DDPG, AE-DDPG, SAC, PPO, Primal-Dual DDPG]
Stars: ✭ 160 (+207.69%)
Mutual labels:  dqn
Autonomous-Drifting
Autonomous Drifting using Reinforcement Learning
Stars: ✭ 83 (+59.62%)
Mutual labels:  dqn
object-tracking
Multiple Object Tracking System in Keras + (Detection Network - YOLO)
Stars: ✭ 89 (+71.15%)
Mutual labels:  lstm-neural-networks
ReinforcementLearningZoo.jl
juliareinforcementlearning.org/
Stars: ✭ 46 (-11.54%)
Mutual labels:  dqn
NTUA-slp-nlp
💻Speech and Natural Language Processing (SLP & NLP) Lab Assignments for ECE NTUA
Stars: ✭ 19 (-63.46%)
Mutual labels:  lstm-neural-networks
A-Deep-Learning-Based-Illegal-Insider-Trading-Detection-and-Prediction-Technique-in-Stock-Market
Illegal insider trading of stocks is based on releasing non-public information (e.g., new product launch, quarterly financial report, acquisition or merger plan) before the information is made public. Detecting illegal insider trading is difficult due to the complex, nonlinear, and non-stationary nature of the stock market. In this work, we pres…
Stars: ✭ 66 (+26.92%)
Mutual labels:  lstm-neural-networks
playing-mario-with-deep-reinforcement-learning
An implementation of (Double/Dueling) Deep-Q Learning to play Super Mario Bros.
Stars: ✭ 55 (+5.77%)
Mutual labels:  dqn
maze solver
This project solves self-made maze in a variety of ways: A-star, Q-learning and Deep Q-network.
Stars: ✭ 24 (-53.85%)
Mutual labels:  dqn

Recurrent-Deep-Q-Learning

Introduction

A Partially Observable Markov Decision Process (POMDP) is a generalization of the Markov Decision Process in which the agent cannot directly observe the underlying state; only an observation is available. Earlier methods suggest maintaining a belief (a probability mass function) over all possible states, which encodes the probability of being in each state. This quickly limits the size of the problems to which the method can be applied. The paper Playing Atari with Deep Reinforcement Learning instead used the last 4 observations as input to the learning algorithm, which can be seen as a 4th-order Markov decision process. Many papers suggest that better performance can be obtained by using more than the last 4 frames, but this is expensive from a computation and storage (experience replay) point of view. Recurrent networks can instead be used to summarize what the agent has seen in past observations. In this project, I investigated this using a simple partially observable environment and found that a single recurrent layer was able to achieve much better performance than using the last k frames.
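As a rough illustration of the recurrent idea (not the exact architecture used in this repository), the sketch below shows a Q-network in which an LSTM layer summarizes the history of observations instead of stacking the last k frames. The class name, layer sizes, and observation encoding are assumptions for the sketch only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecurrentQNetwork(nn.Module):
    """Illustrative DRQN-style network: observation encoder + LSTM + Q-value head."""
    def __init__(self, obs_dim, hidden_dim, n_actions):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden_dim)     # encode one flattened observation
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.q_head = nn.Linear(hidden_dim, n_actions)    # one Q-value per action

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, seq_len, obs_dim) -- a sequence of observations
        x = F.relu(self.encoder(obs_seq))
        x, hidden = self.lstm(x, hidden)                  # hidden state carries the history
        q_values = self.q_head(x)                         # (batch, seq_len, n_actions)
        return q_values, hidden
```

When acting, the hidden state returned at one step is fed back in at the next step, so the agent conditions on everything it has seen so far rather than on a fixed window of frames.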

Environment

The environment is a 9x9 grid. Red blocks represent the agent's location in the grid, green blocks are the goal points for the agent, and blue blocks are the cells the agent needs to avoid. A reward of +1 is given to the agent when it eats a green block, a reward of -1 when it eats a blue block, and all other movements result in zero reward. The observation is the RGB image of the cells neighbouring the agent. The figure below describes the observation.

Figure: the underlying MDP (full grid) and the observation given to the agent.
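A minimal sketch of the reward and observation logic described above, assuming the grid is stored as a 2-D array of cell codes and the observation is the 3x3 block of cells centred on the agent; the names and cell codes below are hypothetical, not taken from the repository.

```python
import numpy as np

EMPTY, GREEN, BLUE = 0, 1, 2  # hypothetical cell codes

def step_reward(grid, agent_pos):
    """Reward for the cell the agent moved onto: +1 for green, -1 for blue, 0 otherwise."""
    cell = grid[agent_pos]
    if cell == GREEN:
        return +1.0
    if cell == BLUE:
        return -1.0
    return 0.0

def local_observation(rgb_image, agent_row, agent_col, cell_px):
    """Crop the rendered RGB image to the neighbouring cells of the agent (partial observation).

    Border handling is omitted for brevity in this sketch.
    """
    r0, c0 = (agent_row - 1) * cell_px, (agent_col - 1) * cell_px
    return rgb_image[r0:r0 + 3 * cell_px, c0:c0 + 3 * cell_px, :]
```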

Algorithm
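The algorithm is a recurrent variant of deep Q-learning. As a hedged sketch (not the repository's exact implementation), a single update over one sampled episode could look like the following, where `q_net` and `target_net` are instances of a recurrent Q-network such as the one sketched in the Introduction, and the episode tensors are assumed to be pre-batched.

```python
import torch
import torch.nn.functional as F

def drqn_update(q_net, target_net, episode, optimizer, gamma=0.99):
    """One illustrative Q-learning update over a single sampled episode (batch of 1 sequence)."""
    # obs/next_obs: (1, T, obs_dim); actions: LongTensor (1, T); rewards/dones: FloatTensor (1, T)
    obs, actions, rewards, next_obs, dones = episode
    q_values, _ = q_net(obs)                                            # (1, T, n_actions)
    q_taken = q_values.gather(2, actions.unsqueeze(-1)).squeeze(-1)     # Q(s_t, a_t)
    with torch.no_grad():                                               # targets are not backpropagated
        next_q, _ = target_net(next_obs)                                # (1, T, n_actions)
        target = rewards + gamma * next_q.max(dim=2)[0] * (1.0 - dones)
    loss = F.mse_loss(q_taken, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```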

How to Run?

I ran the experiment for the following cases; the corresponding code/Jupyter notebook files are linked to each experiment.

  • MDP Case - The underlying state was fully visible; the whole grid was given as input to the agent.
  • Single Observation - The most recent observation was used as input to the agent.
  • Last Two Observations - The last two observations were used as input to the agent, to encode temporal information across observations.
  • LSTM Case - An LSTM layer was used to carry temporal information across observations (see the input-construction sketch after this list).
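As a hedged illustration of how the network input differs across these cases (the exact preprocessing in the notebooks may differ), the snippet below assembles the input for the single-observation, two-observation, and LSTM settings; `obs_history` is an assumed list of flattened observations, newest last.

```python
import torch

def build_input(obs_history, mode):
    """Assemble the network input from past observations (illustrative only)."""
    if mode == "single":     # most recent observation only
        return torch.as_tensor(obs_history[-1]).float().unsqueeze(0)
    if mode == "stack2":     # last two observations concatenated into one input
        stacked = torch.cat([torch.as_tensor(o).float() for o in obs_history[-2:]])
        return stacked.unsqueeze(0)
    if mode == "lstm":       # full sequence; the LSTM summarizes it internally
        seq = torch.stack([torch.as_tensor(o).float() for o in obs_history])
        return seq.unsqueeze(0)            # (1, seq_len, obs_dim)
    raise ValueError("unknown mode: " + mode)
```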

Learned Policies

Figure: learned policies for the Fully Observable, Single Observation, and LSTM cases.

Results

The figure below compares the performance of the different cases. The MDP case is the best we can do, since the underlying state is fully visible to the agent; the challenge is to perform as well as possible given only an observation. The graph clearly shows that the LSTM consistently performed better, as its total reward per episode was much higher than when using only the last k frames.

References

  • Mnih et al., "Playing Atari with Deep Reinforcement Learning", arXiv:1312.5602, 2013.

Requirements

  • Python >= 3.5
  • PyTorch >= 0.4.1