
tanmayshankar / RCNN_MDP

Licence: other
Code base for solving Markov Decision Processes and Reinforcement Learning problems using Recurrent Convolutional Neural Networks.

Programming Languages

Python
139335 projects - #7 most used programming language
Makefile
30231 projects
CMake
9771 projects
C
50402 projects - #5 most used programming language
C++
36643 projects - #6 most used programming language
Shell
77523 projects

Projects that are alternatives of or similar to RCNN MDP

Ml In Tf
Get started with Machine Learning in TensorFlow with a selection of good reads and implemented examples!
Stars: ✭ 45 (-30.77%)
Mutual labels:  deep-reinforcement-learning, recurrent-neural-networks
Artificialintelligenceengines
Computer code collated for use with Artificial Intelligence Engines book by JV Stone
Stars: ✭ 35 (-46.15%)
Mutual labels:  learning, backpropagation
Torch Ac
Recurrent and multi-process PyTorch implementation of deep reinforcement Actor-Critic algorithms A2C and PPO
Stars: ✭ 70 (+7.69%)
Mutual labels:  deep-reinforcement-learning, recurrent-neural-networks
Deep Trading Agent
Deep Reinforcement Learning based Trading Agent for Bitcoin
Stars: ✭ 573 (+781.54%)
Mutual labels:  deep-reinforcement-learning, recurrent-neural-networks
deep-blueberry
If you've always wanted to learn about deep-learning but don't know where to start, then you might have stumbled upon the right place!
Stars: ✭ 17 (-73.85%)
Mutual labels:  deep-reinforcement-learning, recurrent-neural-networks
Pytorch Rdpg
PyTorch Implementation of the RDPG (Recurrent Deterministic Policy Gradient)
Stars: ✭ 25 (-61.54%)
Mutual labels:  deep-reinforcement-learning, recurrent-neural-networks
Learning To Communicate Pytorch
Learning to Communicate with Deep Multi-Agent Reinforcement Learning in PyTorch
Stars: ✭ 236 (+263.08%)
Mutual labels:  deep-reinforcement-learning, recurrent-neural-networks
Tensorflow Tutorial
TensorFlow and Deep Learning Tutorials
Stars: ✭ 748 (+1050.77%)
Mutual labels:  deep-reinforcement-learning, recurrent-neural-networks
SharkStock
Automate swing trading using deep reinforcement learning. The deep deterministic policy gradient-based neural network model trains to choose an action to sell, buy, or hold the stocks to maximize the gain in asset value. The paper also acknowledges the need for a system that predicts the trend in stock value to work along with the reinforcement …
Stars: ✭ 63 (-3.08%)
Mutual labels:  deep-reinforcement-learning, recurrent-neural-networks
Learningx
Deep & Classical Reinforcement Learning + Machine Learning Examples in Python
Stars: ✭ 241 (+270.77%)
Mutual labels:  learning, deep-reinforcement-learning
Trending Deep Learning
Top 100 trending deep learning repositories sorted by the number of stars gained on a specific day.
Stars: ✭ 543 (+735.38%)
Mutual labels:  deep-reinforcement-learning, recurrent-neural-networks
mmn
Moore Machine Networks (MMN): Learning Finite-State Representations of Recurrent Policy Networks
Stars: ✭ 39 (-40%)
Mutual labels:  deep-reinforcement-learning, recurrent-neural-networks
Top Deep Learning
Top 200 deep learning Github repositories sorted by the number of stars.
Stars: ✭ 1,365 (+2000%)
Mutual labels:  deep-reinforcement-learning, recurrent-neural-networks
Emnist
A project designed to explore CNN and the effectiveness of RCNN on classifying the EMNIST dataset.
Stars: ✭ 81 (+24.62%)
Mutual labels:  learning, recurrent-neural-networks
pomdp-baselines
Simple (but often Strong) Baselines for POMDPs in PyTorch - ICML 2022
Stars: ✭ 162 (+149.23%)
Mutual labels:  deep-reinforcement-learning, recurrent-neural-networks
CS231n
PyTorch/Tensorflow solutions for Stanford's CS231n: "CNNs for Visual Recognition"
Stars: ✭ 47 (-27.69%)
Mutual labels:  recurrent-neural-networks, backpropagation
swift-algorithms-data-structs
📒 Algorithms and Data Structures in Swift. The used approach attempts to fully utilize the Swift Standard Library and Protocol-Oriented paradigm.
Stars: ✭ 42 (-35.38%)
Mutual labels:  learning
CommNet
an implementation of CommNet
Stars: ✭ 23 (-64.62%)
Mutual labels:  deep-reinforcement-learning
class-java-basico
Basic Java course (WIP)
Stars: ✭ 201 (+209.23%)
Mutual labels:  learning
pytorchrl
Deep Reinforcement Learning algorithms implemented in PyTorch
Stars: ✭ 47 (-27.69%)
Mutual labels:  deep-reinforcement-learning

Reinforcement Learning via Recurrent Convolutional Neural Networks

What is this all about?

This repository contains the code for the paper: T. Shankar, S. K. Dwivedy, P. Guha, "Reinforcement Learning via Recurrent Convolutional Neural Networks", published at ICPR 2016.

Where can I read this cool paper?

You can view the paper on arXiv!

I want a TL;DR of what this repository does.

This code base targets the following problems:

  1. Solving Value / Policy Iteration in a standard MDP using feedforward passes of a Value Iteration RCNN.
  2. Representing the Bayes Filter state belief update as feedforward passes of a Belief Propagation RCNN.
  3. Learning the State Transition models in a POMDP setting, using backpropagation on the Belief Propagation RCNN.
  4. Learning Reward Functions in an Inverse Reinforcement Learning framework from demonstrations, using a QMDP RCNN.

Is that all?

Yes and no. I'm working on two extensions to this framework; check out https://github.com/tanmayshankar/DeepVectorPolicyFields and https://github.com/tanmayshankar/GraphPlanningNetworks for more!

Can I use this code to pretend I did some research?

Feel free to use my code, but please remember to cite my paper above (and this repository)!

I have a brilliant idea to make this even cooler!

Awesome! Feel free to mail me at [email protected] with your suggestions, feedback and ideas!

Tell me how to run the code already!

To run any of the code, clone this repository to a local directory and make sure you have Python >= 2.7 installed. Then follow the instructions below for whichever of the problems above you're interested in.

Value Iteration RCNN

The VI RCNN takes the paths to a reward function and a transition model as command-line arguments. Run the appropriate script, as in the following examples:

./scripts/VI_RCNN/VI_feedforward.py data/VI_Trials/trial_3/reward_function.txt data/learnt_model/actual_transition.txt
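
Conceptually, each feedforward pass of the VI RCNN is one Bellman backup: the transition model acts as a convolutional filter for each action, and the max over actions acts as max-pooling. Here is a minimal NumPy sketch of that update; the function, variable names, and shapes are illustrative, not the repository's actual code:

import numpy as np
from scipy.signal import convolve2d

def vi_rcnn_pass(value, reward, transition, gamma=0.95):
    # value, reward: (H, W) grids; transition: (A, k, k), one k x k
    # kernel per action. One pass convolves V with each action's
    # kernel (the convolutional layer), adds the reward, and maxes
    # over actions (the max-pooling layer).
    q = np.stack([reward + gamma * convolve2d(value, transition[a], mode='same')
                  for a in range(transition.shape[0])])
    return q.max(axis=0), q.argmax(axis=0)  # updated value, greedy policy

Repeating this pass to convergence recovers both the value function and the optimal policy; the variant with reward as a function of actions (below) corresponds to making reward a per-action (A, H, W) array inside the stack.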

To view the progress of Value Iteration as it goes, run:

./scripts/VI_RCNN/VI_live_display.py data/VI_Trials/trial_3/reward_function.txt data/learnt_model/actual_transition.txt

To run Value Iteration with reward as a function of actions as well, run:

./scripts/VI_RCNN/VI_action_reward.py data/VI_Trials/trial_3/reward_function.txt data/learnt_model/actual_transition.txt

To run Value Iteration with an extended action vector (considering remaining stationary as an action):

./scripts/VI_RCNN/VI_extended_actions.py data/QMDP_old_trajectories/trial_9_action_reward/action_reward_function.txt data/learnt_model/actual_transition.txt

Displaying Optimal Policy

Once the feedforward passes of the VI RCNN have run, you can display the policy, reward, and value functions by running:

./scripts/Display/display_policy.py output_policy.txt reward_function.txt value_function.txt

If you used the extended action vector, use this instead:

./scripts/Display/display_policy_extended.py output_policy.txt reward_function.txt value_function.txt
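
Under the hood, a display script of this kind just loads the text grids and plots them; here's a minimal matplotlib sketch of the same idea. The file format (whitespace-separated grids) and the action-to-direction mapping are assumptions for illustration, not the scripts' actual conventions:

import numpy as np
import matplotlib.pyplot as plt

# Assumed format: whitespace-separated grids of numbers.
policy = np.loadtxt('output_policy.txt', dtype=int)
value = np.loadtxt('value_function.txt')

# Hypothetical mapping from action index to (dx, dy) moves.
moves = np.array([[0, 1], [0, -1], [1, 0], [-1, 0]])
u, v = moves[policy][..., 0], moves[policy][..., 1]

plt.imshow(value, origin='lower')  # value function as a heatmap
plt.quiver(u, v)                   # greedy action at each state
plt.title('Value function and greedy policy')
plt.show()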

Learning the Transition Model

To learn the state transition model of the POMDP by applying backpropagation to the BP RCNN, run any of the BP RCNN scripts. Here's an example for the partially observable case:

./scripts/BP_RCNN/BP_PO.py
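
For reference, the BP RCNN's forward pass is the Bayes filter belief update written as a convolution: the prediction step convolves the belief with the transition kernel, and the correction step reweights by the observation likelihood. A minimal sketch with illustrative names:

import numpy as np
from scipy.signal import convolve2d

def belief_update(belief, trans_kernel, obs_likelihood):
    # Prediction: propagate the belief through the transition model --
    # a single convolution, i.e. the BP RCNN's convolutional layer.
    predicted = convolve2d(belief, trans_kernel, mode='same')
    # Correction: weight by the observation likelihood and renormalise.
    corrected = predicted * obs_likelihood
    return corrected / corrected.sum()

Backpropagating the error between the predicted and observed beliefs is what recovers the transition kernel.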

To run backpropagation as a convolution of sensitivities instead, run any of the scripts from the conv_back_prop folder:

./scripts/conv_back_prop/learn_trans_po.py
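
The speed-up comes from computing the kernel gradient as a correlation between the layer's input belief and the backpropagated sensitivities, instead of looping state by state. A rough sketch of that gradient, assuming a correlation-style forward pass (a convolutional forward pass would flip the kernel); names are illustrative:

import numpy as np

def transition_gradient(belief_in, error_out, k):
    # dL/dW[u, v] = sum over all states of the output error times the
    # correspondingly shifted input belief: a k x k cross-correlation.
    h = k // 2
    padded = np.pad(belief_in, h)
    H, W = error_out.shape
    grad = np.empty((k, k))
    for u in range(k):
        for v in range(k):
            grad[u, v] = np.sum(error_out * padded[u:u + H, v:v + W])
    return grad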

You'll notice that the convolutional version of backpropagation runs much faster. Try replanning (re-running Value Iteration) with the learnt transition model!

Following Optimal Policy

To watch an agent follow the optimal policy from a random position, using the learnt transition model, run:

./scripts/Follow_Policy/follow_policy_obs.py data/VI_Trials/trial_3/output_policy.txt data/VI_Trials/trial_3/reward_function.txt data/VI_Trials/trial_3/value_function.txt data/learnt_models/estimated_transition.txt
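
In essence, the agent repeatedly looks up the greedy action at its current state and samples its next state from that action's learnt transition kernel. A small sketch under those assumptions; the action ordering and array layout are hypothetical:

import numpy as np

def follow_policy(policy, transition, start, steps=50, rng=np.random):
    # policy: (H, W) greedy action indices; transition: (A, k, k)
    # kernels, each a normalised distribution over displacements.
    k = transition.shape[1]
    offsets = np.arange(k) - k // 2
    H, W = policy.shape
    state, path = np.array(start), [tuple(start)]
    for _ in range(steps):
        kernel = transition[policy[tuple(state)]]
        idx = rng.choice(k * k, p=kernel.ravel())  # sample a displacement
        step = np.array([offsets[idx // k], offsets[idx % k]])
        state = np.clip(state + step, 0, [H - 1, W - 1])  # stay on the grid
        path.append(tuple(state))
    return path

Generating trajectories (next section) amounts to repeating this from random start states and logging the visited states and actions taken.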

Generating Trajectories using Optimal Policy

To generate multiple trajectories of an agent following the optimal policy from a random position, run:

./scripts/follow_policy/generate_trajectories_extended.py data/VI_Trials/trial_8_bounded/output_policy.txt data/VI_Trials/trial_8_bounded/reward_function.txt data/learnt_models/actual_transition.txt

Inverse Reinforcement Learning

To learn reward functions in an Inverse Reinforcement Learning setting, run the following. It executes backpropagation on the QMDP RCNN, using experience replay across transitions and RMSProp to adapt the learning rate:

./scripts/QMDP_RCNN/experience_replay_RMSProp.py data/learnt_models/actual_transition.txt data/QMDP_Trials/trial_2/Trajectories.txt data/QMDP_Trials/trial_2/Observed_Trajectories.txt data/QMDP_Trials/trial_2/Actions_Taken.txt
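
Schematically, each update samples a minibatch of stored transitions (experience replay) and scales the reward gradient by a running RMS of its history (RMSProp). A minimal sketch; the function and buffer layout are illustrative, not the script's actual structure:

import numpy as np

def rmsprop_replay_step(reward, grad_buffer, cache, lr=0.01,
                        decay=0.9, eps=1e-8, batch=32, rng=np.random):
    # Experience replay: average the reward gradient over a random
    # minibatch of remembered transitions, not just the latest one.
    idx = rng.choice(len(grad_buffer), size=batch)
    g = np.mean([grad_buffer[i] for i in idx], axis=0)
    # RMSProp: adapt the step size with a running mean of g squared.
    cache = decay * cache + (1 - decay) * g ** 2
    return reward - lr * g / (np.sqrt(cache) + eps), cache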
