
catohaste / POMDP

Licence: other
Implementing an RL algorithm based upon a partially observable Markov decision process.

Programming Languages

matlab
3953 projects

Projects that are alternatives of or similar to POMDP

Recurrent-Deep-Q-Learning
Solving POMDP using Recurrent networks
Stars: ✭ 52 (+67.74%)
Mutual labels:  reinforcement-learning-algorithms, pomdp
PyPOMDP
Python implementation of POMDP framework and PBVI & POMCP algorithms.
Stars: ✭ 60 (+93.55%)
Mutual labels:  reinforcement-learning-algorithms, pomdp
agentmodels.org
Modeling agents with probabilistic programs
Stars: ✭ 66 (+112.9%)
Mutual labels:  reinforcement-learning-algorithms, pomdp
nes
Helping researchers in routine procedures for data collection
Stars: ✭ 16 (-48.39%)
Mutual labels:  neuroscience
JATOS
Just Another Tool for Online Studies
Stars: ✭ 60 (+93.55%)
Mutual labels:  neuroscience
vsrl-framework
The Verifiably Safe Reinforcement Learning Framework
Stars: ✭ 42 (+35.48%)
Mutual labels:  reinforcement-learning-algorithms
xingtian
xingtian is a componentized library for the development and verification of reinforcement learning algorithms
Stars: ✭ 229 (+638.71%)
Mutual labels:  reinforcement-learning-algorithms
pyomyo
PyoMyo - Python Opensource Myo armband library
Stars: ✭ 61 (+96.77%)
Mutual labels:  neuroscience
visualqc
VisualQC : assistive tool to ease the quality control workflow of neuroimaging data.
Stars: ✭ 56 (+80.65%)
Mutual labels:  neuroscience
brainreg-segment
Segmentation of 3D shapes in a common anatomical space
Stars: ✭ 13 (-58.06%)
Mutual labels:  neuroscience
NeuroCore.jl
Core methods and structures for neuroscience research in Julia.
Stars: ✭ 15 (-51.61%)
Mutual labels:  neuroscience
syncopy
Systems Neuroscience Computing in Python: user-friendly analysis of large-scale electrophysiology data
Stars: ✭ 19 (-38.71%)
Mutual labels:  neuroscience
TorchGA
Train PyTorch Models using the Genetic Algorithm with PyGAD
Stars: ✭ 47 (+51.61%)
Mutual labels:  neuroscience
WheatNNLeek
Spiking neural network system
Stars: ✭ 26 (-16.13%)
Mutual labels:  neuroscience
c2s
A toolbox for inferring spikes from calcium traces.
Stars: ✭ 22 (-29.03%)
Mutual labels:  neuroscience
dmipy
The open source toolbox for reproducible diffusion MRI-based microstructure estimation
Stars: ✭ 58 (+87.1%)
Mutual labels:  neuroscience
Upside-Down-Reinforcement-Learning
Upside-Down Reinforcement Learning (⅂ꓤ) implementation in PyTorch. Based on the paper published by Jürgen Schmidhuber.
Stars: ✭ 64 (+106.45%)
Mutual labels:  reinforcement-learning-algorithms
cortex-v2-example
Example with Cortex V2 API
Stars: ✭ 121 (+290.32%)
Mutual labels:  neuroscience
spectral connectivity
Frequency domain estimation and functional and directed connectivity analysis tools for electrophysiological data
Stars: ✭ 57 (+83.87%)
Mutual labels:  neuroscience
Neural-Fictitous-Self-Play
Scalable Implementation of Neural Fictitous Self-Play
Stars: ✭ 52 (+67.74%)
Mutual labels:  reinforcement-learning-algorithms

POMDP

Implementing a reinforcement learning algorithm based upon a partially observable Markov decision process.

The task

Here the agent is presented with a two-alternative forced-choice task. Over a number of trials, the agent chooses and then performs an action based upon a given stimulus. Stimulus values range from -0.5 to 0.5. When the stimulus value is less than 0, the correct decision is to choose left; when the stimulus value is greater than 0, the correct decision is to choose right. If the stimulus value is exactly 0, the correct side is randomly assigned to left or right for that trial, and the agent is rewarded accordingly.
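
As a rough illustration, the decision rule could be written like this in MATLAB (variable names such as 'stimulus' and 'correctSide' are illustrative, not taken from the repository):

```matlab
% Decision rule of the task (illustrative variable names, not from the repo)
stimulus = -0.5 + rand;              % draw a stimulus value in [-0.5, 0.5]
if stimulus < 0
    correctSide = 'left';
elseif stimulus > 0
    correctSide = 'right';
else                                 % stimulus of exactly 0
    sides = {'left', 'right'};
    correctSide = sides{randi(2)};   % correct side assigned at random
end
```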

Block structure

The agent is rewarded asymmetrically. In some trials, the agent receives an additional reward for making a correct left action; in the remaining trials, it receives an additional reward for making a correct right action. The trials are presented to the agent in blocks, so the favoured side stays fixed for a run of consecutive trials.
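
One plausible way to express this asymmetric reward rule is sketched below; the variable names and the numeric values are assumptions for illustration, not the repository's code:

```matlab
% One plausible form of the asymmetric reward rule (assumed, not from the repo)
correctSide = 'left';                % example inputs
chosenSide  = 'left';
block       = 'left';                % current reward block: 'left', 'right' or 'none'
baseReward  = 1;
extraReward = 0.5;                   % example value of the additional reward

if strcmp(chosenSide, correctSide)
    reward = baseReward;
    if strcmp(block, chosenSide)     % correct action on the currently favoured side
        reward = reward + extraReward;
    end
else
    reward = 0;                      % incorrect choices earn no reward
end
```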

Reward structure

Task parameters

This code allows the user to choose some of the task's parameters, for instance (see the sketch after the list):

  • the number of trials
  • the number of reward blocks
  • options for reward blocks ('right', 'left' or 'none', where 'none' is optional)
  • stimulus values
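
For illustration, such a set-up might look like the following; the struct and field names are hypothetical and may not match those used in 'Main.m':

```matlab
% Hypothetical task-parameter set-up (field names are illustrative)
task.nTrials        = 1000;                   % number of trials
task.nBlocks        = 10;                     % number of reward blocks
task.blockOptions   = {'left', 'right'};      % optionally add 'none'
task.stimulusValues = -0.5:0.1:0.5;           % possible stimulus values
```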

The model

Note that this model implements a POMDP with Q-values. A Q-value quantifies how much the agent currently values choosing a particular action: the higher the Q-value, the more the agent values that action. Q-values are updated on every trial based upon the reward received.
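
A minimal illustration of a Q-value update is given below. It omits the belief weighting described in steps 6-7 below, and all names and values are illustrative rather than taken from the repository:

```matlab
% Minimal illustration of a Q-value update (simplified: the full model also
% weights the update by the agent's belief; see steps 6-7 below)
Q = [0.5, 0.5];                     % [Q_left, Q_right], initial values
learningRate = 0.1;
action = 2;                         % suppose the agent chose 'right'
reward = 1;                         % and received a reward of 1
predictionError = reward - Q(action);
Q(action) = Q(action) + learningRate * predictionError;
```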

  1. At the beginning of each trial, the agent receives some stimulus, s. The larger the absolute value of the stimulus, the clearer the stimulus appears to the agent.

  2. In order to model the agent's imperfect perception of the stimulus, noise is added to the stimulus value. The perceived stimulus value is sampled from a normal distribution with mean s (the stimulus value) and standard deviation sigma. The value of sigma is a parameter of the model.

  3. Using its perceived, noisy value of the stimulus, the agent then forms a belief about which side the stimulus is on. The agent calculates the probability of the stimulus being on a given side by evaluating, at zero, the cumulative distribution function of a normal random variable with mean equal to the noisy stimulus value and standard deviation sigma (as above).

  4. The agent then combines its belief as to the current side of the stimulus with its stored Q-values.

  5. The agent chooses either a left or right action, and receives the appropriate reward. This reward depends firstly on whether the agent has chosen the correct side, and secondly on the current reward block. The current reward block dictates whether the agent receives an additional reward for a correct action. The value of this additional reward is a second parameter of the model.

  6. The agent calculates the error in its prediction. This is the reward received minus the Q-value of the action taken.

  7. The prediction error, the agent's belief and the agent's learning rate (a third parameter of the model) are then used to update the Q-values for the next trial (a sketch of a single trial following these steps is given below).

POMDP model flowchart
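
The following is a sketch of a single trial following steps 1-7, assuming the belief is combined with the Q-values by a simple product and the higher-valued action is chosen. It is an illustrative reconstruction, not the code in 'RunPOMDP', and uses normrnd/normcdf from the Statistics and Machine Learning Toolbox:

```matlab
% Illustrative reconstruction of one trial (steps 1-7); not the repo's code
sigma        = 0.25;                     % perceptual noise (model parameter)
learningRate = 0.1;                      % learning rate (model parameter)

Q = [0.5, 0.5];                          %    stored action values [left, right]
s = -0.5 + rand;                         % 1. stimulus presented on this trial
sNoisy = normrnd(s, sigma);              % 2. noisy percept of the stimulus

pLeft  = normcdf(0, sNoisy, sigma);      % 3. belief that the stimulus is on the left
belief = [pLeft, 1 - pLeft];             %    and on the right

actionValues = belief .* Q;              % 4. combine belief with Q-values
[~, action]  = max(actionValues);        % 5. choose left (1) or right (2) ...
reward = 1;                              %    ... and receive the appropriate reward
                                         %    (placeholder; see the reward rule above)

delta = reward - Q(action);              % 6. prediction error
Q(action) = Q(action) ...
    + learningRate * belief(action) * delta;   % 7. belief-weighted Q-value update
```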

Model parameters

  • sigma, the noise added to the agent's perception of the stimulus and the standard deviation in the agent's belief distribution.
  • the value of the additional reward.
  • the learning rate.

Running the code

The file 'Main.m' runs the model. The code runs as is, and will plot the results.

The first two sections of the file allow the user to alter both the task parameters and the model parameters. The third section generates random stimulus values and reward blocks to be fed to the agent. The fourth section implements the POMDP with the function 'RunPOMDP'. The final section plots the results.
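
Purely as orientation, that structure might look like the outline below; the section titles and the call to 'RunPOMDP' (including its argument list) are hypothetical, not copied from 'Main.m':

```matlab
%% Task parameters        (hypothetical outline of Main.m's sections)
%% Model parameters
%% Generate random stimulus values and reward blocks
%% Run the POMDP
results = RunPOMDP(taskParams, modelParams);   % hypothetical call signature
%% Plot results
```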

References

The ideas used to build the model implemented here are largely drawn from

Terminology and the majority of the notation are also taken from these sources.

The task implemented is based upon
