emdp

Easy MDPs implemented in a gym like interface with access to transition dynamics.

Jump to topics: Installation | Grid World | Grid World->Plotting | Grid World->Customize | Accessing transition dynamics

Installation

cd into this directory and then run:

pip install -e .

Usage

emdp can simulate arbitrary MDPs, with and without absorbing states.

Chain World

These are found in emdp.chainworld. A helper function is provided to build chain worlds easily:

import numpy as np
from emdp.chainworld import build_chain_MDP
from emdp import actions

build_chain_MDP(n_states=7, p_success=0.9,
                reward_spec=[(5, actions.RIGHT, +1), (1, actions.LEFT, -1)],
                starting_distribution=np.array([0, 0, 0, 1, 0, 0, 0]),
                terminal_states=[0, 6], gamma=0.9)

This creates a 7-state MDP where the agent starts in the middle and the two ends are terminal states. Once the agent enters a terminal state it goes into the absorbing state. The agent executes the wrong action with probability 0.1. If the agent is at the left end of the world and takes the action LEFT, it receives a reward of -1 and goes into the absorbing state; otherwise it gets nothing. If the agent is at the right end of the world and takes the action RIGHT, it receives +1; otherwise it gets nothing.
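For reference, here is a minimal rollout sketch, continuing from the imports above. It assumes build_chain_MDP returns an MDP object exposing the same gym-like reset/step interface described for grid worlds below; the snippet above does not show this assignment explicitly:

mdp = build_chain_MDP(n_states=7, p_success=0.9,
                      reward_spec=[(5, actions.RIGHT, +1), (1, actions.LEFT, -1)],
                      starting_distribution=np.array([0, 0, 0, 1, 0, 0, 0]),
                      terminal_states=[0, 6], gamma=0.9)

state = mdp.reset()  # the agent starts in the middle of the chain
for _ in range(10):  # walk right towards the +1 reward
    state, reward, done, info = mdp.step(actions.RIGHT)
    if done:
        break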

Grid World

Here we provide helper functions to create gridworlds and a simple function to build an empty gridworld.

from emdp.gridworld import build_simple_grid
P = build_simple_grid(size=5, terminal_states=[(0, 4)], p_success=0.9)

This builds a simple 5x5 grid world with a terminal state at (0, 4). The probability of successfully executing the chosen action is 0.9. The function returns the transition matrix P.
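As a quick sanity check (a sketch; P is indexed as [state, action, next state], as the full example below shows), every (state, action) slice should be a valid probability distribution over next states:

import numpy as np
from emdp.gridworld import build_simple_grid

P = build_simple_grid(size=5, terminal_states=[(0, 4)], p_success=0.9)
print(P.shape)                           # (|S|, |A|, |S|)
assert np.allclose(P.sum(axis=-1), 1.0)  # each row is a distribution over next states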

For a full example, see how to build this example from the S&B book:

import numpy as np
import emdp.gridworld as gw

def build_SB_example35():
    """
    Example 3.5 from (Sutton and Barto, 2018) pg 60 (March 2018 version).
    A rectangular Gridworld representation of size 5 x 5.

    Quotation from book:
    At each state, four actions are possible: north, south, east, and west, which deterministically
    cause the agent to move one cell in the respective direction on the grid. Actions that
    would take the agent off the grid leave its location unchanged, but also result in a reward
    of −1. Other actions result in a reward of 0, except those that move the agent out of the
    special states A and B. From state A, all four actions yield a reward of +10 and take the
    agent to A'. From state B, all actions yield a reward of +5 and take the agent to B'
    """
    size = 5
    P = gw.build_simple_grid(size=size, p_success=1)
    # modify P to match dynamics from book.

    P[1, :, :] = 0 # first set the probability of all actions from state 1 to zero
    P[1, :, 21] = 1 # now set the probability of going from 1 to 21 with prob 1 for all actions

    P[3, :, :] = 0  # first set the probability of all actions from state 3 to zero
    P[3, :, 13] = 1  # now set the probability of going from 3 to 13 with prob 1 for all actions

    R = np.zeros((P.shape[0], P.shape[1])) # initialize a matrix of size |S|x|A|
    R[1, :] = +10  # all four actions from state A (index 1) yield +10
    R[3, :] = +5   # all four actions from state B (index 3) yield +5, as in the quoted text

    p0 = np.ones(P.shape[0])/P.shape[0] # uniform starting probability (assumed)
    gamma = 0.9

    terminal_states = []
    return gw.GridWorldMDP(P, R, gamma, p0, terminal_states, size)

To actually use this, there is a gym-like interface that lets you move around:

from emdp import actions

mdp = build_SB_example35()
state, reward, done, _ = mdp.step(actions.UP) # moves the agent up.

Plotting GridWorlds

There are some built-in tools for quickly plotting trajectories obtained from GridWorldMDP objects.

from emdp.gridworld import GridWorldPlotter
from emdp import actions
import random
gwp = GridWorldPlotter(mdp.size, mdp.has_absorbing_state) # alternatively you can use GridWorldPlotter.from_mdp(mdp)

# collect some trajectories from the GridWorldMDP object:

trajectories = []
for _ in range(3): # 3 trajectories
  trajectory = [mdp.reset()]
  for _ in range(10): # 10 steps maximum
    state, reward, done, info = mdp.step(random.sample([actions.LEFT, actions.RIGHT, 
                                                        actions.UP, actions.DOWN], 1)[0])
    trajectory.append(state)
  trajectories.append(trajectory)

Now trajectories contains a list of lists of numpy arrays which represent the states. You can easily obtain trajectory plots and state visitation heatmaps:

import matplotlib.pyplot as plt

fig = plt.figure(figsize=(10, 4))
ax = fig.add_subplot(121)

# trajectory
gwp.plot_trajectories(ax, trajectories)
gwp.plot_grid(ax)

# heatmap
ax = fig.add_subplot(122)
gwp.plot_heatmap(ax, trajectories)
gwp.plot_grid(ax)
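To display or save the resulting figure, the standard matplotlib calls apply (the filename is just an example):

plt.show()                          # display the figure interactively
fig.savefig('gridworld_plots.png')  # or save it to disk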

You will get something like this:

[image: example trajectory plot (left) and state-visitation heatmap (right)]

Customization

There is an interface to add walls and blockages to the gridworld.

from emdp.gridworld.builder_tools import TransitionMatrixBuilder
builder = TransitionMatrixBuilder(grid_size=5, has_terminal_state=False)
builder.add_grid([], p_success=1)
builder.add_wall_at((4, 2))
builder.add_wall_at((3, 2))
builder.add_wall_at((2, 2))
builder.add_wall_at((1, 2))

Using the above interface we have built a wall within our gridworld. Visualizing trajectories in the same way as before, we get:

[image: trajectories in the gridworld with the wall added above]

Alternatively, you can use add_wall_between, which creates a straight line of walls between two positions on the grid. For example, the following code produces the layout shown below:

from emdp.gridworld.builder_tools import TransitionMatrixBuilder
builder = TransitionMatrixBuilder(grid_size=5, has_terminal_state=False)
builder.add_grid([], p_success=1)
builder.add_wall_between((0,2), (1, 2))
builder.add_wall_between((3,2), (4, 2))
builder.add_wall_between((1,1), (1, 3))

[image: gridworld layout produced by the add_wall_between calls above]

Accessing transition dynamics

You can access transition dynamics by inspecting the MDP object:

mdp.P               # transition matrix, shape (|S|, |A|, |S|)
mdp.R               # reward matrix, shape (|S|, |A|)
mdp.p0              # starting distribution over states
mdp.gamma           # discount factor
mdp.terminal_states # the locations of the terminal states
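Since the dynamics are fully exposed, you can do exact calculations directly on these arrays. As a minimal sketch (assuming the shapes above), here is exact policy evaluation for a uniformly random policy, solving V = R_pi + gamma * P_pi V as a linear system:

import numpy as np

P, R, gamma = mdp.P, mdp.R, mdp.gamma
n_states, n_actions = R.shape

pi = np.ones((n_states, n_actions)) / n_actions  # uniformly random policy
P_pi = np.einsum('sa,sat->st', pi, P)            # state-to-state transitions under pi
R_pi = np.einsum('sa,sa->s', pi, R)              # expected immediate reward under pi

V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)  # exact state values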

Absorbing states

If you have an absorbing state in your MDP, it must be the last one. All actions executed in the absorbing state must lead to itself.
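A sketch of how you might verify this convention, assuming the (|S|, |A|, |S|) layout of mdp.P described above:

import numpy as np

# the absorbing state is the last index; every action taken there must return to it with probability 1
assert np.allclose(mdp.P[-1, :, -1], 1.0)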

Current usage

I use this code regularly in many pieces of my own work. If you use it, let me know and I'll be happy to add your work here. Here are a few preprints that are available:
