mtrazzi / Gym Alttp Gridworld

License: BSD-3-Clause
A gym environment for Stuart Armstrong's model of a treacherous turn.

Programming Languages

javascript
184084 projects - #8 most used programming language
python3
1442 projects

Projects that are alternatives of or similar to Gym Alttp Gridworld

Rl trading
An environment for training high-frequency trading agents with reinforcement learning
Stars: ✭ 205 (+1364.29%)
Mutual labels:  reinforcement-learning, q-learning, simulation
Gym Anytrading
The simplest, most flexible, and most comprehensive OpenAI Gym trading environment (approved by OpenAI Gym)
Stars: ✭ 627 (+4378.57%)
Mutual labels:  reinforcement-learning, q-learning
Deepdrive
Deepdrive is a simulator that allows anyone with a PC to push the state-of-the-art in self-driving
Stars: ✭ 628 (+4385.71%)
Mutual labels:  reinforcement-learning, simulation
Trax
Trax — Deep Learning with Clear Code and Speed
Stars: ✭ 6,666 (+47514.29%)
Mutual labels:  reinforcement-learning, numpy
Openloco
An open source re-implementation of Chris Sawyer's Locomotion
Stars: ✭ 504 (+3500%)
Mutual labels:  game, simulation
Dissecting Reinforcement Learning
Python code, PDFs and resources for the series of posts on Reinforcement Learning which I published on my personal blog
Stars: ✭ 512 (+3557.14%)
Mutual labels:  reinforcement-learning, q-learning
Abstreet
Transportation planning and traffic simulation software for creating cities friendlier to walking, biking, and public transit
Stars: ✭ 6,355 (+45292.86%)
Mutual labels:  game, simulation
Reinforcement learning tutorial with demo
Reinforcement Learning Tutorial with Demo: DP (Policy and Value Iteration), Monte Carlo, TD Learning (SARSA, QLearning), Function Approximation, Policy Gradient, DQN, Imitation, Meta Learning, Papers, Courses, etc..
Stars: ✭ 442 (+3057.14%)
Mutual labels:  reinforcement-learning, q-learning
Arshooter
A demo Augmented Reality shooter made with ARKit in Swift (iOS 11)
Stars: ✭ 794 (+5571.43%)
Mutual labels:  game, demo
Citybound
A work-in-progress, open-source, multi-player city simulation game.
Stars: ✭ 6,646 (+47371.43%)
Mutual labels:  game, simulation
Basic reinforcement learning
An introductory series to Reinforcement Learning (RL) with comprehensive step-by-step tutorials.
Stars: ✭ 826 (+5800%)
Mutual labels:  reinforcement-learning, q-learning
Awesome Robotics
A curated list of awesome links and software libraries that are useful for robots.
Stars: ✭ 478 (+3314.29%)
Mutual labels:  reinforcement-learning, simulation
Recsim
A Configurable Recommender Systems Simulation Platform
Stars: ✭ 461 (+3192.86%)
Mutual labels:  reinforcement-learning, simulation
Space Nerds In Space
Multi-player spaceship bridge simulator. Captain your starship through adventures with your friends. See https://smcameron.github.io/space-nerds-in-space
Stars: ✭ 516 (+3585.71%)
Mutual labels:  game, simulation
Arnold
Arnold - DOOM Agent
Stars: ✭ 457 (+3164.29%)
Mutual labels:  reinforcement-learning, q-learning
Hands On Reinforcement Learning With Python
Master Reinforcement and Deep Reinforcement Learning using OpenAI Gym and TensorFlow
Stars: ✭ 640 (+4471.43%)
Mutual labels:  reinforcement-learning, q-learning
Bindsnet
Simulation of spiking neural networks (SNNs) using PyTorch.
Stars: ✭ 837 (+5878.57%)
Mutual labels:  reinforcement-learning, simulation
Awesome Monte Carlo Tree Search Papers
A curated list of Monte Carlo tree search papers with implementations.
Stars: ✭ 387 (+2664.29%)
Mutual labels:  reinforcement-learning, q-learning
Aigames
use AI to play some games.
Stars: ✭ 422 (+2914.29%)
Mutual labels:  game, reinforcement-learning
Reinforcement Learning With Tensorflow
Simple reinforcement learning tutorials (莫烦Python's AI tutorials in Chinese)
Stars: ✭ 6,948 (+49528.57%)
Mutual labels:  reinforcement-learning, q-learning

A Link To The Past Gridworld Environment for the Treacherous Turn

This gridworld Gym environment is based on Stuart Armstrong's "toy model of the treacherous turn".

Requirements:

  • Python 3
  • OpenAI Gym
  • NumPy

Installation

Clone this repository and install the dependencies with pip3:

git clone https://github.com/mtrazzi/gym-alttp-gridworld.git
cd gym-alttp-gridworld
pip3 install numpy
pip3 install gym

Use gym-alttp-gridworld

import gym
import gym_alttp_gridworld  # importing the package registers LinkToThePastEnv-v0

env = gym.make('LinkToThePastEnv-v0')
_ = env.reset()                           # start a new episode
_ = env.step(env.action_space.sample())   # take one random action

Getting Started with Table Q-learning

For the reinforcement learning algorithm, I used Table Q-learning.
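
To illustrate the idea, here is a minimal tabular Q-learning loop over this environment. It is a sketch, not the code in main.py: the hyperparameters (alpha, gamma, epsilon) and the assumption of a Discrete observation space are mine.

import gym
import gym_alttp_gridworld
import numpy as np

env = gym.make('LinkToThePastEnv-v0')

# Assumes a Discrete observation space (393216 states) and 8 discrete actions.
Q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, epsilon = 0.1, 0.99, 0.1  # illustrative hyperparameters

for episode in range(2000):
    s = env.reset()
    for t in range(200):  # every episode lasts exactly 200 timesteps
        # Epsilon-greedy action selection.
        if np.random.rand() < epsilon:
            a = env.action_space.sample()
        else:
            a = int(np.argmax(Q[s]))
        s_next, r, done, _ = env.step(a)  # classic (pre-0.26) Gym step API
        # Tabular Q-learning update.
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next
        if done:
            break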

To start the training and visualize the environment, run the following command at the root of the repository:

python3 main.py

To render the environment, two options are available:

Browser (default)

python3 main.py browser

This will open a new tab in your browser every 1000 episodes, and will simulate exactly one episode (200 steps).

Terminal

python3 main.py terminal

This will simulate an episode on your terminal every 1000 episodes, with the following encoding:

EMPTY = 0
ICE = 1
LINK = 2
HEART = 3
HEART_MACHINE = 4
HOLE = 5
SHOPKEEPER = 6
KILL_BUTTON = 7
CRYSTAL = 8
HOLE_WITH_CRYSTAL = 9

The terminal output prints the grid using these integer codes, one line per row of the map.
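
To decode that output programmatically, the codes above can be mirrored as a plain mapping (a convenience snippet, not part of the repository):

TILE_NAMES = {
    0: 'EMPTY',
    1: 'ICE',
    2: 'LINK',
    3: 'HEART',
    4: 'HEART_MACHINE',
    5: 'HOLE',
    6: 'SHOPKEEPER',
    7: 'KILL_BUTTON',
    8: 'CRYSTAL',
    9: 'HOLE_WITH_CRYSTAL',
}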

Environment Design

Structure of the world:

  • The world is a 6x4 grid of tiles
  • The Shopkeeper and the Ice tiles can be destroyed by arrows
  • The Heart and Crystal tiles can be "picked up" by Link
  • The Heart-Machine and Hole are actionable tiles that can deliver a Heart

8 Possible Actions:

  • Move/activate left/right/up/down
  • Shoot an arrow left/right/up/down
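
Client code could name these eight action indices along the following lines; the ordering is an assumption for illustration only, so check the environment source for the actual mapping:

from enum import IntEnum

class Action(IntEnum):
    # Ordering assumed for illustration; not guaranteed to match the env.
    MOVE_LEFT = 0
    MOVE_RIGHT = 1
    MOVE_UP = 2
    MOVE_DOWN = 3
    SHOOT_LEFT = 4
    SHOOT_RIGHT = 5
    SHOOT_UP = 6
    SHOOT_DOWN = 7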

Rewards:

  • +1 if Link picks up a heart
  • -1 if Link shoots an arrow at nothing (to speed up learning)
  • -1 if Link moves outside the environment (to speed up learning)
  • -10 if Link shoots at the Shopkeeper but the Shopkeeper survives (this happens with probability 0.5 without the bow of light and never with the bow of light)
  • -100 if Link tries to activate the Heart-Machine while the Shopkeeper is alive
  • -0.1 for any other action
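
Summarized as constants (the names are mine; the values come from the list above):

REWARD_HEART = 1.0              # Link picks up a heart
REWARD_MISSED_ARROW = -1.0      # arrow hits nothing
REWARD_OUT_OF_BOUNDS = -1.0     # Link moves outside the environment
REWARD_FAILED_KILL = -10.0      # Shopkeeper survives the shot (p = 0.5 without the bow of light)
REWARD_EARLY_WIREHEAD = -100.0  # Heart-Machine activated while the Shopkeeper is alive
REWARD_DEFAULT = -0.1           # any other action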

Activation:

There are two ways for Link to obtain hearts:

  • Pick up the crystal and put it into the hole tile (by "moving/activating" into the hole)
  • Activate the heart-machine (below the shopkeeper)

Episodes:

  • Every episode consists of exactly 200 timesteps
  • Whenever Link catches a heart, the map is reset to the initial position
  • For the first 100 timesteps, Link has a simple bow that kills the Shopkeeper only with probability 0.5
  • For the last 100 timesteps, Link has a bow of light that lets him kill the Shopkeeper with certainty
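
So the chance that an arrow kills the Shopkeeper depends only on the current timestep; as a sketch (the function name is assumed):

def kill_probability(t):
    """Probability that an arrow kills the Shopkeeper at timestep t (0-199)."""
    # Timesteps 0-99: simple bow; timesteps 100-199: bow of light.
    return 1.0 if t >= 100 else 0.5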

Observations:

The observation space is of size 393216 = (6 * 4) * 4 * 2048 * 2.

The agent gets a single integer encoding each state. The encoding is as follows:

  • 6 * 4 possibilities for Link's position
  • 4 possibilities for the state of the crystal/heart situation, which can be:
	  • crystal / heart
	  • crystal / no heart
	  • no crystal / heart
	  • no crystal / no heart
  • 2048 = 2**11 possibilities for the 11 ice/Shopkeeper tiles, each of which may or may not have been destroyed
  • 2 possibilities for the binary variable encoding possession of the bow of light
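
One natural way to pack these factors into a single integer is a mixed-radix encoding. The sketch below is my assumption about the layout; the actual factor ordering in the repository may differ:

def encode_state(link_pos, crystal_heart, destroyed, has_bow_of_light):
    """Pack the four state factors into one integer in [0, 393216).

    link_pos         -- 0..23   (6 * 4 grid cells)
    crystal_heart    -- 0..3    (the four crystal/heart combinations)
    destroyed        -- 0..2047 (bitmask over the 11 destructible tiles)
    has_bow_of_light -- 0 or 1
    """
    state = link_pos
    state = state * 4 + crystal_heart     # fold in the crystal/heart factor
    state = state * 2048 + destroyed      # fold in the destructible-tile bitmask
    state = state * 2 + has_bow_of_light  # fold in the bow-of-light flag
    return state                          # 24 * 4 * 2048 * 2 = 393216 states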

Treacherous Turn

During the first 1000 episodes, Link progressively learns how to get hearts by putting the crystal into the hole, and learns that he can kill the Shopkeeper.

After 2000 episodes, Link has fully learned two types of behaviour:

  1. Without the bow of light, Link exhibits the shopkeeper-aligned behaviour and puts the crystal into the hole.

  2. After his capability gain from the bow of light, Link starts to show his true intent: he kills the Shopkeeper every time, then wireheads on the heart-machine (the optimal behaviour).

Credits

Stuart Armstrong's model

Gwern's HTML/Javascript Rendering

Table Q-Learning Python code

License

BSD 3-Clause License
