
aaksham / frozenlake

Licence: other
Value & Policy Iteration for the frozenlake environment of OpenAI

Programming Languages

python
139335 projects - #7 most used programming language
Jupyter Notebook
11667 projects

Projects that are alternatives to or similar to frozenlake

cs7641-assignment4
CS7641 - Machine Learning - Assignment 4 - Markov Decision Processes
Stars: ✭ 14 (-12.5%)
Mutual labels:  policy-iteration, value-iteration
Paddle-RLBooks
Paddle-RLBooks is a reinforcement learning code study guide based on pure PaddlePaddle.
Stars: ✭ 113 (+606.25%)
Mutual labels:  policy-iteration, value-iteration
Pytorch-RL-CPP
A Repository with C++ implementations of Reinforcement Learning Algorithms (Pytorch)
Stars: ✭ 73 (+356.25%)
Mutual labels:  openai
ru-dalle
Generate images from texts. In Russian
Stars: ✭ 1,606 (+9937.5%)
Mutual labels:  openai
ddrl
Deep Developmental Reinforcement Learning
Stars: ✭ 27 (+68.75%)
Mutual labels:  openai
pen.el
Pen.el stands for Prompt Engineering in emacs. It facilitates the creation, discovery and usage of prompts to language models. Pen supports OpenAI, EleutherAI, Aleph-Alpha, HuggingFace and others. It's the engine for the LookingGlass imaginary web browser.
Stars: ✭ 376 (+2250%)
Mutual labels:  openai
learning-to-drive-in-5-minutes
Implementation of reinforcement learning approach to make a car learn to drive smoothly in minutes
Stars: ✭ 227 (+1318.75%)
Mutual labels:  openai
laravel-rewardable
No description or website provided.
Stars: ✭ 12 (-25%)
Mutual labels:  reward
ActiveRagdollControllers
Research into controllers for 2d and 3d Active Ragdolls (using MujocoUnity+ml_agents)
Stars: ✭ 30 (+87.5%)
Mutual labels:  openai
zsh codex
This is a ZSH plugin that enables you to use OpenAI's Codex AI in the command line.
Stars: ✭ 787 (+4818.75%)
Mutual labels:  openai
graphsignal
Graphsignal Python agent
Stars: ✭ 158 (+887.5%)
Mutual labels:  openai
go-gpt3
OpenAI GPT-3 API wrapper for Go
Stars: ✭ 107 (+568.75%)
Mutual labels:  openai
clip playground
An ever-growing playground of notebooks showcasing CLIP's impressive zero-shot capabilities
Stars: ✭ 80 (+400%)
Mutual labels:  openai
clip-guided-diffusion
A CLI tool/python module for generating images from text using guided diffusion and CLIP from OpenAI.
Stars: ✭ 260 (+1525%)
Mutual labels:  openai
CartPole
Run OpenAI Gym on a Server
Stars: ✭ 16 (+0%)
Mutual labels:  openai
clifs
Contrastive Language-Image Forensic Search allows free text searching through videos using OpenAI's machine learning model CLIP
Stars: ✭ 271 (+1593.75%)
Mutual labels:  openai
laravel-leaderboard
No description or website provided.
Stars: ✭ 39 (+143.75%)
Mutual labels:  reward
yourAI
GPT-2 Discord Bot and Steps to Train Something Like You
Stars: ✭ 71 (+343.75%)
Mutual labels:  openai
prompts-v1
A free and open-source curation of prompts for OpenAI's GPT-3.
Stars: ✭ 18 (+12.5%)
Mutual labels:  openai
netcoin
Netcoin - Digital currency with personal interest rate and fair weight stake mining
Stars: ✭ 18 (+12.5%)
Mutual labels:  reward

Reinforcement Learning

OpenAI Gym Environments

Creating the environments

To create the environment, use the following code snippet:

import gym
import deeprl_hw1.envs

env = gym.make('Deterministic-4x4-FrozenLake-v0')

Actions

There are four actions: LEFT, UP, DOWN, and RIGHT, represented as integers. The deeprl_hw1.envs module contains constants to reference them. For example:

print(deeprl_hw1.envs.LEFT)

will print out the number 0.
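Since the numeric values of the other constants are not spelled out here, a convenient, value-agnostic way to recover an action's name is a lookup table built from the constants themselves (a minimal sketch; it assumes only that the four constants exist in deeprl_hw1.envs):

import deeprl_hw1.envs

# Map each action constant to a human-readable name without
# assuming which integer each constant holds.
action_names = {
    deeprl_hw1.envs.LEFT: 'LEFT',
    deeprl_hw1.envs.UP: 'UP',
    deeprl_hw1.envs.DOWN: 'DOWN',
    deeprl_hw1.envs.RIGHT: 'RIGHT',
}

print(action_names[0])  # prints 'LEFT', since LEFT is 0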

Environment Attributes

This class contains the following important attributes:

  • nS :: number of states
  • nA :: number of actions
  • P :: transition model (transition probabilities, next states, rewards, terminal flags)
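For the Deterministic-4x4-FrozenLake-v0 map, for example, the first two evaluate as follows:

print(env.nS)  # 16, one state per cell of the 4x4 grid
print(env.nA)  # 4: LEFT, UP, DOWN, RIGHT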

The P attribute will be the most important for your implementations of value iteration and policy iteration. It contains the model for the particular map instance, as a dictionary of dictionaries of lists with the following form:

P[s][a] = [(prob, nextstate, reward, is_terminal), ...]

For example, to look up the transitions for taking action LEFT in state 0, you would use the following code:

env.P[0][deeprl_hw1.envs.LEFT]

This would return the list [(1.0, 0, 0.0, False)] for the Deterministic-4x4-FrozenLake-v0 domain. There is one tuple in the list, so there is only one possible outcome. According to the second element, the next state is state 0, and according to the first element, that transition happens 100% of the time. The third element gives the reward for this state-action pair, R(0, LEFT) = 0. The final element, False, says that the next state is not terminal.
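Each tuple can be unpacked directly when sweeping over the model. As an illustrative sketch, the one-step lookahead (Bellman backup) for this state-action pair would look like the following, where gamma and V are hypothetical placeholders rather than environment attributes:

gamma = 0.9           # hypothetical discount factor
V = [0.0] * env.nS    # hypothetical value estimate for every state

# Expected one-step return of taking LEFT in state 0:
q = 0.0
for prob, nextstate, reward, is_terminal in env.P[0][deeprl_hw1.envs.LEFT]:
    # Terminal next states contribute no future value.
    q += prob * (reward + gamma * V[nextstate] * (not is_terminal))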

Running a random policy

example.py has an example of how to run a random policy on the domain.
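In case you do not have example.py at hand, a minimal version might look like this (a sketch using the classic gym step API; the actual script may differ):

import gym
import deeprl_hw1.envs

env = gym.make('Deterministic-4x4-FrozenLake-v0')

state = env.reset()
total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()            # pick a random action
    state, reward, done, info = env.step(action)  # advance one step
    total_reward += reward
print('Total reward:', total_reward)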

Value Iteration

The optimal policies for the different environments are in the .py files.
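For reference, a minimal value-iteration sketch built on env.P, env.nS, and env.nA might look like the following (the discount factor and tolerance are illustrative defaults, not the settings used in the repository's .py files):

def value_iteration(env, gamma=0.9, tol=1e-3):
    # Sweep Bellman optimality backups over all states until the
    # largest value change in a sweep falls below the tolerance.
    V = [0.0] * env.nS
    while True:
        delta = 0.0
        for s in range(env.nS):
            q = [sum(p * (r + gamma * V[ns] * (not term))
                     for p, ns, r, term in env.P[s][a])
                 for a in range(env.nA)]
            best = max(q)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Extract the policy that is greedy with respect to V.
    policy = [max(range(env.nA),
                  key=lambda a: sum(p * (r + gamma * V[ns] * (not term))
                                    for p, ns, r, term in env.P[s][a]))
              for s in range(env.nS)]
    return V, policy

V, policy = value_iteration(env)
print(policy)  # one greedy action index per state

Policy iteration follows the same pattern, alternating policy-evaluation sweeps with a greedy policy-improvement step over the same env.P model.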
