
thomashirtz / gym-hybrid

Licence: other
Collection of OpenAI parametrized action-space environments.

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to gym-hybrid

gym-cartpole-swingup
A simple, continuous-control environment for OpenAI Gym
Stars: ✭ 20 (-23.08%)
Mutual labels:  openai-gym, reinforcement-learning-environments
vizdoomgym
OpenAI Gym wrapper for ViZDoom environments
Stars: ✭ 59 (+126.92%)
Mutual labels:  openai-gym, openai-gym-environments
bark-ml
Gym environments and agents for autonomous driving.
Stars: ✭ 68 (+161.54%)
Mutual labels:  openai-gym, reinforcement-learning-environments
robo-gym-robot-servers
Repository containing Robot Servers ROS packages
Stars: ✭ 25 (-3.85%)
Mutual labels:  openai-gym, reinforcement-learning-environments
modelicagym
Modelica models integration with Open AI Gym
Stars: ✭ 53 (+103.85%)
Mutual labels:  openai-gym, reinforcement-learning-environments
neat-openai-gym
NEAT for Reinforcement Learning on the OpenAI Gym
Stars: ✭ 19 (-26.92%)
Mutual labels:  openai-gym
xiaoshuo-app
A novel-reading app developed with apicloud technology
Stars: ✭ 54 (+107.69%)
Mutual labels:  hybrid
pytorch-rl
Pytorch Implementation of RL algorithms
Stars: ✭ 15 (-42.31%)
Mutual labels:  openai-gym
Tetra3d
Tetra3D is a 3D hybrid software/hardware renderer made for games written in Go with Ebitengine.
Stars: ✭ 271 (+942.31%)
Mutual labels:  hybrid
ML-Agents-with-Google-Colab
Train reinforcement learning agent using ML-Agents with Google Colab.
Stars: ✭ 27 (+3.85%)
Mutual labels:  reinforcement-learning-environments
simple-playgrounds
Simulator for Reinforcement Learning and AI. 2D environments with physics and interactive entities. Agents with rich sensors and actuators.
Stars: ✭ 18 (-30.77%)
Mutual labels:  reinforcement-learning-environments
drl grasping
Deep Reinforcement Learning for Robotic Grasping from Octrees
Stars: ✭ 160 (+515.38%)
Mutual labels:  openai-gym
mdp
Make it easy to specify simple MDPs that are compatible with the OpenAI Gym.
Stars: ✭ 30 (+15.38%)
Mutual labels:  openai-gym
distributed rl
Pytorch implementation of distributed deep reinforcement learning
Stars: ✭ 66 (+153.85%)
Mutual labels:  openai-gym
CartPole
Run OpenAI Gym on a Server
Stars: ✭ 16 (-38.46%)
Mutual labels:  openai-gym
rl-bigwatermelon
Playing the game 合成大西瓜 (Merge Big Watermelon) with deep reinforcement learning
Stars: ✭ 22 (-15.38%)
Mutual labels:  reinforcement-learning-environments
build ionic2 app chinese
This is the Chinese version of <build ionic2 app>
Stars: ✭ 16 (-38.46%)
Mutual labels:  hybrid
TAT-QA
TAT-QA (Tabular And Textual dataset for Question Answering) contains 16,552 questions associated with 2,757 hybrid contexts from real-world financial reports.
Stars: ✭ 40 (+53.85%)
Mutual labels:  hybrid
jak
Hybrid web/desktop applications on Linux
Stars: ✭ 79 (+203.85%)
Mutual labels:  hybrid
magical
The MAGICAL benchmark suite for robust imitation learning (NeurIPS 2020)
Stars: ✭ 60 (+130.77%)
Mutual labels:  reinforcement-learning-environments

gym-hybrid

Repository containing a collection of environments for reinforcement learning tasks with a discrete-continuous hybrid action space.

"Sliding-v0" and "Moving-v0"

"Moving-v0" and "Sliding-v0" are sandbox environments for parameterized action-space algorithms. The goal of the agent is to stop inside a target area.

The field is a square with a side length of 2. The target area is a circle of radius 0.1. There are three discrete actions: accelerate, turn, and brake. In addition to the discrete action, there are two possible complementary parameters: acceleration and rotation.

The episode terminates if one of three conditions is met (see the sketch below):

  • the agent stops inside the target area,
  • the agent leaves the field,
  • the step count exceeds the limit (set by default to 200).
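
Concretely, the check can be pictured as in this sketch (the attribute names and the [-1, 1] field bounds are assumptions for illustration, not taken from the actual implementation):

# Sketch of the termination check; names and field bounds are assumed.
def is_done(agent, distance, current_step, target_radius=0.1, max_step=200):
    stopped_on_target = distance < target_radius and agent.speed == 0  # stopped inside the target area
    out_of_field = abs(agent.x) > 1 or abs(agent.y) > 1                # left the 2 x 2 field
    too_many_steps = current_step >= max_step                          # step limit reached (default 200)
    return stopped_on_target or out_of_field or too_many_steps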

The moving environment does not take inertia into account, while the sliding environment does; Sliding-v0 is therefore more realistic than Moving-v0.

All the parameters, actions, states and rewards are the same between the two environments. Only the underlying physics changes.

State

The state is a list of 10 elements. The environment-related values are: the current step divided by the maximum number of steps, and the position of the target (x and y). The agent-related values are: its position (x and y), its speed, its direction (cosine and sine), its distance to the target, and an indicator that becomes 1 when the agent is inside the target zone.

state = [
    agent.x,
    agent.y,
    agent.speed,
    np.cos(agent.theta),
    np.sin(agent.theta),
    target.x,
    target.y,
    distance,
    0 if distance > target_radius else 1,
    current_step / max_step
]

Reward

The reward is the agent's distance to the target at the previous step minus its current distance. A penalty (set by default at a low value) is subtracted at each step to incentivize the learning algorithm to score as quickly as possible. A bonus reward of one is added if the agent manages to stop inside the target area. A malus of one is applied if the step count exceeds the limit or if the agent leaves the field.
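
Put together, the reward logic roughly follows this sketch (function and variable names are illustrative assumptions, not the actual implementation):

# Sketch of the reward described above; names are assumed.
def compute_reward(last_distance, distance, stopped_inside_target,
                   left_field, step_limit_exceeded, penalty=0.001):
    # progress toward the target, minus the constant per-step penalty
    reward = last_distance - distance - penalty
    if stopped_inside_target:
        reward += 1.0  # bonus for stopping inside the target area
    elif left_field or step_limit_exceeded:
        reward -= 1.0  # malus for leaving the field or exceeding the step limit
    return reward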

Actions

The action ids are:

  0. Accelerate
  1. Turn
  2. Brake

The parameters are:

  1. Acceleration value
  2. Rotation value

There are two distinct ways to format an action (a sketch mapping a model's outputs to both formats follows the examples):

Action with all the parameters (convenient if the model outputs all the parameters):

action = (action_id, [acceleration_value, rotation_value])

Examples of valid actions:

action = (0, [0.1, 0.4])
action = (1, [0.0, 0.2])
action = (2, [0.1, 0.3])

Note: Only the parameter related to the action chosen will be used.

Action with only the parameter related to the action id (convenient for algorithms that output only the parameter of the chosen action, since it does not require padding the action):

action = (0, [acceleration_value])
action = (1, [rotation_value])
action = (2, [])

Example of valid actions:

action = (0, [0.1])
action = (1, [0.2])
action = (2, [])
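
For instance, a model that outputs one score per discrete action and a full parameter vector could be mapped to either format as in the sketch below (the output names are hypothetical):

import numpy as np

logits = np.array([0.2, 1.5, -0.3])  # hypothetical scores for accelerate, turn, brake
parameters = np.array([0.1, 0.4])    # hypothetical [acceleration_value, rotation_value]

action_id = int(np.argmax(logits))

# Format 1: pass all the parameters; the environment ignores the unused one.
padded_action = (action_id, parameters.tolist())

# Format 2: pass only the parameter of the chosen action (none for brake).
parameters_per_action = {0: [float(parameters[0])], 1: [float(parameters[1])], 2: []}
unpadded_action = (action_id, parameters_per_action[action_id])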

Basics

Make and initialize an environment:

import gym
import gym_parametrized

sliding_env = gym.make('Sliding-v0')
sliding_env.reset()

moving_env = gym.make('Moving-v0')
moving_env.reset()

Get the action space and the observation space:

# `env` is one of the environments created above (e.g. sliding_env or moving_env)
ACTION_SPACE = env.action_space[0].n
PARAMETERS_SPACE = env.action_space[1].shape[0]
OBSERVATION_SPACE = env.observation_space.shape[0]
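
For these two environments, this yields 3 discrete actions, 2 continuous parameters, and an observation of size 10, as described in the sections above.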

Run a random agent:

done = False
while not done:
    state, reward, done, info = env.step(env.action_space.sample())
    print(f'State: {state} Reward: {reward} Done: {done}')

Parameters

The parameters that can be modified during initialization are:

  • seed (default = None)
  • max_turn, the angle in radians that can be achieved in one step (default = np.pi/2)
  • max_acceleration, the acceleration that can be achieved in one step (if the input parameter is 1) (default = 0.5)
  • delta_t, the duration of one time step (default = 0.005)
  • max_step, the limit on the number of steps before the end of an episode (default = 200)
  • penalty, the value subtracted from the reward at each step to incentivize the agent to finish the episode quicker (default = 0.001)

Initialization with custom parameters:

env = gym.make(
    'Moving-v0', 
    seed=0, 
    max_turn=1,
    max_acceleration=1.0, 
    delta_t=0.001, 
    max_step=500, 
    penalty=0.01
)

Render & Recording

Two testing files are available in the repository to show users how to render and record the environment.
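
As a rough illustration, a rendering loop could look like the sketch below, assuming the environment follows the standard Gym render API (this snippet is an illustration, not taken from the repository's test files):

import gym
import gym_parametrized  # registers the environments, as in the Basics section above

env = gym.make('Moving-v0')
env.reset()

done = False
while not done:
    _, _, done, _ = env.step(env.action_space.sample())
    env.render()  # draw the current frame of the environment
env.close()

Recording can presumably be done with one of gym's video-recording wrappers; the test files in the repository remain the reference for that.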

Disclaimer

Even though the mechanics of the environment are done, the default parameters may need some further adjustment.

Reference

This environment is described in several papers, such as:

  • Parametrized Deep Q-Networks Learning, Xiong et al., 2018
  • Hybrid Actor-Critic Reinforcement Learning in Parameterized Action Space, Fan et al., 2019

Installation

Direct installation from GitHub using pip by running this command:

pip install git+https://github.com/thomashirtz/gym-hybrid#egg=gym-hybrid