utiasDSL / safe-control-gym

Licence: MIT license
PyBullet CartPole and Quadrotor environments—with CasADi symbolic a priori dynamics—for learning-based control and RL

safe-control-gym

Physics-based CartPole and Quadrotor Gym environments (using PyBullet) with symbolic a priori dynamics (using CasADi) for learning-based control, and model-free and model-based reinforcement learning (RL).

These environments include (and evaluate) symbolic safety constraints and implement input, parameter, and dynamics disturbances to test the robustness and generalizability of control approaches. [PDF]

problem illustration

@article{brunke2021safe,
         title={Safe Learning in Robotics: From Learning-Based Control to Safe Reinforcement Learning}, 
         author={Lukas Brunke and Melissa Greeff and Adam W. Hall and Zhaocong Yuan and Siqi Zhou and Jacopo Panerati and Angela P. Schoellig},
         journal = {Annual Review of Control, Robotics, and Autonomous Systems},
         year={2021},
         url = {https://arxiv.org/abs/2108.06266}}

baselines

Install on Ubuntu/macOS

Clone repo

git clone https://github.com/utiasDSL/safe-control-gym.git
cd safe-control-gym

Option A (recommended): using conda

Create and access a Python 3.8 environment using conda

conda create -n safe python=3.8.10
conda activate safe

Install the safe-control-gym repository

pip install --upgrade pip
pip install -e .

Option B: using venv and poetry

Create and access a Python 3.8 virtual environment using pyenv and venv

pyenv install 3.8.10
pyenv local 3.8.10
python3 -m venv safe
source safe/bin/activate
pip install --upgrade pip
pip install poetry
poetry install

Note:

You may need to separately install gmp, a dependency of pycddlib:

conda install -c anaconda gmp

or

sudo apt-get install libgmp-dev

Option C: using Colab

See this notebook where safe-control-gym is pre-installed
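After any of the install options above, a quick sanity check can confirm that the interpreter version and the main dependencies are in place. The following is a minimal sketch, not part of the repository; the import names `safe_control_gym`, `pybullet`, `casadi`, and `cdd` (for pycddlib) are assumptions and may need adjusting for your installation.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_matches(major, minor, version_info=sys.version_info):
    """Return True if the interpreter is the expected major.minor release."""
    return (version_info[0], version_info[1]) == (major, minor)


if __name__ == "__main__":
    # Assumed import names (not confirmed by this README); edit as needed.
    expected = ["safe_control_gym", "pybullet", "casadi", "cdd"]
    print("Running Python 3.8:", python_matches(3, 8))
    print("Missing modules:", missing_modules(expected))
```

An empty "Missing modules" list suggests the editable install succeeded; a missing `cdd` entry may point to the gmp note above.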

Architecture

Overview of safe-control-gym's API:

block diagram

@misc{yuan2021safecontrolgym,
      title={safe-control-gym: a Unified Benchmark Suite for Safe Learning-based Control and Reinforcement Learning}, 
      author={Zhaocong Yuan and Adam W. Hall and Siqi Zhou and Lukas Brunke and Melissa Greeff and Jacopo Panerati and Angela P. Schoellig},
      year={2021},
      eprint={2109.06325},
      archivePrefix={arXiv},
      primaryClass={cs.RO}}

Configuration

config

Performance

We compare the sample efficiency of safe-control-gym with the original [OpenAI Cartpole][1] and [PyBullet Gym's Inverted Pendulum][2], as well as [gym-pybullet-drones][3]. For each project, we use its default physics simulation integration step. We report performance results for open-loop, random action inputs. Note that the Bullet engine frequency reported for safe-control-gym is typically much finer grained, for improved fidelity. The safe-control-gym quadrotor environment is not as lightweight as [gym-pybullet-drones][3] but provides a speed-up of the same order of magnitude, along with several more safety features and symbolic models.
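To make the relation between the control frequency and the finer-grained Bullet engine frequency concrete, the sketch below computes how many physics steps fit into one control step. It assumes the simulator advances a whole number of physics steps per control input, which is a common convention but is not stated in this README; the function name is hypothetical.

```python
def physics_steps_per_control_step(pyb_freq_hz, ctrl_freq_hz):
    """Number of physics (PyBullet) steps taken for each control input.

    Assumes the physics frequency is an integer multiple of the control
    frequency; raises ValueError otherwise.
    """
    if ctrl_freq_hz <= 0 or pyb_freq_hz <= 0:
        raise ValueError("frequencies must be positive")
    if pyb_freq_hz % ctrl_freq_hz != 0:
        raise ValueError("physics frequency must be a multiple of control frequency")
    return pyb_freq_hz // ctrl_freq_hz


if __name__ == "__main__":
    # Frequency pairs (PyBullet Hz, control Hz) as listed in this section's table.
    for pyb_hz, ctrl_hz in [(1000, 50), (240, 60), (240, 48), (60, 60)]:
        print(pyb_hz, ctrl_hz, physics_steps_per_control_step(pyb_hz, ctrl_hz))
```

Under this assumption, a 1000Hz physics rate with 50Hz control means 20 physics steps per control input, which is part of why the higher-fidelity settings run slower.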

| Environment | GUI | Control Freq. | PyBullet Freq. | Constraints & Disturbances^ | Speed-Up^^ |
| --- | --- | --- | --- | --- | --- |
| Gym cartpole | True | 50Hz | N/A | No | 1.16x |
| InvPenPyBulletEnv | False | 60Hz | 60Hz | No | 158.29x |
| cartpole | True | 50Hz | 50Hz | No | 0.85x |
| cartpole | False | 50Hz | 1000Hz | No | 24.73x |
| cartpole | False | 50Hz | 1000Hz | Yes | 22.39x |
| gym-pyb-drones | True | 48Hz | 240Hz | No | 2.43x |
| gym-pyb-drones | False | 50Hz | 1000Hz | No | 21.50x |
| quadrotor | True | 60Hz | 240Hz | No | 0.74x |
| quadrotor | False | 50Hz | 1000Hz | No | 9.28x |
| quadrotor | False | 50Hz | 1000Hz | Yes | 7.62x |

^ Whether the environment includes a default set of constraints and disturbances

^^ Speed-up = Elapsed Simulation Time / Elapsed Wall Clock Time; on a 2.30GHz Quad-Core i7-1068NG7 with 32GB 3733MHz LPDDR4X; no GPU
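The speed-up definition above can be sketched in a few lines of Python. This is an illustrative helper, not the benchmarking script used for the table; `speed_up` and `time_loop` are hypothetical names, and the no-op step stands in for a real environment step.

```python
import time


def speed_up(num_ctrl_steps, ctrl_freq_hz, wall_clock_s):
    """Speed-up = elapsed simulation time / elapsed wall clock time."""
    simulated_s = num_ctrl_steps / ctrl_freq_hz
    return simulated_s / wall_clock_s


def time_loop(step_fn, num_ctrl_steps):
    """Call step_fn num_ctrl_steps times and return the elapsed wall clock seconds."""
    start = time.perf_counter()
    for _ in range(num_ctrl_steps):
        step_fn()
    return time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder step; replace with a real environment step to benchmark it.
    steps = 5000
    elapsed = time_loop(lambda: None, steps)
    print("wall clock [s]:", elapsed)
    # Worked example: 5000 steps at 50Hz is 100 s of simulation; 4 s of wall clock gives 25x.
    print("example speed-up:", speed_up(5000, 50, 4.0))
```

A value above 1x means the simulation runs faster than real time; the GUI rows in the table fall near or below 1x.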

Getting Started

Familiarize yourself with the APIs and environments using the scripts in examples/

$ cd ./examples/                                                                    # Navigate to the examples folder
$ python3 tracking.py --overrides ./tracking.yaml                                   # PID trajectory tracking with the 2D quadcopter
$ python3 verbose_api.py --task cartpole --overrides verbose_api.yaml             #  Printout of the extended safe-control-gym APIs
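The `--overrides` flag in the commands above points to a YAML file whose values take precedence over default settings. As a rough mental model only (this is not safe-control-gym's actual implementation), the effect can be pictured as a recursive dictionary merge. The configuration keys below are hypothetical and chosen purely for illustration.

```python
import copy


def merge_overrides(defaults, overrides):
    """Return a new dict where values in overrides replace those in defaults.

    Nested dicts are merged key by key; any other value is replaced wholesale.
    Neither input is modified.
    """
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_overrides(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


if __name__ == "__main__":
    # Hypothetical keys, for illustration only.
    defaults = {"task": "cartpole", "task_config": {"ctrl_freq": 50, "gui": False}}
    overrides = {"task_config": {"gui": True}}
    print(merge_overrides(defaults, overrides))
```

With this model, an overrides file only needs to list the settings that differ from the defaults.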

Systems Variables and 2D Quadrotor Lemniscate Trajectory Tracking

systems trajectory
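For readers unfamiliar with lemniscate (figure-eight) references, the sketch below samples one in the x-z plane using the lemniscate of Gerono parametrization. It is an assumption-laden illustration: the repository's own trajectory generator may use a different parametrization, scaling, or plane, and the function names, default center, and parameters here are hypothetical.

```python
import math


def lemniscate_point(t, period_s, scale_m, center=(0.0, 1.0)):
    """Position (x, z) on a figure-eight at time t, completing one loop per period."""
    phase = 2.0 * math.pi * t / period_s
    x = center[0] + scale_m * math.sin(phase)
    z = center[1] + scale_m * math.sin(phase) * math.cos(phase)
    return x, z


def lemniscate_reference(period_s, scale_m, ctrl_freq_hz, center=(0.0, 1.0)):
    """Sample one full period of the figure-eight at the control frequency."""
    num_samples = int(round(period_s * ctrl_freq_hz))
    return [
        lemniscate_point(k / ctrl_freq_hz, period_s, scale_m, center)
        for k in range(num_samples)
    ]


if __name__ == "__main__":
    reference = lemniscate_reference(period_s=10.0, scale_m=1.0, ctrl_freq_hz=50)
    print(len(reference), reference[0], reference[125])
```

A tracking controller would then be asked to follow these waypoints one control step at a time.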

Verbose API Example

List of Implemented Controllers

Re-create the Results in "Safe Learning in Robotics" [arXiv link]

To stay in touch, get involved or ask questions, please open an issue on GitHub or contact us via e-mail ({jacopo.panerati, zhaocong.yuan, adam.hall, siqi.zhou, lukas.brunke, melissa.greeff}@robotics.utias.utoronto.ca).

Figure 6—Robust GP-MPC [1]

$ cd ../experiments/annual_reviews/figure6/                        # Navigate to the experiment folder
$ chmod +x create_fig6.sh                                          # Make the script executable, if needed
$ ./create_fig6.sh                                                 # Run the script (ca. 2')

This will use the models in safe-control-gym/experiments/figure6/trained_gp_model/ to generate

gp-mpc

To also re-train the GP models from scratch (ca. 30' on a laptop)

$ chmod +x create_trained_gp_model.sh                              # Make the script executable, if needed
$ ./create_trained_gp_model.sh                                     # Run the script (ca. 30')

Note: this will back up and overwrite safe-control-gym/experiments/figure6/trained_gp_model/


Figure 7—Safe RL Exploration [2]

$ cd ../figure7/                                                   # Navigate to the experiment folder
$ chmod +x create_fig7.sh                                          # Make the script executable, if needed
$ ./create_fig7.sh                                                 # Run the script (ca. 5'')

This will use the data in safe-control-gym/experiments/figure7/safe_exp_results.zip/ to generate

safe-exp

To also re-train all the controllers/agents (warning: >24hrs on a laptop; if necessary, separately run each of the three loops in the Bash script: PPO, PPO with reward shaping, and the Safe Explorer)

$ chmod +x create_safe_exp_results.sh                              # Make the script executable, if needed
$ ./create_safe_exp_results.sh                                     # Run the script (>24hrs)

Note: this script will (over)write the results in safe-control-gym/experiments/figure7/safe_exp_results/; if you do not run the re-training to completion, delete the partial results with rm -r -f ./safe_exp_results/ before running ./create_fig7.sh again.
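If you prefer to clean up from Python (for example, inside a notebook), the following sketch is roughly equivalent to the `rm -r -f` command in the note above. It assumes the current working directory is the figure7 experiment folder; `remove_partial_results` is a hypothetical helper, not part of the repository.

```python
import shutil
from pathlib import Path


def remove_partial_results(results_dir="./safe_exp_results/"):
    """Delete the results folder and everything in it; do nothing if it is absent.

    Returns True if the folder existed before the call.
    """
    path = Path(results_dir)
    existed = path.exists()
    # ignore_errors=True mirrors the forgiving behavior of `rm -f`.
    shutil.rmtree(path, ignore_errors=True)
    return existed


if __name__ == "__main__":
    print("Removed partial results:", remove_partial_results())
```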


Figure 8—Model Predictive Safety Certification [3]

(required) Obtain MOSEK's license (free for academia). Once you have received (via e-mail) and downloaded the license to your own ~/Downloads folder, install it by executing

$ mkdir ~/mosek                                                    # Create MOSEK license folder in your home '~'
$ mv ~/Downloads/mosek.lic ~/mosek/                                # Move the downloaded MOSEK license to '~/mosek/'
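Before launching the experiment, it may help to confirm that the license file ended up where the commands above put it. This is a minimal sketch that only checks for the file at `~/mosek/mosek.lic`; it does not validate the license itself, and the helper names are hypothetical.

```python
from pathlib import Path


def mosek_license_path(home=None):
    """Path where the commands above place the license: <home>/mosek/mosek.lic."""
    base = Path(home) if home is not None else Path.home()
    return base / "mosek" / "mosek.lic"


def has_mosek_license(home=None):
    """Return True if a license file exists at the expected location."""
    return mosek_license_path(home).is_file()


if __name__ == "__main__":
    print("Expected license location:", mosek_license_path())
    print("License file found:", has_mosek_license())
```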

Then run

$ cd ../figure8/                                                   # Navigate to the experiment folder
$ chmod +x create_fig8.sh                                          # Make the script executable, if needed
$ ./create_fig8.sh                                                 # Run the script (ca. 1')

This will use the unsafe (pre-trained) PPO controller/agent in folder safe-control-gym/experiments/figure8/unsafe_ppo_model/ to generate

mpsc-1

mpsc-2 mpsc-3

To also re-train the unsafe PPO controller/agent (ca. 2' on a laptop)

$ chmod +x create_unsafe_ppo_model.sh                              # Make the script executable, if needed
$ ./create_unsafe_ppo_model.sh                                     # Run the script (ca. 2')

Note: this script will (over)write the model in safe-control-gym/experiments/figure8/unsafe_ppo_model/

References

Related Open-source Projects

TODOs (August 2022)

  • Publish to PyPI
  • Create resource list with papers, projects, blog posts (Cat's, etc.) using safe-control-gym

University of Toronto's Dynamic Systems Lab / Vector Institute for Artificial Intelligence
