
StepNeverStop / Rls

Licence: apache-2.0
Reinforcement Learning Algorithms Based on TensorFlow 2.x

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Rls

Openvhead
A 3D virtual head control system for VTuber in Unity with smooth motion and robust facial expressions
Stars: ✭ 221 (-7.53%)
Mutual labels:  unity3d
Apple Signin Unity
Unity plugin to support Sign In With Apple Id
Stars: ✭ 228 (-4.6%)
Mutual labels:  unity3d
Js
turbo.js - perform massive parallel computations in your browser with GPGPU.
Stars: ✭ 2,591 (+984.1%)
Mutual labels:  parallel
Urmotion
Flexible motion engine for non time-based animation in Unity.
Stars: ✭ 220 (-7.95%)
Mutual labels:  unity3d
Transducers.jl
Efficient transducers for Julia
Stars: ✭ 226 (-5.44%)
Mutual labels:  parallel
Nodulus
Puzzle game with clever twists (Unity3d)
Stars: ✭ 232 (-2.93%)
Mutual labels:  unity3d
Mathutilities
A collection of some of the neat math and physics tricks that I've collected over the last few years.
Stars: ✭ 2,815 (+1077.82%)
Mutual labels:  unity3d
Roboleague
A car soccer environment inspired by Rocket League for deep reinforcement learning experiments in an adversarial self-play setting.
Stars: ✭ 236 (-1.26%)
Mutual labels:  unity3d
Catlib
CatLib for unity3d dependency injection framework
Stars: ✭ 228 (-4.6%)
Mutual labels:  unity3d
Ocaramba
C# Framework to automate tests using Selenium WebDriver
Stars: ✭ 234 (-2.09%)
Mutual labels:  parallel
Loadjs
A tiny async loader / dependency manager for modern browsers (899 bytes)
Stars: ✭ 2,507 (+948.95%)
Mutual labels:  parallel
Ma Gym
A collection of multi agent environments based on OpenAI gym.
Stars: ✭ 226 (-5.44%)
Mutual labels:  gym
Deepgraph
Analyze Data with Pandas-based Networks. Documentation:
Stars: ✭ 232 (-2.93%)
Mutual labels:  parallel
Xrtk Core
The Official Mixed Reality Framework for Unity
Stars: ✭ 219 (-8.37%)
Mutual labels:  unity3d
Kk Hf patch
Automatically translate, uncensor and update Koikatu! and Koikatsu Party!
Stars: ✭ 233 (-2.51%)
Mutual labels:  unity3d
Openseeface
Robust realtime face and facial landmark tracking on CPU with Unity integration
Stars: ✭ 216 (-9.62%)
Mutual labels:  unity3d
Dear Imgui Unity
Unity package for Dear ImGui
Stars: ✭ 230 (-3.77%)
Mutual labels:  unity3d
Hololensartoolkit
Marker tracking using the front-facing camera of HoloLens (both 1 and 2) and Unity, with a wrapper of ARToolKit built for UWP (Windows Universal Platform)
Stars: ✭ 238 (-0.42%)
Mutual labels:  unity3d
Unitygoogledrive
Google Drive SDK for Unity game engine
Stars: ✭ 236 (-1.26%)
Mutual labels:  unity3d
Hololenscamerastream
This Unity plugin makes the HoloLens video camera frames available to a Unity app in real time. This enables Unity devs to easily use the HoloLens camera for computer vision (or anything they want).
Stars: ✭ 233 (-2.51%)
Mutual labels:  unity3d

RLs: Reinforcement Learning Algorithms Based on TensorFlow 2.x.

This project includes state-of-the-art (SOTA) and classic RL (reinforcement learning) algorithms for training agents by interacting with Unity through ml-agents Release 12 or with gym. The goal of this framework is to provide stable implementations of standard RL algorithms while also enabling fast prototyping of new methods.

About

It aims to fill the need for a small, easily grokked codebase in which users can freely experiment with wild ideas (speculative research).

Characteristics

  • Suitable for Windows, Linux, and OSX
  • Close reimplementations of the original papers with competitive performance
  • Reusable modules
  • Clear hierarchical structure and easy code control
  • Compatible with OpenAI Gym and Unity3D ML-Agents
  • Resume training from where it stopped, retrain on a new task, or fine-tune
  • Use another training task's model as parameter initialization by specifying --load (see the example after this list)
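
For example, a run could be resumed or fine-tuned from a previous task roughly like this (the task names are hypothetical; the flags are documented in the usage listing below):

$ python run.py --gym -a ppo --gym-env CartPole-v0 -n finetune_run --load pretrained_run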

Supports

This project supports:

  • Unity3D ml-agents.
  • Gym {MuJoCo, PyBullet, gym_minigrid}; for now only two data types are compatible: [Box, Discrete]. 99.65% of Gym environment settings are supported (except Blackjack-v0, KellyCoinflip-v0, and KellyCoinflipGeneralized-v0). Parallel training with gym environments is supported: just set --copys to the number of agents you want to train in parallel (see the example after this list).
    • Discrete -> Discrete (observation type -> action type)
    • Discrete -> Box
    • Box -> Discrete
    • Box -> Box
    • Box/Discrete -> Tuple(Discrete, Discrete, Discrete)
  • Multi-agent training. One group controls multiple agents.
  • Multi-brain training. Brains' models should use the same algorithm or share the same learning progress (perStep or perEpisode).
  • Multi-image input (ml-agents only). Images are resized to the same shape, e.g. [84, 84, 3], before being stored in the replay buffer.
  • Four types of replay buffer; the default is ER.
  • Noisy Net for better exploration.
  • Intrinsic Curiosity Module for almost all implemented off-policy algorithms.
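
For instance, parallel data collection on a gym task (taken from the Example section of the usage below) looks like:

$ python run.py --gym -a dqn --gym-env CartPole-v0 -c 12 -n dqn_cartpole --no-save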

Advantages

  • Parallel training of multiple scenes for Gym
  • Unified environment data format between ml-agents and gym
  • Implementing another algorithm only requires writing a single file (algorithms share a similar structure)
  • Many controllable factors and adjustable parameters

Installation

method 1:

conda env create -f environment.yaml

method 2:

$ git clone https://github.com/StepNeverStop/RLs.git
$ cd RLs
$ conda create -n rls python=3.6
$ conda activate rls
# Windows
$ pip install -e .[windows]
# Linux or Mac OS
$ pip install -e .

If using ml-agents:

$ pip install -e .[unity]

If using atari:

$ pip install -e .[atari]

You can pull the pre-built Docker image:

$ docker pull keavnn/rls:latest
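
A hypothetical way to start training inside that image (the container's working directory and entrypoint are assumptions, not documented here):

$ docker run -it keavnn/rls:latest
# inside the container, assuming the repository is on the path:
$ python run.py --gym -a dqn --gym-env CartPole-v0 -n docker_run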

Implemented Algorithms

For now, these algorithms are available:

Algorithms (29) and their command parameters, used to select them via -a/--algorithm (the original table also marks Discrete/Continuous action, Image, and RNN support per algorithm):

  • Q-Learning/Sarsa/Expected Sarsa: qs
  • PG: pg
  • AC: ac
  • A2C: a2c
  • TRPO: trpo
  • PPO: ppo
  • DQN: dqn
  • Double DQN: ddqn
  • Dueling Double DQN: dddqn
  • Averaged DQN: averaged_dqn
  • Bootstrapped DQN: bootstrappeddqn
  • Soft Q-Learning: sql
  • C51: c51
  • QR-DQN: qrdqn
  • IQN: iqn
  • Rainbow: rainbow
  • DPG: dpg
  • DDPG: ddpg
  • PD-DDPG: pd_ddpg
  • TD3: td3
  • SAC (with V network): sac_v
  • SAC: sac
  • TAC: tac
  • MaxSQN: maxsqn
  • MADDPG: maddpg
  • OC: oc
  • AOC: aoc
  • PPOC: ppoc
  • IOC: ioc
  • HIRO: hiro
  • CURL: curl

Getting started

"""
Usage:
    python run.py [options]

Options:
    -h,--help                   show this help message and exit
    -a,--algorithm=<name>       specify the training algorithm [default: ppo]
    -c,--copys=<n>              number of environment copies that collect data in parallel [default: 1]
    -e,--env=<file>             specify the path of the built UNITY3D training environment [default: None]
    -g,--graphic                whether to show the graphic interface when using UNITY3D [default: False]
    -i,--inference              run inference with the trained model instead of training policies [default: False]
    -m,--models=<n>             specify the number of trials run with different random seeds [default: 1]
    -n,--name=<name>            specify the name of this training task [default: None]
    -p,--port=<n>               specify the port for communicating with the UNITY3D training environment [default: 5005]
    -r,--rnn                    whether to use an RNN model [GRU, LSTM, ...] [default: False]
    -s,--save-frequency=<n>     specify the interval for saving model checkpoints [default: None]
    -t,--train-step=<n>         specify the number of training steps for optimizing the policy model [default: None]
    -u,--unity                  whether to train with the UNITY3D editor [default: False]

    --apex=<str>                i.e. "learner"/"worker"/"buffer"/"evaluator" [default: None]
    --unity-env=<name>          specify the name of the UNITY3D training environment [default: None]
    --config-file=<file>        specify the path of the training configuration file [default: None]
    --store-dir=<file>          specify the directory for storing models, logs, and other data [default: None]
    --seed=<n>                  specify the global random seed for the random, numpy, and tensorflow modules [default: 42]
    --unity-env-seed=<n>        specify the environment random seed of UNITY3D [default: 42]
    --max-step=<n>              specify the maximum number of steps per episode [default: None]
    --train-episode=<n>         specify the maximum number of training episodes [default: None]
    --train-frame=<n>           specify the maximum number of environment interaction steps for training [default: None]
    --load=<name>               specify the name of the pre-trained model to load [default: None]
    --prefill-steps=<n>         specify the number of experiences to collect before training starts, used for off-policy algorithms [default: None]
    --prefill-choose            whether to choose prefill (no_op) actions using the model instead of randomly [default: False]
    --gym                       whether to train with a gym environment [default: False]
    --gym-env=<name>            specify the gym environment name [default: CartPole-v0]
    --gym-env-seed=<n>          specify the environment random seed of gym [default: 42]
    --render-episode=<n>        specify from which episode to start rendering the gym environment [default: None]
    --info=<str>                extra information describing this training task, wrapped in double quotes [default: None]
    --hostname                  whether to append the hostname to the training name [default: False]
    --no-save                   do not save models/logs/summaries while training [default: False]
Example:
    gym:
        python run.py --gym -a dqn --gym-env CartPole-v0 -c 12 -n dqn_cartpole --no-save
    unity:
        python run.py -u -a ppo -n run_with_unity
        python run.py -e /root/env/3dball.app -a sac -n run_with_execution_file
"""

If you specify gym, unity, and an environment executable file path simultaneously, the priority order is: gym > unity > unity_env.

Notes

  1. Logs, models, training parameter configurations, and data are stored in C:\RLData on Windows, or $HOME/RLData on Linux/OSX.
  2. You may need to run with su or sudo on Linux/OSX.
  3. The record directory format is RLData/Environment/Algorithm/Behavior name(for ml-agents)/Training name/config&log&model.
  4. Make sure the number of brains is > 1 when specifying ma* algorithms like maddpg.
  5. Multi-agent algorithms do not support visual input or PER for now.
  6. Implementing a new algorithm takes 3 steps (see the sketch after this list):
    1. Write a .py file in the rls/algos/{single/multi/hierarchical} directory and make the policy inherit from Policy, On_Policy, Off_Policy, or another super-class defined in rls/algos/base.
    2. Write the default configuration in rls/algos/config.yaml.
    3. Register the new algorithm in the algos dictionary in rls/algos/__init__.py, making sure the registered name matches the name of the algorithm class.
  7. Set algorithms' hyper-parameters in rls/algos/config.yaml.
  8. Set the default training configuration in config.yaml.
  9. Change the neural network structure in rls/nn/models.py.
  10. MADDPG is only suitable for Unity3D ML-Agents for now. The behavior name in the training scene should be set like {number of agents this group controls per environment copy}#{behavior_name}, e.g. 2#3DBallAgents means one group/team controls two identical agents in each environment copy.
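
As a rough Python sketch of step 6, a new off-policy algorithm file might look like the skeleton below. The constructor signature, method names, and the registration comments are illustrative assumptions only; check the actual base-class interface in rls/algos/base before implementing.

# rls/algos/single/my_algo.py -- hypothetical skeleton of a new algorithm
import numpy as np

from rls.algos.base.off_policy import Off_Policy  # assumed import path


class MY_ALGO(Off_Policy):
    """Minimal placeholder algorithm; real logic goes in choose_action/learn."""

    def __init__(self, envspec, lr=5.0e-4, **kwargs):
        super().__init__(envspec=envspec, **kwargs)
        self.lr = lr
        # build networks and optimizers here, e.g. using modules from rls/nn

    def choose_action(self, obs, evaluation=False):
        # map observations to actions; a random placeholder for this sketch
        return np.random.randint(0, self.a_dim, size=(obs.shape[0],))

    def learn(self, **kwargs):
        # sample a batch from the replay buffer and update the networks
        pass

# rls/algos/config.yaml: add a 'my_algo' section with its default hyper-parameters.
# rls/algos/__init__.py: register 'my_algo' in the algos dictionary so that
# `python run.py -a my_algo ...` can find the MY_ALGO class.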

Ongoing things

  • DARQN
  • ACER
  • Ape-X
  • R2D2
  • ACKTR

Giving credit

If using this repository for your research, please cite:

@misc{RLs,
  author = {Keavnn},
  title = {RLs: Reinforcement Learning research framework for Unity3D and Gym},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/StepNeverStop/RLs}},
}

Issues

If you have any questions about or find errors in this project, please let me know here.
