
huawei-noah / xingtian

License: MIT License
xingtian is a componentized library for the development and verification of reinforcement learning algorithms.

Programming Languages

Python, Jupyter Notebook, Shell

Projects that are alternatives to or similar to xingtian

Deep Reinforcement Learning
Repo for the Deep Reinforcement Learning Nanodegree program
Stars: ✭ 4,012 (+1651.97%)
Mutual labels:  dqn, reinforcement-learning-algorithms, ppo
Pytorch Drl
PyTorch implementations of various Deep Reinforcement Learning (DRL) algorithms for both single agent and multi-agent.
Stars: ✭ 233 (+1.75%)
Mutual labels:  dqn, ppo
TF2-RL
Reinforcement learning algorithms implemented for Tensorflow 2.0+ [DQN, DDPG, AE-DDPG, SAC, PPO, Primal-Dual DDPG]
Stars: ✭ 160 (-30.13%)
Mutual labels:  dqn, ppo
ReinforcementLearningZoo.jl
juliareinforcementlearning.org/
Stars: ✭ 46 (-79.91%)
Mutual labels:  dqn, ppo
Deep-rl-mxnet
Mxnet implementation of Deep Reinforcement Learning papers, such as DQN, PG, DDPG, PPO
Stars: ✭ 26 (-88.65%)
Mutual labels:  dqn, reinforcement-learning-algorithms
Machine Learning Is All You Need
🔥🌟《Machine Learning 格物志》: ML + DL + RL basic codes and notes by sklearn, PyTorch, TensorFlow, Keras & the most important, from scratch!💪 This repository is ALL You Need!
Stars: ✭ 173 (-24.45%)
Mutual labels:  dqn, ppo
Reinforcement Learning
Deep Reinforcement Learning Algorithms implemented with Tensorflow 2.3
Stars: ✭ 61 (-73.36%)
Mutual labels:  reinforcement-learning-algorithms, ppo
Ros2learn
ROS 2 enabled Machine Learning algorithms
Stars: ✭ 119 (-48.03%)
Mutual labels:  dqn, ppo
Deep-Reinforcement-Learning-With-Python
Master classic RL, deep RL, distributional RL, inverse RL, and more using OpenAI Gym and TensorFlow with extensive Math
Stars: ✭ 222 (-3.06%)
Mutual labels:  dqn, ppo
UAV-DDPG
Code for paper "Computation Offloading Optimization for UAV-assisted Mobile Edge Computing: A Deep Deterministic Policy Gradient Approach"
Stars: ✭ 133 (-41.92%)
Mutual labels:  dqn, reinforcement-learning-algorithms
Rainy
☔ Deep RL agents with PyTorch☔
Stars: ✭ 39 (-82.97%)
Mutual labels:  dqn, ppo
Deep Reinforcement Learning Algorithms
31 projects in the framework of Deep Reinforcement Learning algorithms: Q-learning, DQN, PPO, DDPG, TD3, SAC, A2C and others. Each project is provided with a detailed training log.
Stars: ✭ 167 (-27.07%)
Mutual labels:  dqn, ppo
Minimalrl
Implementations of basic RL algorithms with minimal lines of codes! (pytorch based)
Stars: ✭ 2,051 (+795.63%)
Mutual labels:  dqn, ppo
Deeprl
Modularized Implementation of Deep RL Algorithms in PyTorch
Stars: ✭ 2,640 (+1052.84%)
Mutual labels:  dqn, ppo
Machin
Reinforcement learning library(framework) designed for PyTorch, implements DQN, DDPG, A2C, PPO, SAC, MADDPG, A3C, APEX, IMPALA ...
Stars: ✭ 145 (-36.68%)
Mutual labels:  dqn, ppo
Tianshou
An elegant PyTorch deep reinforcement learning library.
Stars: ✭ 4,109 (+1694.32%)
Mutual labels:  dqn, ppo
Explorer
Explorer is a PyTorch reinforcement learning framework for exploring new ideas.
Stars: ✭ 54 (-76.42%)
Mutual labels:  dqn, ppo
Deep Reinforcement Learning With Pytorch
PyTorch implementation of DQN, AC, ACER, A2C, A3C, PG, DDPG, TRPO, PPO, SAC, TD3 and ....
Stars: ✭ 1,345 (+487.34%)
Mutual labels:  dqn, ppo
Easy Rl
A Chinese-language reinforcement learning tutorial; read online at https://datawhalechina.github.io/easy-rl/
Stars: ✭ 3,004 (+1211.79%)
Mutual labels:  dqn, ppo
king-pong
Deep Reinforcement Learning Pong Agent, King Pong, he's the best
Stars: ✭ 23 (-89.96%)
Mutual labels:  dqn, reinforcement-learning-algorithms

中文 (Chinese version)

Introduction

License: MIT

XingTian (刑天) is a componentized library for the development and verification of reinforcement learning algorithms. It supports multiple algorithms, including DQN, DDPG, PPO, and IMPALA, and can train agents in multiple environments such as Gym, Atari, Torcs, and StarCraft II. To meet users' requirements for quickly verifying and solving RL problems, four modules are abstracted: Algorithm, Model, Agent, and Environment. They work together like Lego building blocks. For details about the architecture, please see the Architecture introduction.
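To make the Lego-style split concrete, the following is a tiny, self-contained sketch of which responsibility each of the four modules carries. The class names and the one-step toy task are purely illustrative and are not XingTian's actual interfaces (see the Architecture introduction for those).

import random

class Environment:                      # environment role (cf. GymEnv in the Quick Start config)
    def reset(self):
        return 0.0                      # single trivial state
    def step(self, action):
        reward = 1.0 if action == 1 else 0.0
        return 0.0, reward, True        # next_state, reward, done

class Model:                            # function-approximator role (cf. PpoMlp)
    def __init__(self):
        self.preference = [0.0, 0.0]    # one value per action
    def predict(self, state):
        return 0 if self.preference[0] >= self.preference[1] else 1

class Agent:                            # interaction role: collects experience from the environment
    def __init__(self, model, env):
        self.model, self.env = model, env
    def run_episode(self, explore=0.3):
        state = self.env.reset()
        if random.random() < explore:
            action = random.randrange(2)
        else:
            action = self.model.predict(state)
        _, reward, _ = self.env.step(action)
        return action, reward

class Algorithm:                        # learning role: updates the model from collected experience
    def __init__(self, model, lr=0.1):
        self.model, self.lr = model, lr
    def train(self, batch):
        for action, reward in batch:
            self.model.preference[action] += self.lr * reward

env, model = Environment(), Model()
agent, alg = Agent(model, env), Algorithm(model)
for _ in range(200):
    alg.train([agent.run_episode()])
print("learned action:", model.predict(0.0))    # converges to action 1, which always pays off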

Dependencies

# ubuntu 18.04
sudo apt-get install python3-pip libopencv-dev -y
pip3 install opencv-python

# run with tensorflow 1.15.0 or tensorflow 2.3.1
pip3 install zmq h5py gym[atari] tqdm imageio matplotlib==3.0.3 Ipython pyyaml tensorflow==1.15.0 pyarrow lz4 fabric2 absl-py psutil tensorboardX setproctitle

Alternatively, run pip3 install -r requirements.txt

If you want to use PyTorch as the backend, please install it yourself. Refer to the PyTorch installation instructions.
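Whichever backend you choose, a quick way to confirm what is installed before launching a job is to probe the imports. This helper is illustrative only and is not part of XingTian:

def detect_backends():
    """Return the importable deep learning backends and their versions."""
    found = {}
    try:
        import tensorflow as tf
        found["tensorflow"] = tf.__version__
    except ImportError:
        pass
    try:
        import torch
        found["torch"] = torch.__version__
    except ImportError:
        pass
    return found

print(detect_backends())   # e.g. {'tensorflow': '1.15.0'}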

Installation

# cd PATH/TO/XingTian 
pip3 install -e .

After installation, you can run import xt; print(xt.__version__) to check whether the installation was successful.

In [1]: import xt

In [2]: xt.__version__
Out[2]: '0.3.0'

Quick Start


Setup configuration

The following configuration shows a minimal example with the CartPole environment. A more detailed description of the agent, algorithm, and environment parameters can be found in the User Guide.

alg_para:
  alg_name: PPO
  alg_config:
    process_num: 1
    save_model: True  # default False
    save_interval: 100

env_para:
  env_name: GymEnv
  env_info:
    name: CartPole-v0
    vision: False

agent_para:
  agent_name: PPO
  agent_num : 1
  agent_config:
    max_steps: 200
    complete_step: 1000000
    complete_episode: 3550

model_para:
  actor:
    model_name: PpoMlp
    state_dim: [4]
    action_dim: 2
    input_dtype: float32
    model_config:
      BATCH_SIZE: 200
      CRITIC_LOSS_COEF: 1.0
      ENTROPY_LOSS: 0.01
      LR: 0.0003
      LOSS_CLIPPING: 0.2
      MAX_GRAD_NORM: 5.0
      NUM_SGD_ITER: 8
      SUMMARY: False
      VF_SHARE_LAYERS: False
      activation: tanh
      hidden_sizes: [64, 64]

env_num: 10

In addition, you can find more configuration sets in the examples directory.
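Because the configuration is plain YAML, it can also be loaded and adjusted programmatically, for example to generate small parameter sweeps. A sketch under the assumption that examples/cartpole_ppo.yaml (used in the commands below) follows the structure shown above; the helper itself is not part of XingTian:

import copy
import yaml

# Load the example configuration shipped with XingTian (path taken from the Quick Start).
with open("examples/cartpole_ppo.yaml") as f:
    base_cfg = yaml.safe_load(f)

# Generate variants with different learning rates; keys follow the structure shown above.
for lr in (1e-4, 3e-4, 1e-3):
    cfg = copy.deepcopy(base_cfg)
    cfg["model_para"]["actor"]["model_config"]["LR"] = lr
    out_path = "cartpole_ppo_lr{:g}.yaml".format(lr)
    with open(out_path, "w") as f:
        yaml.safe_dump(cfg, f)
    print("wrote", out_path)   # train each with: xt_main -f <out_path> -t train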

Start training task

python3 xt/main.py -f examples/cartpole_ppo.yaml -t train


Evaluate a locally trained model

Set benchmark.eval.model_path in YOUR_CONFIG_FILE.yaml for evaluation.

benchmark:
  eval:
    model_path: /YOUR/PATH/TO/EVAL/models
    gap: 10           # index gap between evaluated models
    evaluator_num: 1  # number of evaluator instances

# run command
python3 xt/main.py -f examples/cartpole_ppo.yaml -t evaluate

NOTE: XingTian starts with -t train by default.

Run with CLI

# You can replace `python3 xt/main.py` with the `xt_main` command!
xt_main -f examples/cartpole_ppo.yaml -t train

# train with evaluate
xt_main -f examples/cartpole_ppo.yaml -t train_with_evaluate
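These commands can also be scripted, for instance to queue several configurations one after another. A minimal sketch using the documented xt_main CLI; the config list is illustrative, and only examples/cartpole_ppo.yaml is taken from this README:

import subprocess

# Configurations to run in sequence; extend with other files from the examples directory.
configs = ["examples/cartpole_ppo.yaml"]

for cfg in configs:
    # Equivalent to running: xt_main -f <cfg> -t train
    subprocess.run(["xt_main", "-f", cfg, "-t", "train"], check=True)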

Develop with a Custom Case

  1. Write your custom module and register it; more detailed guidance on custom modules can be found in the Developer Guide (a generic sketch of the registration pattern follows this list).
  2. Add the name of YOUR-CUSTOM-MODULE to your_train_configure.yaml
  3. Start training with xt_main -f path/to/your_train_configure.yaml :)
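For orientation, the snippet below sketches the general decorator-based registration pattern that plugin systems of this kind typically use. It is a generic illustration, not XingTian's actual registration API; consult the Developer Guide for the real mechanism.

# Generic registry pattern (illustrative; not XingTian's real API).
MODEL_REGISTRY = {}

def register_model(name):
    """Decorator that stores a model class under a string key."""
    def wrapper(cls):
        MODEL_REGISTRY[name] = cls
        return cls
    return wrapper

@register_model("MyCustomMlp")
class MyCustomMlp:
    def __init__(self, state_dim, action_dim):
        self.state_dim, self.action_dim = state_dim, action_dim

# A config file would then refer to the model by name (e.g. model_name: MyCustomMlp),
# and the framework would look it up and instantiate it:
model_cls = MODEL_REGISTRY["MyCustomMlp"]
model = model_cls(state_dim=[4], action_dim=2)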

Reference Results

Episode Reward Average

  1. DQN Reward after 10M time-steps (40M frames).

    Env           | XingTian Basic DQN | RLlib Basic DQN | Hessel et al. DQN
    BeamRider     | 6706               | 2869            | ~2000
    Breakout      | 352                | 287             | ~150
    QBert         | 14087              | 3921            | ~4000
    SpaceInvaders | 947                | 650             | ~500
  2. PPO Reward after 10M time-steps (40M frames).

    Env           | XingTian PPO | RLlib PPO | Baselines PPO
    BeamRider     | 4877         | 2807      | ~1800
    Breakout      | 341          | 104       | ~250
    QBert         | 14771        | 11085     | ~14000
    SpaceInvaders | 1025         | 671       | ~800
  3. IMPALA Reward after 10M time-steps (40M frames).

    Env           | XingTian IMPALA | RLlib IMPALA
    BeamRider     | 2313            | 2071
    Breakout      | 334             | 385
    QBert         | 12205           | 4068
    SpaceInvaders | 742             | 719

Throughput

  1. DQN

    Env           | XingTian Basic DQN | RLlib Basic DQN
    BeamRider     | 129                | 109
    Breakout      | 117                | 113
    QBert         | 111                | 90
    SpaceInvaders | 115                | 100
  2. PPO

    Env           | XingTian PPO | RLlib PPO
    BeamRider     | 2422         | 1618
    Breakout      | 2497         | 1535
    QBert         | 2436         | 1617
    SpaceInvaders | 2438         | 1608
  3. IMPALA

    Env           | XingTian IMPALA | RLlib IMPALA
    BeamRider     | 8756            | 3637
    Breakout      | 8814            | 3525
    QBert         | 8249            | 3471
    SpaceInvaders | 8463            | 3555
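A throughput figure of this kind can be obtained by counting environment steps against wall-clock time. A minimal, generic sketch of such a measurement (the exact unit used in the tables above is not stated here):

import time

def measure_throughput(run_steps, total_steps=100000):
    """Call run_steps(total_steps) and return the achieved steps per second."""
    start = time.perf_counter()
    run_steps(total_steps)
    elapsed = time.perf_counter() - start
    return total_steps / elapsed

def dummy_rollout(n):
    # Stand-in for an actual environment/training loop.
    for _ in range(n):
        pass

print("steps/sec: %.0f" % measure_throughput(dummy_rollout))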

Experiment conditions: 72 Intel(R) Xeon(R) Gold 6154 CPU cores @ 3.00GHz with a single Tesla V100.

Ray's reward data come from https://github.com/ray-project/rl-experiments, and the throughput figures were obtained with Ray 0.8.6 under the same machine conditions.

Acknowledgement

XingTian refers to the following projects: DeepMind/scalable_agent, baselines, ray.

License

The MIT License (MIT)
