
cyoon1729 / Rlcycle

License: MIT
A library for ready-made reinforcement learning agents and reusable components for neat prototyping

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Rlcycle

Deep Rl Keras
Keras Implementation of popular Deep RL Algorithms (A3C, DDQN, DDPG, Dueling DDQN)
Stars: ✭ 395 (+114.67%)
Mutual labels:  reinforcement-learning, dqn, a3c, ddpg
Reinforcement Learning With Tensorflow
Simple reinforcement learning tutorials, 莫烦Python (Morvan Python) Chinese AI tutorials
Stars: ✭ 6,948 (+3676.09%)
Mutual labels:  reinforcement-learning, dqn, a3c, ddpg
Minimalrl
Implementations of basic RL algorithms with minimal lines of codes! (pytorch based)
Stars: ✭ 2,051 (+1014.67%)
Mutual labels:  reinforcement-learning, dqn, a3c, ddpg
Deeprl Tensorflow2
🐋 Simple implementations of various popular Deep Reinforcement Learning algorithms using TensorFlow2
Stars: ✭ 319 (+73.37%)
Mutual labels:  reinforcement-learning, dqn, a3c, ddpg
Easy Rl
A Chinese-language reinforcement learning tutorial, readable online at: https://datawhalechina.github.io/easy-rl/
Stars: ✭ 3,004 (+1532.61%)
Mutual labels:  reinforcement-learning, dqn, a3c, ddpg
Machin
Reinforcement learning library (framework) designed for PyTorch; implements DQN, DDPG, A2C, PPO, SAC, MADDPG, A3C, APEX, IMPALA ...
Stars: ✭ 145 (-21.2%)
Mutual labels:  reinforcement-learning, dqn, a3c, ddpg
Autonomous Learning Library
A PyTorch library for building deep reinforcement learning agents.
Stars: ✭ 425 (+130.98%)
Mutual labels:  reinforcement-learning, dqn, ddpg
Mushroom Rl
Python library for Reinforcement Learning.
Stars: ✭ 442 (+140.22%)
Mutual labels:  reinforcement-learning, dqn, ddpg
Reinforcement Learning
🤖 Implementations of reinforcement learning algorithms.
Stars: ✭ 104 (-43.48%)
Mutual labels:  reinforcement-learning, dqn, ddpg
Tensorlayer
Deep Learning and Reinforcement Learning Library for Scientists and Engineers 🔥
Stars: ✭ 6,796 (+3593.48%)
Mutual labels:  reinforcement-learning, dqn, a3c
Deep Reinforcement Learning
Repo for the Deep Reinforcement Learning Nanodegree program
Stars: ✭ 4,012 (+2080.43%)
Mutual labels:  reinforcement-learning, dqn, ddpg
Pytorch Rl
Deep Reinforcement Learning with pytorch & visdom
Stars: ✭ 745 (+304.89%)
Mutual labels:  reinforcement-learning, dqn, a3c
Tensorflow Rl
Implementations of deep RL papers and random experimentation
Stars: ✭ 176 (-4.35%)
Mutual labels:  reinforcement-learning, dqn, a3c
Elegantrl
Lightweight, efficient and stable implementations of deep reinforcement learning algorithms using PyTorch.
Stars: ✭ 575 (+212.5%)
Mutual labels:  reinforcement-learning, dqn, ddpg
Pytorch Rl
This repository contains model-free deep reinforcement learning algorithms implemented in Pytorch
Stars: ✭ 394 (+114.13%)
Mutual labels:  reinforcement-learning, dqn, ddpg
Policy Gradient Methods
Implementation of Algorithms from the Policy Gradient Family. Currently includes: A2C, A3C, DDPG, TD3, SAC
Stars: ✭ 54 (-70.65%)
Mutual labels:  reinforcement-learning, a3c, ddpg
Reinforcement learning
Reinforcement learning tutorials
Stars: ✭ 82 (-55.43%)
Mutual labels:  reinforcement-learning, dqn, a3c
Rl4j
Deep Reinforcement Learning for the JVM (Deep-Q, A3C)
Stars: ✭ 330 (+79.35%)
Mutual labels:  reinforcement-learning, dqn, a3c
Ai Blog
Accompanying repository for Let's make a DQN / A3C series.
Stars: ✭ 351 (+90.76%)
Mutual labels:  reinforcement-learning, dqn, a3c
Slm Lab
Modular Deep Reinforcement Learning framework in PyTorch. Companion library of the book "Foundations of Deep Reinforcement Learning".
Stars: ✭ 904 (+391.3%)
Mutual labels:  reinforcement-learning, dqn, a3c

RLcycle


RLcycle (pronounced "recycle") is a framework for reinforcement learning (RL) agents. It provides ready-made RL agents, as well as reusable components for easy prototyping.

Currently, RLcycle provides:

  • DQN + enhancements, Distributional: C51, Quantile Regression, Rainbow-DQN.
  • Noisy Networks for parameter space noise
  • A2C (data parallel) and A3C (gradient parallel).
  • DDPG, in both the Lillicrap et al. (2015) and Fujimoto et al. (2018) versions.
  • Soft Actor Critic with automatic entropy coefficient tuning.
  • Prioritized Experience Replay and n-step updates for all off-policy algorithms.

RLcycle uses:

  • PyTorch for computations and building and optimizing models.
  • Hydra for configuring and building agents.
  • Ray for parallelizing learning.
  • WandB for logging training and testing.

See below for an introduction and guide to using RLcycle, performance benchmarks, and future plans.

Contributing

If you have any questions or suggestions, feel free to raise an issue or reach out at cjy2129 [at] columbia [dot] edu!

Getting Started

To install:

conda create --name myenv python=3.6.9 pip
conda activate myenv
git clone https://github.com/cyoon1729/RLcycle.git
cd RLcycle
pip install -U -r requirements.txt
pip install -e .
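
As a quick sanity check, the package should then be importable under the name rlcycle (an assumption based on the module paths referenced in the configs below):

# in a Python shell, after installation
import rlcycle
print(rlcycle.__file__)  # prints the path of the installed package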

0. A quick look at Hydra for managing configurations

Let's first take a look at one of the many useful things Hydra can do:
"""Instantiating classes from yaml files"""

# in ./examples/rectangle.yaml
shape:
  class: examples.shapes.Rectangle
  params:
    height: 5
    width: 4

Instantiating shapes.Rectangle with Hydra from the above yaml file:

"""Instantiating classes from yaml files"""

# in ./examples/shapes.py
class Rectangle:
    def __init__(self, width: float, height: float):
        self.width = width
        self.height = height
       
    def get_area(self):
        return width * height

# in ./examples/main.py
import hydra
from omegaconf import DictConfig

@hydra.main(config_path="./examples/rectangle.yaml")
def main(cfg: DictConfig):
    shape = hydra.utils.instantiate(layer_info)
    print(shape.__class__.__name__)  # 'Rectangle'
    print(shape.get_area()) # 20
    
if __main__ == "__main__":
    main()

If you would like to know more about Hydra, check out their documentation!

1. Running experiments

Run the run_agent.py file and specify the experiment configuration as below:

python run_agent.py configs=atari/rainbow_dqn

Alternatively, you can specify the configuration (yaml) file in metaconfig.yaml.

# in ./metaconfig.yaml
defaults:
 - configs=atari/rainbow_dqn

To modify experiment arguments or hyperparameters, add override flags as below:

python run_agent.py configs=atari/rainbow_dqn configs.experiment_info.env.name=AlienNoFrameskip-v4

python run_agent.py configs=atari/rainbow_dqn configs.hyper_params.batch_size=64

python run_agent.py configs=pybullet/sac configs.hyper_params.batch_size=64

2. Building configurations for RLcycle

Let's take `atari/rainbow_dqn.yaml` as an example:
experiment_info:
  experiment_name: Rainbow DQN
  agent: rlcycle.dqn_base.agent.DQNBaseAgent
  learner: rlcycle.dqn_base.learner.DQNLearner
  loss: rlcycle.dqn_base.loss.CategoricalLoss
  action_selector: rlcycle.dqn_base.action_selector.CategoricalActionSelector
  device: cuda
  log_wandb: True

  # Environment info
  env:
    name: "PongNoFrameskip-v4"
    is_atari: True
    is_discrete: True
    frame_stack: True

  # Experiment default arguments:
  total_num_episodes: 5000
  test_interval: 100  # Test every 100 episodes
  test_num: 5  # Number of episodes to test during test phase
  render_train: False  # Render all episode steps during training
  render_test: True # Render tests

defaults:
  - hyper_params: rainbow
  - models: duelingC51

Under experiment_info, we have the fundamental arguments for running RL experiments: which classes (agent, learner, loss) we want to use, and the gym environment and experiment configurations.

The defaults entry points to rlcycle/configs/atari/hyper_params/rainbow.yaml for hyperparameters and rlcycle/configs/atari/models/duelingC51.yaml for model configurations. Taking a closer look at these files, we have:

hyper_params:
  batch_size: 64
  replay_buffer_size: 100000 
  use_per: False
  per_alpha: 0.5  # PER alpha value
  per_beta: 0.4  # PER beta value
  per_beta_max: 1.0
  per_beta_total_steps: 300000

  # Exploration configs
  eps: 1.0  # epsilon-greedy exploration
  eps_final: 0.0  # minimum epsilon value for exploration
  max_exploration_frame: 100000  # eps = eps_final at most until # steps

  # Others
  update_starting_point: 40000  # start updates once the buffer holds this many experiences
  gamma: 0.99
  tau: 0.005
  q_reg_coeff: 0.0
  gradient_clip: 10.0
  n_step: 3
  train_freq: 4

  # Optimizer
  learning_rate: 0.0000625 # 0.0003
  weight_decay: 0.0
  adam_eps: 0.00015

a pretty standard organization for RL experiment hyperparameters, and

model:
  class: rlcycle.common.models.value.DuelingCategoricalDQN
  params:
    model_cfg:
      state_dim: undefined
      action_dim: undefined
      num_atoms: 51
      v_min: -10
      v_max: 10

      use_conv: True
      use_noisy: True
      conv_features:
        feature1:
          class: rlcycle.common.models.layers.Conv2DLayer
          params:
             input_size: 4
             output_size: 32
             kernel_size: 8
             stride: 4
             activation_fn: relu
        feature2:
          class: rlcycle.common.models.layers.Conv2DLayer
          params:
             input_size: 32
             output_size: 64
             kernel_size: 4
             stride: 2
             activation_fn: relu
        feature3:
          class: rlcycle.common.models.layers.Conv2DLayer
          params:
             input_size: 64
             output_size: 64
             kernel_size: 3
             stride: 1
             activation_fn: relu   
             
      advantage:
        fc1:
          class: rlcycle.common.models.layers.FactorizedNoisyLinearLayer
          params: 
            input_size: undefined
            output_size: 512
            post_activation_fn: relu
        fc2:
          class: rlcycle.common.models.layers.FactorizedNoisyLinearLayer
          params: 
            input_size: 512
            output_size: undefined
            post_activation_fn: identity

      value:
        fc1:
          class: rlcycle.common.models.layers.FactorizedNoisyLinearLayer
          params: 
            input_size: undefined
            output_size: 512
            post_activation_fn: identity
        fc2:
          class: rlcycle.common.models.layers.FactorizedNoisyLinearLayer
          params: 
            input_size: 512
            output_size: 1
            post_activation_fn: identity

where we define the parameters for the model and each of its layers. Note that fields with the value "undefined" are filled in inside the respective Python object. For a simpler model yaml configuration file, take a look at rlcycle/configs/atari/models/dqn.yaml.
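
As an illustrative sketch (not the exact RLcycle code) of how those "undefined" fields get filled in, the agent would typically read the state and action dimensions from the gym environment and overwrite the placeholders before the model is instantiated. The config keys below mirror the yaml above; the surrounding logic is an assumption:

# Illustrative sketch only: how "undefined" placeholders in a model config
# might be overwritten from the environment before instantiation.
from omegaconf import OmegaConf

model_cfg = OmegaConf.create(
    {"state_dim": "undefined", "action_dim": "undefined", "num_atoms": 51}
)

# Values that would normally come from the gym environment's spaces
# (e.g. stacked 84x84 Atari frames and a 6-action Pong action space):
model_cfg.state_dim = [4, 84, 84]
model_cfg.action_dim = 6

print(model_cfg)  # the placeholders are now concrete values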

Extra (Good to know): How RLcycle instantiates components (models, learners, agents, etc)

Most of the components in RLcycle are instantiated via hydra.utils.instantiate, as illustrated in the section above. Examples:

  1. models:
"""in ./configs/lunarlander/models/dqn.yaml"""
model:
  class: rlcycle.common.models.value.DQNModel
  params:
    model_cfg:
      state_dim: undefined
      action_dim: undefined
      fc:
        input:
          class: rlcycle.common.models.layers.LinearLayer
          params: 
            input_size: undefined
            output_size: 128
            post_activation_fn: relu           
        hidden:
          hidden1:
            class: rlcycle.common.models.layers.LinearLayer
            params: 
              input_size: 128
              output_size: 128
              post_activation_fn: relu
          hidden2:
            class: rlcycle.common.models.layers.LinearLayer
            params: 
              input_size: 128
              output_size: 128
              post_activation_fn: relu
        output:
          class: rlcycle.common.models.layers.LinearLayer
          params:
            input_size: 128
            output_size: undefined
            post_activation_fn: identity

Building the above model:

"""in ./rlcycle/build.py"""
def build_model(model_cfg: DictConfig, device: torch.device):
    """Build model from DictConfigs via hydra.utils.instantiate()"""
    model = hydra.utils.instantiate(model_cfg)
    return model.to(device)
  2. learners:
# in ./rlcycle/dqn_base/agent.py
#....
self.learner = build_learner(
    self.experiment_info, self.hyper_params, self.model_cfg
)
# ...

# in ./rlcycle/build.py
def build_learner(
    experiment_info: DictConfig, hyper_params: DictConfig, model: DictConfig
):
    """Build learner from DictConfigs via hydra.utils.instantiate()"""
    learner_cfg = DictConfig(dict())
    learner_cfg["class"] = experiment_info.learner
    learner_cfg["params"] = dict(
        experiment_info=experiment_info,
        hyper_params=hyper_params,
        model_cfg=model
    )
    learner = hydra.utils.instantiate(learner_cfg)
    return learner
  3. agents:
# in ./tests/test_dqn.py
@hydra.main(config_path="../configs/lunarlander/dqn.yaml", strict=False)
def main(cfg: DictConfig):
    agent = build_agent(**cfg)
    agent.train()

# in ./rlcycle/build.py
def build_agent(
    experiment_info: DictConfig, hyper_params: DictConfig, model: DictConfig
):
    """Build agent from DictConfigs via hydra.utils.instantiate()"""
    agent_cfg = DictConfig(dict())
    agent_cfg["class"] = experiment_info.agent
    agent_cfg["params"] = dict(
        experiment_info=experiment_info,
        hyper_params=hyper_params,
        model_cfg=model
    )
    agent = hydra.utils.instantiate(agent_cfg)
    return agent

This should help you get started with building & running agents in RLcycle!
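
For intuition, hydra.utils.instantiate with this "class"/"params" layout boils down to importing the named class and calling it with the params as keyword arguments. Here is a minimal sketch of that mechanism (illustrative only, not Hydra's or RLcycle's actual implementation):

# Minimal sketch of what hydra.utils.instantiate does with the
# "class"/"params" config layout (illustrative, not Hydra's actual code).
import importlib

def instantiate_from_cfg(cfg: dict):
    """Import the class named in cfg["class"] and call it with cfg["params"]."""
    module_path, class_name = cfg["class"].rsplit(".", 1)
    cls = getattr(importlib.import_module(module_path), class_name)
    return cls(**cfg.get("params", {}))

# Using the Rectangle config from the Hydra example above:
cfg = {
    "class": "examples.shapes.Rectangle",
    "params": {"height": 5, "width": 4},
}
rectangle = instantiate_from_cfg(cfg)
print(rectangle.get_area())  # 20

This is why every RLcycle component only needs a "class" path and a "params" block in yaml to be swappable.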

Benchmarks

Hyperparameters were not rigorously tuned; most follow those presented in the original papers, with some modifications to reduce memory usage.


Atari PongNoFrameskip-v4

For more information, visit the WandB log.

[figure: atari-pong]

Atari BreakoutNoFrameskip-v4 (Will be ready soon)
PyBullet Reacher-v2

For more information, visit the WandB log.

[figure: reacher]

PyBullet HalfCheetah-v2 (Will be ready soon)

Future Plans

Below are some things I hope to incorporate into RLcycle:

  • TRPO and PPO (medium priority)
  • IQN (low priority)
  • Compatibility with my distributed RL framework distributedRL (i.e. Ape-X for all off-policy algorithms). (high priority)

References


Papers

  1. "Human-level control through deep reinforcement learning." Mnih et al., 2015.
  2. "Dueling Network Architectures for Deep Reinforcement Learning." Wang et al., 2015.
  3. "Prioritized Experience Replay." Schaul et al., 2015.
  4. "Noisy Networks for Exploration." Fortunato et al., 2017.
  5. "A Distributional Perspective on Reinforcement Learning." Bellemare et al., 2017.
  6. "Rainbow: Combining Improvements in Deep Reinforcement Learning." Hessel et al., 2017.
  7. "Distributional Reinforcement Learning with Quantile Regression." Dabney et al., 2017.
  8. "Asynchronous methods for Deep Reinforcement Learning." Mnih et al., 2016
  9. "Continuous control with deep reinforcement learning." Lillicrap et al., 2015.
  10. "Addressing function approximation error in actor-critic methods." Fujimoto et al., 2018
  11. "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor." Haarnoja et al., 2018
  12. "Soft Actor-Critic Algorithms and Applications." Haarnoja et al., 2019"