qlan3 / Explorer

Licence: MIT license
Explorer is a PyTorch reinforcement learning framework for exploring new ideas.


Implemented algorithms

To do list

  • SAC with automatically adjusted temperature
  • SAC with discrete action spaces

The dependency tree of agent classes

Base Agent
  ├── Vanilla DQN
  |     ├── DQN
  |     |    ├── DDQN
  |     |    ├── NoisyNetDQN
  |     |    ├── BootstrappedDQN
  |     |    └── MeDQN_Uniform, MeDQN_Real
  |     ├── Maxmin DQN ── Ensemble DQN
  |     └── Averaged DQN
  └── REINFORCE 
        ├── Actor-Critic
        |     └── PPO ── RPG
        └── SAC ── DDPG ── TD3
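The tree above can be read as a chain of subclass relationships. The following is a minimal sketch of part of it, reading each branch and each `──` as parent to child. The class names are assumptions that mirror the labels in the tree; they are not necessarily the identifiers used in the repository, and the bodies are left empty on purpose.

```python
# Illustrative skeleton only: names mirror the tree labels and are assumptions,
# not the repository's actual class definitions.
class BaseAgent:
    pass

class VanillaDQN(BaseAgent):
    pass

class DQN(VanillaDQN):
    pass

class DDQN(DQN):
    pass

class MaxminDQN(VanillaDQN):
    pass

class EnsembleDQN(MaxminDQN):
    pass

class REINFORCE(BaseAgent):
    pass

class ActorCritic(REINFORCE):
    pass

class PPO(ActorCritic):
    pass

class RPG(PPO):
    pass

class SAC(REINFORCE):
    pass

class DDPG(SAC):
    pass

class TD3(DDPG):
    pass


def lineage(cls):
    """Return the class names from `cls` up to the base agent."""
    return [c.__name__ for c in cls.__mro__ if c is not object]


print(lineage(TD3))  # ['TD3', 'DDPG', 'SAC', 'REINFORCE', 'BaseAgent']
```

Under this reading, a change to a class near the root (for example the REINFORCE node) would be inherited by every agent below it unless overridden.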

Requirements

  • Python (>=3.6)
  • PyTorch
  • Gym and Gym Games: You can install only part of Gym (classic_control, box2d) with the command pip install 'gym[classic_control, box2d]'.
  • Optional:
    • Gym Atari: pip install gym[atari,accept-rom-license]
    • Gym Mujoco:
      • Download MuJoCo version 1.50 from the MuJoCo website.
      • Unzip the downloaded mjpro150 directory into ~/.mujoco/mjpro150, and place the activation key (the mjkey.txt file downloaded from here) at ~/.mujoco/mjkey.txt.
      • Install mujoco-py: pip install 'mujoco-py<1.50.2,>=1.50.1'
      • Install gym[mujoco]: pip install gym[mujoco]
    • PyBullet: pip install pybullet
    • DeepMind Control Suite: pip install git+https://github.com/denisyarats/dmc2gym.git
  • Others: Please check requirements.txt.

Experiments

Train && Test

All hyperparameters, including the parameters for grid search, are stored in a configuration file in the configs directory. To run an experiment, a configuration index is first used to generate the configuration dict for that index; the experiment defined by this dict is then run. All results, including log files, are saved in the logs directory. Please refer to the code for details.

For example, run the experiment with configuration file RPG.json and configuration index 1:

python main.py --config_file ./configs/RPG.json --config_idx 1

The models are tested for one episode after every test_per_episodes training episodes; this value can be set in the configuration file.
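As a rough illustration of that schedule, the sketch below lists the training episodes after which a test episode would run. It assumes episodes are counted from 1 and that no test happens before training starts; the function name `should_test` is hypothetical, and the framework's own loop may count differently.

```python
def should_test(episode, test_per_episodes):
    """Return True if a test episode is due after training episode `episode`.

    Assumption: `episode` is 1-based, so the first test follows episode
    number `test_per_episodes`.
    """
    return test_per_episodes > 0 and episode % test_per_episodes == 0


# With test_per_episodes = 10, tests follow training episodes 10, 20 and 30.
test_points = [ep for ep in range(1, 31) if should_test(ep, 10)]
print(test_points)  # [10, 20, 30]
```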

Grid Search (Optional)

First, we calculate the total number of combinations in a configuration file (e.g. RPG.json):

python utils/sweeper.py

The output will be:

Number of total combinations in RPG.json: 12
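To make the count concrete: if a configuration is viewed as a flat dict whose values are lists of candidate settings, the number of combinations is the product of the list lengths. The sketch below assumes exactly that flat layout, and the keys `lr`, `batch_size` and `gamma` are made up for illustration; the real configuration files and utils/sweeper.py may be structured differently.

```python
def count_combinations(grid):
    """Multiply the number of candidate values of every hyperparameter."""
    total = 1
    for values in grid.values():
        total *= len(values)
    return total


# Hypothetical grid: 3 learning rates x 2 batch sizes x 2 discount factors.
example_grid = {
    "lr": [1e-3, 3e-4, 1e-4],
    "batch_size": [32, 64],
    "gamma": [0.99, 0.999],
}

total = count_combinations(example_grid)
print(total)  # 12
```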

Then we run through all configuration indexes from 1 to 12. The simplest way is using a bash script:

for index in {1..12}
do
  python main.py --config_file ./configs/RPG.json --config_idx $index
done

GNU Parallel is usually a better choice for scheduling a large number of jobs:

parallel --eta --ungroup python main.py --config_file ./configs/RPG.json --config_idx {1} ::: $(seq 1 12)

Configuration indexes that leave the same remainder when divided by the total number of combinations map to the same configuration dict. So for multiple runs of one configuration, we just keep adding the total number of combinations to the configuration index. For example, 5 runs for configuration index 1 (with 12 combinations in total):

for index in 1 13 25 37 49
do
  python main.py --config_file ./configs/RPG.json --config_idx $index
done

Or a simpler way:

parallel --eta --ungroup python main.py --config_file ./configs/RPG.json --config_idx {1} ::: $(seq 1 12 60)
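The index arithmetic above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the repository's sweeper: the grid is a flat dict of lists with made-up keys, combinations are enumerated in `itertools.product` order, and the helper names `config_for_index` and `run_indexes` are hypothetical.

```python
from itertools import product


def config_for_index(grid, config_idx):
    """Map a 1-based configuration index to one combination of the grid.

    Indexes that differ by a multiple of the number of combinations
    wrap around to the same combination.
    """
    keys = list(grid)
    combos = list(product(*(grid[k] for k in keys)))
    slot = (config_idx - 1) % len(combos)
    return dict(zip(keys, combos[slot]))


def run_indexes(config_idx, total, runs):
    """Indexes of `runs` repeated runs of the same configuration."""
    return [config_idx + r * total for r in range(runs)]


# Hypothetical grid with 3 * 2 * 2 = 12 combinations.
grid = {"lr": [1e-3, 3e-4, 1e-4], "batch_size": [32, 64], "gamma": [0.99, 0.999]}

print(run_indexes(1, 12, 5))  # [1, 13, 25, 37, 49]
print(config_for_index(grid, 1) == config_for_index(grid, 13))  # True
```

The list `[1, 13, 25, 37, 49]` matches what `seq 1 12 60` produces in the command above.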

Analysis (Optional)

To analyze the experimental results, just run:

python analysis.py

Inside analysis.py, unfinished_index prints the configuration indexes of unfinished jobs, determined by whether the result file exists. memory_info prints memory usage information and generates a histogram of the memory usage distribution in directory logs/RPG/0. Similarly, time_info prints time information and generates a histogram of the time distribution in directory logs/RPG/0. Finally, analyze generates csv files that store the training and test results. Please check analysis.py for more details. More functions are available in utils/plotter.py.
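The idea behind checking for unfinished jobs can be sketched as a scan for missing result files. The layout used here (one directory per configuration index under logs/RPG, each holding a file named `result_Test.feather`) and the function name `unfinished_indexes` are assumptions for illustration only; analysis.py defines the actual paths and file names.

```python
import os
import tempfile


def unfinished_indexes(log_root, exp, total, result_name="result_Test.feather"):
    """Return the configuration indexes whose result file does not exist.

    Assumed layout: <log_root>/<exp>/<config_idx>/<result_name>.
    """
    missing = []
    for idx in range(1, total + 1):
        path = os.path.join(log_root, exp, str(idx), result_name)
        if not os.path.isfile(path):
            missing.append(idx)
    return missing


# Demo in a temporary directory: pretend that jobs 1, 2 and 4 finished.
with tempfile.TemporaryDirectory() as root:
    for idx in (1, 2, 4):
        run_dir = os.path.join(root, "RPG", str(idx))
        os.makedirs(run_dir)
        open(os.path.join(run_dir, "result_Test.feather"), "w").close()
    demo_missing = unfinished_indexes(root, "RPG", total=5)

print(demo_missing)  # [3, 5]
```

The returned indexes could then be passed back to main.py via --config_idx to rerun only the missing jobs.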

Enjoy!

Code of My Papers

  • Qingfeng Lan, Yangchen Pan, Alona Fyshe, Martha White. Maxmin Q-learning: Controlling the Estimation Bias of Q-learning. ICLR, 2020. (Poster) [paper] [code]

  • Qingfeng Lan, Samuele Tosatto, Homayoon Farrahi, A. Rupam Mahmood. Model-free Policy Learning with Reward Gradients. AISTATS, 2022. (Poster) [paper] [code]

  • Qingfeng Lan, Yangchen Pan, Jun Luo, A. Rupam Mahmood. Memory-efficient Reinforcement Learning with Knowledge Consolidation. arXiv. [paper] [code]

Cite

If you find this repo useful for your research, please cite the related paper listed above. If none of them is related, please cite this repo:

@misc{Explorer,
  author = {Lan, Qingfeng},
  title = {A PyTorch Reinforcement Learning Framework for Exploring New Ideas},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub Repository},
  howpublished = {\url{https://github.com/qlan3/Explorer}}
}

Acknowledgements
