
quanvuong / Handful Of Trials Pytorch

Unofficial Pytorch code for "Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models"

Projects that are alternatives to or similar to Handful Of Trials Pytorch

Openaigym
Solving OpenAI Gym problems.
Stars: ✭ 98 (-12.5%)
Mutual labels:  reinforcement-learning
Reinforcement Learning Cheat Sheet
Reinforcement Learning Cheat Sheet
Stars: ✭ 104 (-7.14%)
Mutual labels:  reinforcement-learning
Cartpole
OpenAI's cartpole env solver.
Stars: ✭ 107 (-4.46%)
Mutual labels:  reinforcement-learning
Samsung Drl Code
Repository of code for Deep Reinforcement Learning (DRL) lectures at Samsung
Stars: ✭ 99 (-11.61%)
Mutual labels:  reinforcement-learning
Torchrl
Highly Modular and Scalable Reinforcement Learning
Stars: ✭ 102 (-8.93%)
Mutual labels:  reinforcement-learning
Tensorflow2.0 Examples
🙄 Difficult algorithm, Simple code.
Stars: ✭ 1,397 (+1147.32%)
Mutual labels:  reinforcement-learning
Papers Literature Ml Dl Rl Ai
Highly cited and useful papers related to machine learning, deep learning, AI, game theory, reinforcement learning
Stars: ✭ 1,341 (+1097.32%)
Mutual labels:  reinforcement-learning
Pairstrade Fyp 2019
We tested 3 approaches for Pair Trading: distance, cointegration and reinforcement learning approach.
Stars: ✭ 109 (-2.68%)
Mutual labels:  reinforcement-learning
Direct Future Prediction Keras
Direct Future Prediction (DFP) in Keras
Stars: ✭ 103 (-8.04%)
Mutual labels:  reinforcement-learning
Lang Emerge Parlai
Implementation of EMNLP 2017 Paper "Natural Language Does Not Emerge 'Naturally' in Multi-Agent Dialog" using PyTorch and ParlAI
Stars: ✭ 106 (-5.36%)
Mutual labels:  reinforcement-learning
Gym Ignition
Framework for developing OpenAI Gym robotics environments simulated with Ignition Gazebo
Stars: ✭ 97 (-13.39%)
Mutual labels:  reinforcement-learning
Reinforcement learning
Implementations of basic reinforcement learning algorithms
Stars: ✭ 100 (-10.71%)
Mutual labels:  reinforcement-learning
Aws Robomaker Sample Application Deepracer
Use AWS RoboMaker and demonstrate running a simulation which trains a reinforcement learning (RL) model to drive a car around a track
Stars: ✭ 105 (-6.25%)
Mutual labels:  reinforcement-learning
Chemgan Challenge
Code for the paper: Benhenda, M. 2017. ChemGAN challenge for drug discovery: can AI reproduce natural chemical diversity? arXiv preprint arXiv:1708.08227.
Stars: ✭ 98 (-12.5%)
Mutual labels:  reinforcement-learning
Mojitalk
Code for "MojiTalk: Generating Emotional Responses at Scale" https://arxiv.org/abs/1711.04090
Stars: ✭ 107 (-4.46%)
Mutual labels:  reinforcement-learning
Rlai Exercises
Exercise Solutions for Reinforcement Learning: An Introduction [2nd Edition]
Stars: ✭ 97 (-13.39%)
Mutual labels:  reinforcement-learning
Reinforcement Learning
🤖 Implementations of Reinforcement Learning algorithms.
Stars: ✭ 104 (-7.14%)
Mutual labels:  reinforcement-learning
Navbot
Using RGB Image as Visual Input for Mapless Robot Navigation
Stars: ✭ 111 (-0.89%)
Mutual labels:  reinforcement-learning
Numpy Ml
Machine learning, in numpy
Stars: ✭ 11,100 (+9810.71%)
Mutual labels:  reinforcement-learning
Easy Rl
A reinforcement learning tutorial in Chinese; read it online at https://datawhalechina.github.io/easy-rl/
Stars: ✭ 3,004 (+2582.14%)
Mutual labels:  reinforcement-learning

This repo contains a PyTorch implementation of the wonderful model-based reinforcement learning algorithm proposed in Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models.

As of now, the repo only supports the best-performing variant: a probabilistic ensemble for the learned dynamics model, TSinf trajectory sampling, and the Cross-Entropy Method (CEM) for action optimization.
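
For intuition, here is a minimal sketch of CEM-based action-sequence optimization (the function name and hyperparameters are illustrative, not the repo's actual API):

import numpy as np

def cem_optimize(cost_fn, horizon, ac_dim, ac_low, ac_high,
                 pop_size=400, num_elites=40, num_iters=5):
    # cost_fn maps a [pop_size, horizon, ac_dim] batch of action
    # sequences to a [pop_size] vector of costs, e.g. the negative
    # return predicted by the learned dynamics ensemble.
    mean = np.zeros((horizon, ac_dim))
    std = np.ones((horizon, ac_dim))
    for _ in range(num_iters):
        # Sample candidate action sequences and clip to the action bounds.
        samples = np.clip(
            mean + std * np.random.randn(pop_size, horizon, ac_dim),
            ac_low, ac_high)
        costs = cost_fn(samples)
        # Refit the sampling distribution to the lowest-cost elites.
        elites = samples[np.argsort(costs)[:num_elites]]
        mean, std = elites.mean(axis=0), elites.std(axis=0)
    # In MPC fashion, only the first action of the optimized sequence
    # is executed before re-planning.
    return mean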

The code is structured with the same levels of abstraction as the original TF implementation, except that the TF dynamics model is replaced by a PyTorch dynamics model.
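
As a rough illustration of what the PyTorch side looks like, here is a simplified sketch (not the repo's actual class) of a single probabilistic ensemble member, which predicts a Gaussian over the next-state delta and is trained with the Gaussian negative log-likelihood:

import torch
import torch.nn as nn

class ProbabilisticDynamics(nn.Module):
    # One ensemble member: outputs the mean and log-variance of a
    # Gaussian over the change in state. (The paper uses the swish
    # activation; plain ReLU is used here to keep the sketch short.)
    def __init__(self, obs_dim, ac_dim, hidden=200):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + ac_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * obs_dim))

    def forward(self, obs, act):
        mean, logvar = self.net(torch.cat([obs, act], dim=-1)).chunk(2, dim=-1)
        logvar = logvar.clamp(-10.0, 2.0)  # keep the predicted variance sane
        return mean, logvar

def nll_loss(mean, logvar, target_delta):
    # Gaussian negative log-likelihood, up to an additive constant.
    inv_var = torch.exp(-logvar)
    return ((target_delta - mean) ** 2 * inv_var + logvar).mean()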

I'm happy to take pull requests if you see ways to improve the repo :).

Performance

The y-axis indicates the maximum reward seen so far, as is done in the paper.
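
Concretely, given the per-trial returns (e.g. from logs.mat, described below), the plotted curve is the running maximum:

import numpy as np

returns = np.array([[-120.0, 340.0, 180.0, 900.0]])  # illustrative values
max_so_far = np.maximum.accumulate(returns.ravel())
print(max_so_far)  # [-120.  340.  340.  900.]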

On the seed specified in the code, I could not reproduce the paper's result on HalfCheetah. I combed through the code but could not find any bugs.

I suspect the lower performance is because the HalfCheetah objective surface has deceptive modes, which leads to high variance in performance across seeds.

To reach a 15k episode return, the HalfCheetah needs to run on its legs. However, another mode is for it to flip onto its back and wiggle its legs.

Even SAC gets stuck in this mode for some initial seeds:

https://github.com/rail-berkeley/softlearning/issues/75

I didn't have time to pursue this issue further. If you encounter it, try rendering the HalfCheetah's behavior; I think that will be very helpful in figuring out what is going on.
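
For example, a quick way to eyeball the behavior is to replay actions in a rendered gym environment (a sketch assuming the MuJoCo HalfCheetah-v2 env and gym's pre-0.26 API; swap in actions from the trained controller instead of random ones):

import gym

env = gym.make('HalfCheetah-v2')
obs = env.reset()
for _ in range(1000):
    env.render()
    act = env.action_space.sample()  # replace with your controller's actions
    obs, rew, done, info = env.step(act)
    if done:
        obs = env.reset()
env.close()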

Requirements

  1. The requirements in the original TF implementation
  2. PyTorch 1.0.0

For the full requirements, please take a look at the pip dependency file requirements.txt and the conda dependency file environments.yml.
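
For example, the usual commands should work (assuming a standard pip/conda setup):

pip install -r requirements.txt
# or, with conda:
conda env create -f environments.yml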

Running Experiments

Experiments for a particular environment can be run using:

python mbexp.py
    -env    ENV       (required) The name of the environment. Select from
                                 [cartpole, reacher, pusher, halfcheetah].
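
For example, to run on HalfCheetah:

python mbexp.py -env halfcheetah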

Results will be saved in <logdir>/<date+time of experiment start>/. Trial data will be contained in logs.mat, with the following contents:

{
    "observations": NumPy array of shape
        [num_train_iters * nrollouts_per_iter + ninit_rollouts, trial_lengths, obs_dim]
    "actions": NumPy array of shape
        [num_train_iters * nrollouts_per_iter + ninit_rollouts, trial_lengths, ac_dim]
    "rewards": NumPy array of shape
        [num_train_iters * nrollouts_per_iter + ninit_rollouts, trial_lengths, 1]
    "returns": Numpy array of shape [1, num_train_iters * neval]
}
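
A quick sketch of how to inspect these logs with scipy (the directory name below is illustrative):

from scipy.io import loadmat

logdir = 'log/2019-05-20--12-00-00'  # hypothetical <logdir>/<date+time> directory
data = loadmat(logdir + '/logs.mat')
print(data['observations'].shape)  # (num_rollouts, trial_lengths, obs_dim)
print(data['returns'].shape)       # (1, num_train_iters * neval)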

To visualize the results, please take a look at plotter.ipynb.

Acknowledgement

Huge thanks to the authors of the paper for open-sourcing their code. Most of this repo is adapted from the official TF implementation.
