
quanvuong / Handful Of Trials Pytorch

Unofficial Pytorch code for "Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models"

Projects that are alternatives to or similar to Handful Of Trials Pytorch

Openaigym
Solving OpenAI Gym problems.
Stars: ✭ 98 (-12.5%)
Mutual labels:  reinforcement-learning
Reinforcement Learning Cheat Sheet
Reinforcement Learning Cheat Sheet
Stars: ✭ 104 (-7.14%)
Mutual labels:  reinforcement-learning
Cartpole
OpenAI's cartpole env solver.
Stars: ✭ 107 (-4.46%)
Mutual labels:  reinforcement-learning
Samsung Drl Code
Repository of code for Deep Reinforcement Learning (DRL) lectures at Samsung
Stars: ✭ 99 (-11.61%)
Mutual labels:  reinforcement-learning
Torchrl
Highly Modular and Scalable Reinforcement Learning
Stars: ✭ 102 (-8.93%)
Mutual labels:  reinforcement-learning
Tensorflow2.0 Examples
🙄 Difficult algorithm, Simple code.
Stars: ✭ 1,397 (+1147.32%)
Mutual labels:  reinforcement-learning
Papers Literature Ml Dl Rl Ai
Highly cited and useful papers related to machine learning, deep learning, AI, game theory, reinforcement learning
Stars: ✭ 1,341 (+1097.32%)
Mutual labels:  reinforcement-learning
Pairstrade Fyp 2019
We tested 3 approaches for Pair Trading: distance, cointegration and reinforcement learning approach.
Stars: ✭ 109 (-2.68%)
Mutual labels:  reinforcement-learning
Direct Future Prediction Keras
Direct Future Prediction (DFP) in Keras
Stars: ✭ 103 (-8.04%)
Mutual labels:  reinforcement-learning
Lang Emerge Parlai
Implementation of EMNLP 2017 Paper "Natural Language Does Not Emerge 'Naturally' in Multi-Agent Dialog" using PyTorch and ParlAI
Stars: ✭ 106 (-5.36%)
Mutual labels:  reinforcement-learning
Gym Ignition
Framework for developing OpenAI Gym robotics environments simulated with Ignition Gazebo
Stars: ✭ 97 (-13.39%)
Mutual labels:  reinforcement-learning
Reinforcement learning
Implementations of basic reinforcement learning algorithms
Stars: ✭ 100 (-10.71%)
Mutual labels:  reinforcement-learning
Aws Robomaker Sample Application Deepracer
Use AWS RoboMaker and demonstrate running a simulation which trains a reinforcement learning (RL) model to drive a car around a track
Stars: ✭ 105 (-6.25%)
Mutual labels:  reinforcement-learning
Chemgan Challenge
Code for the paper: Benhenda, M. 2017. ChemGAN challenge for drug discovery: can AI reproduce natural chemical diversity? arXiv preprint arXiv:1708.08227.
Stars: ✭ 98 (-12.5%)
Mutual labels:  reinforcement-learning
Mojitalk
Code for "MojiTalk: Generating Emotional Responses at Scale" https://arxiv.org/abs/1711.04090
Stars: ✭ 107 (-4.46%)
Mutual labels:  reinforcement-learning
Rlai Exercises
Exercise Solutions for Reinforcement Learning: An Introduction [2nd Edition]
Stars: ✭ 97 (-13.39%)
Mutual labels:  reinforcement-learning
Reinforcement Learning
🤖 Implementations of Reinforcement Learning algorithms.
Stars: ✭ 104 (-7.14%)
Mutual labels:  reinforcement-learning
Navbot
Using RGB Image as Visual Input for Mapless Robot Navigation
Stars: ✭ 111 (-0.89%)
Mutual labels:  reinforcement-learning
Numpy Ml
Machine learning, in numpy
Stars: ✭ 11,100 (+9810.71%)
Mutual labels:  reinforcement-learning
Easy Rl
A reinforcement learning tutorial in Chinese; read it online at https://datawhalechina.github.io/easy-rl/
Stars: ✭ 3,004 (+2582.14%)
Mutual labels:  reinforcement-learning

This repo contains a PyTorch implementation of the wonderful model-based reinforcement learning algorithm proposed in Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models.

As of now, the repo only supports the best-performing variant: a probabilistic ensemble for the learned dynamics model, TSinf trajectory sampling, and the Cross-Entropy Method (CEM) for action optimization.
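
For intuition, here is a minimal sketch of CEM-based action-sequence optimization (the function name and hyperparameters are illustrative, not the repo's actual API):

import numpy as np

def cem_optimize(cost_fn, horizon, ac_dim, ac_low, ac_high,
                 pop_size=400, num_elites=40, num_iters=5):
    # cost_fn maps a [pop_size, horizon, ac_dim] batch of action
    # sequences to a [pop_size] vector of costs, e.g. the negative
    # return predicted by the learned dynamics ensemble.
    mean = np.zeros((horizon, ac_dim))
    std = np.ones((horizon, ac_dim))
    for _ in range(num_iters):
        # Sample candidate action sequences and clip to the action bounds.
        samples = np.clip(
            mean + std * np.random.randn(pop_size, horizon, ac_dim),
            ac_low, ac_high)
        costs = cost_fn(samples)
        # Refit the sampling distribution to the lowest-cost elites.
        elites = samples[np.argsort(costs)[:num_elites]]
        mean, std = elites.mean(axis=0), elites.std(axis=0)
    # In MPC fashion, only the first action of the optimized sequence
    # is executed before re-planning.
    return mean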

The code is structured with the same levels of abstraction as the original TF implementation, except that the TF dynamics model is replaced by a PyTorch dynamics model.
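
As a rough illustration of what the PyTorch side looks like, here is a simplified sketch (not the repo's actual class) of a single probabilistic ensemble member, which predicts a Gaussian over the next-state delta and is trained with the Gaussian negative log-likelihood:

import torch
import torch.nn as nn

class ProbabilisticDynamics(nn.Module):
    # One ensemble member: outputs the mean and log-variance of a
    # Gaussian over the change in state. (The paper uses the swish
    # activation; plain ReLU is used here to keep the sketch short.)
    def __init__(self, obs_dim, ac_dim, hidden=200):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + ac_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * obs_dim))

    def forward(self, obs, act):
        mean, logvar = self.net(torch.cat([obs, act], dim=-1)).chunk(2, dim=-1)
        logvar = logvar.clamp(-10.0, 2.0)  # keep the predicted variance sane
        return mean, logvar

def nll_loss(mean, logvar, target_delta):
    # Gaussian negative log-likelihood, up to an additive constant.
    inv_var = torch.exp(-logvar)
    return ((target_delta - mean) ** 2 * inv_var + logvar).mean()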

I'm happy to take pull requests if you see ways to improve the repo :).

Performance

The y-axis indicates the maximum reward seen so far, as is done in the paper.
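
Concretely, given the per-trial returns (e.g. from logs.mat, described below), the plotted curve is the running maximum:

import numpy as np

returns = np.array([[-120.0, 340.0, 180.0, 900.0]])  # illustrative values
max_so_far = np.maximum.accumulate(returns.ravel())
print(max_so_far)  # [-120.  340.  340.  900.]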

On the seed specified in the code, I could not reproduce the paper's result on HalfCheetah. I combed through the code but could not find any bugs.

I suspect the lower performance is because the HalfCheetah objective surface has deceptive modes, which leads to high variance in performance across seeds.

To reach a 15k episode return, the HalfCheetah needs to run on its legs. However, another mode is for it to flip onto its back and wiggle its legs.

Even SAC gets stuck in this mode for some initial seeds:

https://github.com/rail-berkeley/softlearning/issues/75

I didn't have time to pursue this issue further. If you encounter it, try rendering the HalfCheetah's behavior; I think that will be very helpful in figuring out what is going on.
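
For example, a quick way to eyeball the behavior is to replay actions in a rendered gym environment (a sketch assuming the MuJoCo HalfCheetah-v2 env and gym's pre-0.26 API; swap in actions from the trained controller instead of random ones):

import gym

env = gym.make('HalfCheetah-v2')
obs = env.reset()
for _ in range(1000):
    env.render()
    act = env.action_space.sample()  # replace with your controller's actions
    obs, rew, done, info = env.step(act)
    if done:
        obs = env.reset()
env.close()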

Requirements

  1. The requirements in the original TF implementation
  2. PyTorch 1.0.0

For the full requirements, please take a look at the pip dependency file requirements.txt and the conda dependency file environments.yml.
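
For example, the usual commands should work (assuming a standard pip/conda setup):

pip install -r requirements.txt
# or, with conda:
conda env create -f environments.yml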

Running Experiments

Experiments for a particular environment can be run using:

python mbexp.py
    -env    ENV       (required) The name of the environment. Select from
                                 [cartpole, reacher, pusher, halfcheetah].
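
For example, to run on HalfCheetah:

python mbexp.py -env halfcheetah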

Results will be saved in <logdir>/<date+time of experiment start>/. Trial data will be contained in logs.mat, with the following contents:

{
    "observations": NumPy array of shape
        [num_train_iters * nrollouts_per_iter + ninit_rollouts, trial_lengths, obs_dim]
    "actions": NumPy array of shape
        [num_train_iters * nrollouts_per_iter + ninit_rollouts, trial_lengths, ac_dim]
    "rewards": NumPy array of shape
        [num_train_iters * nrollouts_per_iter + ninit_rollouts, trial_lengths, 1]
    "returns": Numpy array of shape [1, num_train_iters * neval]
}
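
A quick sketch of how to inspect these logs with scipy (the directory name below is illustrative):

from scipy.io import loadmat

logdir = 'log/2019-05-20--12-00-00'  # hypothetical <logdir>/<date+time> directory
data = loadmat(logdir + '/logs.mat')
print(data['observations'].shape)  # (num_rollouts, trial_lengths, obs_dim)
print(data['returns'].shape)       # (1, num_train_iters * neval)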

To visualize the results, please take a look at plotter.ipynb.

Acknowledgement

Huge thanks to the authors of the paper for open-sourcing their code. Most of this repo is adapted from the official TF implementation.
