llSourcell / Openai_five_vs_dota2_explained

Licence: MIT

This is the code for "OpenAI Five vs DOTA 2 Explained" by Siraj Raval on YouTube

Programming Languages

python


Overview

This is the code for Siraj Raval's YouTube video on OpenAI Five vs DOTA 2. The author of this code is alexis-jacq. The real code is not yet publicly available, but this is a basic version of the algorithm.

Dependencies

PyTorch
OpenAI Gym

Usage

Run 'python main.py (gym_environment_name)' in a terminal, replacing (gym_environment_name) with the name of the Gym environment to train on.

Pytorch-DPPO

PyTorch implementation of Distributed Proximal Policy Optimization (https://arxiv.org/abs/1707.02286), using PPO with the clipped loss from https://arxiv.org/pdf/1707.06347.pdf.
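
For readers unfamiliar with the clip loss, here is a minimal sketch of the clipped surrogate objective from the PPO paper. It is written in plain Python on lists so the arithmetic is easy to follow; it is not the code in ppo.py, which works on PyTorch tensors. The function name clipped_surrogate_loss and the default clip_eps=0.2 are assumptions made for illustration, not values taken from this repository.

```python
import math

def clipped_surrogate_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Return the PPO clipped surrogate loss (negated objective), averaged over a batch.

    All three inputs are equal-length sequences of floats, one entry per sampled action.
    clip_eps is an illustrative default, not necessarily the repository's setting.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probabilities.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to the interval [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    # Negate so that minimising the loss maximises the objective.
    return -total / len(advantages)

if __name__ == "__main__":
    # A ratio above 1 + eps with a positive advantage is clipped,
    # so the policy gains nothing by moving further in that direction.
    print(clipped_surrogate_loss([1.0], [0.0], [1.0]))
```

Taking the minimum of the two terms means clipping only removes the incentive to move the policy too far; when a large ratio makes the objective worse (negative advantage), the unclipped term is kept and the penalty remains.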

I finally fixed what was wrong with the gradient descent step by using the previous log-probabilities from the rollout batches. So far only ppo.py is fixed; the rest will be corrected very soon.
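
To illustrate what using the previous log-prob from rollout batches means, here is a small sketch of the bookkeeping only. It assumes a one-dimensional Gaussian policy with no state input and omits the gradient step. The helper names (gaussian_log_prob, collect_rollout, probability_ratios) are hypothetical and do not come from this repository.

```python
import math
import random

def gaussian_log_prob(action, mean, std):
    """Log-density of a 1-D Gaussian policy evaluated at the given action."""
    return (-((action - mean) ** 2) / (2.0 * std * std)
            - math.log(std) - 0.5 * math.log(2.0 * math.pi))

def collect_rollout(mean, std, steps, rng):
    """Sample actions and store each one's log-prob under the policy that produced it."""
    batch = []
    for _ in range(steps):
        action = rng.gauss(mean, std)
        batch.append({
            "action": action,
            # Recorded once, at sampling time, and never recomputed afterwards.
            "old_log_prob": gaussian_log_prob(action, mean, std),
        })
    return batch

def probability_ratios(batch, new_mean, new_std):
    """Ratio pi_new / pi_old per stored action, using the stored old log-probs."""
    return [
        math.exp(gaussian_log_prob(t["action"], new_mean, new_std) - t["old_log_prob"])
        for t in batch
    ]

if __name__ == "__main__":
    rng = random.Random(0)
    batch = collect_rollout(mean=0.0, std=1.0, steps=5, rng=rng)
    # Before any update the ratios are all 1; after the mean moves they differ.
    print(probability_ratios(batch, new_mean=0.0, new_std=1.0))
    print(probability_ratios(batch, new_mean=0.5, new_std=1.0))
```

If the old log-probabilities were recomputed from the current parameters at every gradient step instead of being read from the batch, every ratio would equal 1 and the clipping would never take effect.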

In the following examples I was not patient enough to wait for a million iterations; I just wanted to check that the model was learning properly:

Progress of single PPO:

InvertedPendulum

InvertedDoublePendulum

HalfCheetah

hopper (PyBullet)

halfcheetah (PyBullet)

Progress of DPPO (4 agents) [TODO]

Acknowledgments

The structure of this code is based on https://github.com/ikostrikov/pytorch-a3c.

Hyperparameters and loss computation have been taken from https://github.com/openai/baselines.
