llSourcell / Ai_dresses_itself

Licence: MIT
This is the code for "AI that Dresses Itself" by Siraj Raval on YouTube.

Programming Languages

python

Overview

This is the code for the "AI that Dresses Itself" video on YouTube by Siraj Raval. It implements the Trust Region Policy Optimization (TRPO) algorithm used by the researchers featured in the video. They did not, however, make their full code public, so here the technique is applied to game environments instead. You can use it as a starting point to recreate their code. Meanwhile, hey researchers :) go ahead and release it; the community would appreciate it.

PyTorch implementation of TRPO

Try this implementation of PPO (a newer, better variant of TRPO) unless you need to use TRPO for some specific reason.

This is a PyTorch implementation of "Trust Region Policy Optimization (TRPO)".

This code is mostly ported from the original implementation by John Schulman. In contrast to another implementation of TRPO in PyTorch, this implementation uses an exact Hessian-vector product instead of a finite-difference approximation.
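To make the distinction concrete, here is a minimal, dependency-free sketch that compares an exact Hessian-vector product with a finite-difference approximation. The toy objective, the matrix `A`, and all function names are assumptions chosen for illustration; none of them come from this repository. In PyTorch an exact product is typically obtained through autograd rather than closed-form derivatives, but the comparison is the same.

```python
# Toy objective (an assumption for illustration, not the TRPO objective):
#   f(x) = 0.5 * x^T A x + sum(x_i^3) / 3
# Its gradient is A x + x^2 (elementwise) and its Hessian is A + diag(2 x).

A = [[2.0, 0.5],
     [0.5, 1.0]]  # symmetric matrix, chosen arbitrarily


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, vector)) for row in matrix]


def grad(x):
    """Analytic gradient of the toy objective."""
    return [ax_i + x_i ** 2 for ax_i, x_i in zip(matvec(A, x), x)]


def hvp_exact(x, v):
    """Exact Hessian-vector product, (A + diag(2 x)) v, computed directly."""
    return [av_i + 2.0 * x_i * v_i for av_i, x_i, v_i in zip(matvec(A, v), x, v)]


def hvp_finite_difference(x, v, eps=1e-3):
    """Forward-difference approximation: (grad(x + eps * v) - grad(x)) / eps."""
    shifted = [x_i + eps * v_i for x_i, v_i in zip(x, v)]
    return [(g1 - g0) / eps for g1, g0 in zip(grad(shifted), grad(x))]


x = [1.0, -2.0]
v = [0.3, 0.7]
exact = hvp_exact(x, v)
approx = hvp_finite_difference(x, v)
errors = [abs(e - a) for e, a in zip(exact, approx)]
print("exact :", exact)
print("approx:", approx)
print("errors:", errors)
```

For this objective the finite-difference error grows linearly with `eps`, while making `eps` very small eventually amplifies floating-point rounding. The exact product avoids that trade-off.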

Contributions

Contributions are very welcome. If you know how to make this code better, don't hesitate to send a pull request.

Usage

python main.py --env-name "Reacher-v1"

Recommended hyperparameters

InvertedPendulum-v1: 5000

Reacher-v1, InvertedDoublePendulum-v1: 15000

HalfCheetah-v1, Hopper-v1, Swimmer-v1, Walker2d-v1: 25000

Ant-v1, Humanoid-v1: 50000
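The recommended values above can be kept in a small lookup table so that launch commands are built consistently. The sketch below is a hypothetical helper, not part of this repository. The only option it uses is `--env-name`, taken from the usage line. This README does not say which option accepts the recommended value, so `value_flag` is a placeholder to fill in after checking the argument parser in `main.py`.

```python
# Recommended values copied from the list above, keyed by environment name.
RECOMMENDED = {
    "InvertedPendulum-v1": 5000,
    "Reacher-v1": 15000,
    "InvertedDoublePendulum-v1": 15000,
    "HalfCheetah-v1": 25000,
    "Hopper-v1": 25000,
    "Swimmer-v1": 25000,
    "Walker2d-v1": 25000,
    "Ant-v1": 50000,
    "Humanoid-v1": 50000,
}


def build_command(env_name, value_flag=None):
    """Return an argv list for launching main.py on the given environment.

    value_flag is a hypothetical parameter: pass the option name that accepts
    the recommended value once it has been confirmed in main.py. When it is
    None, only the documented --env-name option is used.
    """
    if env_name not in RECOMMENDED:
        raise KeyError(f"no recommended value listed for {env_name!r}")
    command = ["python", "main.py", "--env-name", env_name]
    if value_flag is not None:
        command += [value_flag, str(RECOMMENDED[env_name])]
    return command


print(" ".join(build_command("Reacher-v1")))
print("recommended value:", RECOMMENDED["Reacher-v1"])
```

The returned list can be passed directly to `subprocess.run` if you prefer launching runs from a script.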

Credits

Credits for this code go to ikostrikov. I've merely created a wrapper to get people started.
