uvipen / Super Mario Bros Ppo Pytorch

License: MIT
Proximal Policy Optimization (PPO) algorithm for Super Mario Bros

Programming Languages

python
139335 projects - #7 most used programming language
python3
1442 projects

Projects that are alternatives to or similar to Super Mario Bros Ppo Pytorch

Super Mario Bros A3c Pytorch
Asynchronous Advantage Actor-Critic (A3C) algorithm for Super Mario Bros
Stars: ✭ 775 (+19.41%)
Mutual labels:  gym, ai, reinforcement-learning
Hands On Reinforcement Learning With Python
Master Reinforcement and Deep Reinforcement Learning using OpenAI Gym and TensorFlow
Stars: ✭ 640 (-1.39%)
Mutual labels:  reinforcement-learning, openai-gym, ppo
Rl Baselines Zoo
A collection of 100+ pre-trained RL agents using Stable Baselines, training and hyperparameter optimization included.
Stars: ✭ 839 (+29.28%)
Mutual labels:  gym, reinforcement-learning, openai-gym
Basic reinforcement learning
An introductory series to Reinforcement Learning (RL) with comprehensive step-by-step tutorials.
Stars: ✭ 826 (+27.27%)
Mutual labels:  ai, reinforcement-learning, openai-gym
Atari
AI research environment for the Atari 2600 games 🤖.
Stars: ✭ 174 (-73.19%)
Mutual labels:  gym, ai, reinforcement-learning
Cartpole
OpenAI's cartpole env solver.
Stars: ✭ 107 (-83.51%)
Mutual labels:  ai, reinforcement-learning, openai-gym
Pytorch Rl
This repository contains model-free deep reinforcement learning algorithms implemented in Pytorch
Stars: ✭ 394 (-39.29%)
Mutual labels:  gym, reinforcement-learning, openai-gym
Rlcard
Reinforcement Learning / AI Bots in Card (Poker) Games - Blackjack, Leduc, Texas, DouDizhu, Mahjong, UNO.
Stars: ✭ 980 (+51%)
Mutual labels:  ai, reinforcement-learning, openai-gym
Stable Baselines
Mirror of Stable-Baselines: a fork of OpenAI Baselines, implementations of reinforcement learning algorithms
Stars: ✭ 115 (-82.28%)
Mutual labels:  gym, reinforcement-learning, openai-gym
Torchrl
PyTorch implementation of reinforcement learning algorithms (Soft Actor-Critic (SAC) / DDPG / TD3 / DQN / A2C / PPO / TRPO)
Stars: ✭ 90 (-86.13%)
Mutual labels:  gym, reinforcement-learning, ppo
Deterministic Gail Pytorch
PyTorch implementation of Deterministic Generative Adversarial Imitation Learning (GAIL) for Off Policy learning
Stars: ✭ 44 (-93.22%)
Mutual labels:  gym, reinforcement-learning, openai-gym
Rl Book
Source codes for the book "Reinforcement Learning: Theory and Python Implementation"
Stars: ✭ 464 (-28.51%)
Mutual labels:  gym, reinforcement-learning, openai-gym
Dmc2gym
OpenAI Gym wrapper for the DeepMind Control Suite
Stars: ✭ 75 (-88.44%)
Mutual labels:  gym, reinforcement-learning, openai-gym
Ma Gym
A collection of multi-agent environments based on OpenAI gym.
Stars: ✭ 226 (-65.18%)
Mutual labels:  gym, reinforcement-learning, openai-gym
Deep Reinforcement Learning
Repo for the Deep Reinforcement Learning Nanodegree program
Stars: ✭ 4,012 (+518.18%)
Mutual labels:  reinforcement-learning, openai-gym, ppo
Spot mini mini
Dynamics and Domain Randomized Gait Modulation with Bezier Curves for Sim-to-Real Legged Locomotion.
Stars: ✭ 426 (-34.36%)
Mutual labels:  reinforcement-learning, openai-gym
Aigames
use AI to play some games.
Stars: ✭ 422 (-34.98%)
Mutual labels:  ai, reinforcement-learning
Autonomous Learning Library
A PyTorch library for building deep reinforcement learning agents.
Stars: ✭ 425 (-34.51%)
Mutual labels:  reinforcement-learning, ppo
Tensor House
A collection of reference machine learning and optimization models for enterprise operations: marketing, pricing, supply chain
Stars: ✭ 449 (-30.82%)
Mutual labels:  ai, reinforcement-learning
Deep Rl Keras
Keras Implementation of popular Deep RL Algorithms (A3C, DDQN, DDPG, Dueling DDQN)
Stars: ✭ 395 (-39.14%)
Mutual labels:  gym, reinforcement-learning

[PYTORCH] Proximal Policy Optimization (PPO) for playing Super Mario Bros

Introduction

Here is my Python source code for training an agent to play Super Mario Bros, using the Proximal Policy Optimization (PPO) algorithm introduced in the paper Proximal Policy Optimization Algorithms.

In terms of performance, my PPO-trained agent can complete 29/32 levels, which is much better than I expected at the beginning.

For your information, PPO is the algorithm proposed by OpenAI and used for training OpenAI Five, the first AI to beat world champions in an esports game. Specifically, OpenAI Five dispatched a team of casters and ex-pros with MMR rankings in the 99.95th percentile of Dota 2 players in August 2018.
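At its core, PPO maximizes a clipped surrogate objective that keeps each policy update close to the previous policy. Below is a minimal PyTorch sketch of that objective; the function name and signature are mine for illustration, not this repository's exact code:

# Minimal sketch of PPO's clipped surrogate loss (illustrative, not the repo's exact code)
import torch

def ppo_clip_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    # Probability ratio between the current and the old policy, computed in log space
    ratio = torch.exp(log_probs - old_log_probs)
    # Unclipped and clipped surrogate objectives
    surr1 = ratio * advantages
    surr2 = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the minimum of the two; negate it to obtain a loss to minimize
    return -torch.min(surr1, surr2).mean()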

Sample results: demo GIFs of the trained agent (embedded in the original README)

Motivation

It has been a while since I released my A3C implementation (A3C code) for training an agent to play Super Mario Bros. Although the trained agent could complete levels quite fast and quite well (at least faster and better than I play 😅), it still did not fully satisfy me. The main reason is that the agent trained with A3C could only complete 19/32 levels, no matter how much I fine-tuned and tested. That motivated me to look for a new approach.

Before I decided on PPO as my next complete implementation, I had partially implemented a couple of other algorithms, including A2C and Rainbow. While the former did not show a big jump in performance, the latter is better suited to more randomized environments/games, like Pong or Space Invaders.

How to use my code

With my code, you can:

  • Train your model by running python train.py. For example: python train.py --world 5 --stage 2 --lr 1e-4
  • Test your trained model by running python test.py. For example: python test.py --world 5 --stage 2 (the flags both scripts accept are sketched below)
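
For reference, here is a minimal argparse sketch of the command-line interface implied by the examples above; the flag names match the examples, but the defaults and help strings are my assumptions, not the repository's actual parser:

# Hypothetical reconstruction of the CLI used above (not the repo's exact parser)
import argparse

def get_args():
    parser = argparse.ArgumentParser("PPO for Super Mario Bros")
    parser.add_argument("--world", type=int, default=1, help="world to play (1-8)")
    parser.add_argument("--stage", type=int, default=1, help="stage within the world (1-4)")
    parser.add_argument("--lr", type=float, default=1e-4, help="learning rate")
    return parser.parse_args()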

Note: If you get stuck at any level, try training again with a different learning rate. You can conquer 29/32 levels, as I did, by changing only the learning rate. Normally I set the learning rate to 1e-3, 1e-4, or 1e-5. However, there are some difficult levels, including level 1-3, which I finally trained successfully with a learning rate of 7e-5 after failing 70 times.
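
If you want to automate that search, a small script like the one below can sweep over candidate learning rates by invoking train.py as shown above; the level and the list of rates are just the example values from this note:

# Hypothetical learning-rate sweep for a difficult level (here, level 1-3)
import subprocess

for lr in ["1e-3", "1e-4", "7e-5", "1e-5"]:
    subprocess.run(
        ["python", "train.py", "--world", "1", "--stage", "3", "--lr", lr],
        check=True,
    )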

Docker

For convenience, I provide a Dockerfile which can be used to run the training as well as the test phase.

Assume that the docker image's name is ppo, that you only want to use the first GPU, and that you have already cloned this repository and cd-ed into it.

Build:

sudo docker build --network=host -t ppo .

Run:

docker run --runtime=nvidia -it --rm --volume="$PWD"/../Super-mario-bros-PPO-pytorch:/Super-mario-bros-PPO-pytorch --gpus device=0 ppo

Then, inside the docker container, you can simply run the train.py or test.py scripts as described above.

Note: There is a rendering bug when using docker. Therefore, when you train or test using docker, please comment out the env.render() line in src/process.py (for training) or in test.py (for testing). You will then no longer see the pop-up window for visualization, but this is not a big problem: the training process will still run, and the test process will still produce an output mp4 file for visualization.
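
Instead of deleting the call outright, one alternative is to guard it behind an environment variable so the same code runs both inside docker and locally. This is only a sketch; the RENDER variable name is my own invention, not something the repository defines:

# Hypothetical guard around the render call (RENDER is a made-up variable name)
import os

def maybe_render(env):
    # Render only when RENDER=1 is set, so the same script works
    # inside docker (no display) and locally (with a pop-up window)
    if os.environ.get("RENDER", "0") == "1":
        env.render()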

Why are there still 3 levels missing?

In worlds 4-4, 7-4 and 8-4, the map consists of puzzles where the player must choose the correct path in order to move forward. If you choose a wrong path, you have to go through the paths you have already visited again. That is why my agent cannot complete these 3 levels at the moment.
