
ArvindSoma / a3c-super-mario-pytorch

License: MIT
Reinforcement Learning for Super Mario Bros using A3C on GPU


Projects that are alternatives to, or similar to, a3c-super-mario-pytorch

Rl a3c pytorch
A3C LSTM Atari with Pytorch plus A3G design
Stars: ✭ 482 (+1277.14%)
Mutual labels:  deep-reinforcement-learning, openai-gym, a3c
Deep-Reinforcement-Learning-With-Python
Master classic RL, deep RL, distributional RL, inverse RL, and more using OpenAI Gym and TensorFlow, with extensive math
Stars: ✭ 222 (+534.29%)
Mutual labels:  deep-reinforcement-learning, openai-gym, a3c
Reinforcementlearning Atarigame
PyTorch LSTM RNN for reinforcement learning to play Atari games from OpenAI Universe. Also uses Google DeepMind's Asynchronous Advantage Actor-Critic (A3C) algorithm, which is markedly more efficient than DQN and supersedes it. Can play many games
Stars: ✭ 118 (+237.14%)
Mutual labels:  deep-reinforcement-learning, openai-gym, a3c
Btgym
Scalable, event-driven, deep-learning-friendly backtesting library
Stars: ✭ 765 (+2085.71%)
Mutual labels:  deep-reinforcement-learning, openai-gym, a3c
deep rl acrobot
TensorFlow A2C to solve Acrobot, with synchronized parallel environments
Stars: ✭ 32 (-8.57%)
Mutual labels:  deep-reinforcement-learning, openai-gym, a3c
yarll
Combining deep learning and reinforcement learning.
Stars: ✭ 84 (+140%)
Mutual labels:  deep-reinforcement-learning, openai-gym, a3c
Deep Reinforcement Learning With Pytorch
PyTorch implementation of DQN, AC, ACER, A2C, A3C, PG, DDPG, TRPO, PPO, SAC, TD3 and ....
Stars: ✭ 1,345 (+3742.86%)
Mutual labels:  deep-reinforcement-learning, a3c
Easy Rl
A reinforcement learning tutorial in Chinese; read it online at https://datawhalechina.github.io/easy-rl/
Stars: ✭ 3,004 (+8482.86%)
Mutual labels:  deep-reinforcement-learning, a3c
Hierarchical Actor Critic Hac Pytorch
PyTorch implementation of Hierarchical Actor Critic (HAC) for OpenAI gym environments
Stars: ✭ 116 (+231.43%)
Mutual labels:  deep-reinforcement-learning, openai-gym
Torch Ac
Recurrent and multi-process PyTorch implementation of deep reinforcement Actor-Critic algorithms A2C and PPO
Stars: ✭ 70 (+100%)
Mutual labels:  deep-reinforcement-learning, a3c
Finrl Library
FinRL: Financial Reinforcement Learning Framework. Please star. 🔥
Stars: ✭ 3,037 (+8577.14%)
Mutual labels:  deep-reinforcement-learning, openai-gym
Baby A3c
A high-performance Atari A3C agent in 180 lines of PyTorch
Stars: ✭ 144 (+311.43%)
Mutual labels:  deep-reinforcement-learning, a3c
Cs234 Reinforcement Learning Winter 2019
My Solutions of Assignments of CS234: Reinforcement Learning Winter 2019
Stars: ✭ 93 (+165.71%)
Mutual labels:  deep-reinforcement-learning, openai-gym
Treeqn
Stars: ✭ 77 (+120%)
Mutual labels:  deep-reinforcement-learning, openai-gym
A3c Pytorch
PyTorch implementation of the asynchronous advantage actor-critic (A3C) algorithm
Stars: ✭ 108 (+208.57%)
Mutual labels:  deep-reinforcement-learning, a3c
Noreward Rl
[ICML 2017] TensorFlow code for Curiosity-driven Exploration for Deep Reinforcement Learning
Stars: ✭ 1,176 (+3260%)
Mutual labels:  deep-reinforcement-learning, openai-gym
Deep Reinforcement Learning Gym
Deep reinforcement learning model implementation in Tensorflow + OpenAI gym
Stars: ✭ 200 (+471.43%)
Mutual labels:  deep-reinforcement-learning, openai-gym
Hands On Intelligent Agents With Openai Gym
Code for Hands On Intelligent Agents with OpenAI Gym book to get started and learn to build deep reinforcement learning agents using PyTorch
Stars: ✭ 189 (+440%)
Mutual labels:  deep-reinforcement-learning, openai-gym
Reinforcement Learning
Minimal and Clean Reinforcement Learning Examples
Stars: ✭ 2,863 (+8080%)
Mutual labels:  deep-reinforcement-learning, a3c
Deterministic Gail Pytorch
PyTorch implementation of Deterministic Generative Adversarial Imitation Learning (GAIL) for off-policy learning
Stars: ✭ 44 (+25.71%)
Mutual labels:  deep-reinforcement-learning, openai-gym

Reinforcement Learning for Super Mario Bros using A3C on GPU

This project is based on the paper Asynchronous Methods for Deep Reinforcement Learning, with custom training modifications. It was created for the course Deep Learning for Computer Vision at TUM.

Prerequisites

  • Python 3.5+
  • PyTorch 0.3.0+
  • OpenAI Gym <=0.9.5

Getting Started

Install the following packages using the commands below:

sudo apt-get update
sudo apt-get install -y python3-numpy python3-dev cmake zlib1g-dev libjpeg-dev xvfb libav-tools xorg-dev python3-opengl libboost-all-dev libsdl2-dev swig
sudo apt-get install fceux

Next, the Super Mario Bros NES environment has to be set up. We are using Philip Paquette's Super Mario Bros implementation for gym, with some modifications so it runs on the current OpenAI Gym version. Follow Issue 6 to get the Mario NES environment up and running.

To match the default settings of this project, modify gym/envs/__init__.py to register the environment:

register(
     id='metaSuperMarioBros-1-1-v0',
     entry_point='gym.envs.ppaquette_gym_super_mario:MetaSuperMarioBrosEnv',
)

Whatever 'id' is set to, use the MetaSuperMarioBrosEnv entry point; it prevents the emulator from closing frequently.
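
Once the id is registered, a quick smoke test can confirm the emulator runs. This is a minimal sketch using the classic four-value gym step API; the random-action loop is only for verification and is not part of the project's training code:

import gym

env = gym.make('metaSuperMarioBros-1-1-v0')  # the id registered above
obs = env.reset()
done = False
while not done:
    # take random actions just to confirm the emulator launches and steps
    obs, reward, done, info = env.step(env.action_space.sample())
env.close()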

Training and Testing

To train the network from scratch, use the following command:

python3 train-mario.py --num-processes 8

This command requires at least an 8-core system with 16 GB of memory and 6 GB of GPU memory. You can reduce the number of processes to run on a personal system, but expect the training time to increase drastically.

python3 train-mario.py --num-processes 2 --non-sample 1

This command requires at least a 2-core system with 4 GB of memory and 2 GB of GPU memory.

One test process is created alongside the remaining training processes. The test process stores its results in a CSV file inside the save folder, which can be plotted later.
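
For example, the CSV can be plotted with a few lines of matplotlib. This is a minimal sketch; the file name and column names here are assumptions, so check what train-mario.py actually writes to the save folder:

import pandas as pd
import matplotlib.pyplot as plt

# hypothetical file name and columns; adjust to the actual CSV in save/
log = pd.read_csv('save/test_log.csv')
plt.plot(log['time'], log['reward'])
plt.xlabel('training time')
plt.ylabel('test reward')
plt.show()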

The training run mixes random (sampling) and non-random (greedy) processes so that it converges faster. By default there are two non-random processes; this can be changed via command-line arguments. A random process behaves exactly like a non-random one whenever there is a clear difference in the output probabilities of the network. The non-random training processes exactly mimic the test output, which helps train the network better.
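
In code, the difference between the two kinds of workers boils down to sampling from the policy versus taking its argmax. A minimal sketch, assuming the network outputs one logit per action (illustrative, not the repository's exact implementation):

import torch
import torch.nn.functional as F

def select_action(logits, non_sample):
    # probabilities over the discrete action set
    probs = F.softmax(logits, dim=-1)
    if non_sample:
        # greedy choice, exactly mirroring the test process
        return probs.argmax(dim=-1)
    # stochastic choice; when one action clearly dominates,
    # this almost always picks the same action as the greedy branch
    return torch.multinomial(probs, num_samples=1)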

Custom rewards are used to train the model more efficiently. They can be changed using the info dictionary or by modifying the wrapper file common/atari_wrappers.py.
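
As a sketch of what such a wrapper can look like, the gym.Wrapper below recomputes the reward from the info dictionary. The info keys and coefficients are assumptions for illustration (the values actually used in this project are listed under Results), not a copy of common/atari_wrappers.py:

import gym

class ShapedReward(gym.Wrapper):
    # illustrative reward shaping; keys and coefficients are assumptions
    def __init__(self, env):
        super(ShapedReward, self).__init__(env)
        self.prev_score = 0
        self.prev_distance = 0

    def reset(self, **kwargs):
        self.prev_score = 0
        self.prev_distance = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, _, done, info = self.env.step(action)
        reward = -0.1                                    # time penalty
        if info.get('distance', 0) > self.prev_distance:
            reward += 1.0                                # forward progress
        reward += 2.5 * (info.get('score', 0) - self.prev_score)
        self.prev_distance = info.get('distance', 0)
        self.prev_score = info.get('score', 0)
        return obs, reward, done, info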

More arguments are mentioned in the file train-mario.py.

Results

After ~20 hours of training on 8 processes (7 train, 1 test), the agent converges.

Custom rewards used:

  • Time = -0.1
  • Distance = +1 or 0
  • Player Status = +/- 5
  • Score = 2.5 x [Increase in Score]
  • Done = +20 [Game Completed] or -20 [Game Incomplete]

The trained model is saved in save/trained-models/mario_a3c_params.pkl. Move it up one level, into the save folder, to run the trained model.

Repository References

This project heavily relied on ikostrikov/pytorch-a3c.
