Reinforcement Learning Tutorial with Demo: DP (Policy and Value Iteration), Monte Carlo, TD Learning (SARSA, QLearning), Function Approximation, Policy Gradient, DQN, Imitation, Meta Learning, Papers, Courses, etc..

Stars: ✭ 442 (-42.97%)

Mutual labels: reinforcement-learning, a3c

Pytorch Rl

This repository contains model-free deep reinforcement learning algorithms implemented in Pytorch

Stars: ✭ 394 (-49.16%)

Mutual labels: gym, reinforcement-learning

Rl a3c pytorch

A3C LSTM Atari with Pytorch plus A3G design

Stars: ✭ 482 (-37.81%)

Mutual labels: reinforcement-learning, a3c

Robotics Rl Srl

S-RL Toolbox: Reinforcement Learning (RL) and State Representation Learning (SRL) for Robotics

Stars: ✭ 453 (-41.55%)

Mutual labels: gym, reinforcement-learning

Reinforcement Learning With Tensorflow

Simple reinforcement learning tutorials, 莫烦Python Chinese-language AI tutorials

Stars: ✭ 6,948 (+796.52%)

Mutual labels: reinforcement-learning, a3c

Async deep reinforce

Asynchronous Methods for Deep Reinforcement Learning

Stars: ✭ 565 (-27.1%)

Mutual labels: reinforcement-learning, a3c

Pytorch Rl

Deep Reinforcement Learning with pytorch & visdom

Stars: ✭ 745 (-3.87%)

Mutual labels: reinforcement-learning, a3c

Aigames

use AI to play some games.

Stars: ✭ 422 (-45.55%)

Mutual labels: ai, reinforcement-learning

Tensor House

A collection of reference machine learning and optimization models for enterprise operations: marketing, pricing, supply chain

Stars: ✭ 449 (-42.06%)

Mutual labels: ai, reinforcement-learning

Animalai Olympics

Code repository for the Animal AI Olympics competition

Stars: ✭ 544 (-29.81%)

Mutual labels: ai, reinforcement-learning

Habitat Lab

A modular high-level library to train embodied AI agents across a variety of tasks, environments, and simulators.

Stars: ✭ 587 (-24.26%)

Mutual labels: ai, reinforcement-learning

Rl Book

Source codes for the book "Reinforcement Learning: Theory and Python Implementation"

Stars: ✭ 464 (-40.13%)

Mutual labels: gym, reinforcement-learning

Text summurization abstractive methods

Multiple implementations for abstractive text summurization , using google colab

Stars: ✭ 359 (-53.68%)

Mutual labels: ai, reinforcement-learning

Pytorch A3c

Simple A3C implementation with pytorch + multiprocessing

Stars: ✭ 364 (-53.03%)

Mutual labels: gym, a3c

Holodeck

High Fidelity Simulator for Reinforcement Learning and Robotics Research.

Stars: ✭ 513 (-33.81%)

Mutual labels: ai, reinforcement-learning

[PYTORCH] Asynchronous Advantage Actor-Critic (A3C) for playing Super Mario Bros

Introduction

Here is my Python source code for training an agent to play Super Mario Bros, using the Asynchronous Advantage Actor-Critic (A3C) algorithm introduced in the paper Asynchronous Methods for Deep Reinforcement Learning.

Sample results

Motivation

Before I implemented this project, there were already several repositories that reproduce the paper's results quite well in common deep learning frameworks such as Tensorflow, Keras and Pytorch. In my opinion, most of them are great. However, they seem overly complicated in many parts, including image pre-processing, environment setup and weight initialization, which distracts the user's attention from more important matters. Therefore, I decided to write cleaner code that simplifies the unimportant parts while still following the paper strictly. As you can see, with minimal setup and simple network initialization, as long as you implement the algorithm correctly, an agent will teach itself how to interact with the environment and gradually find a way to reach the final goal.

Explanation in layman's terms

If you are already familiar with reinforcement learning in general and A3C in particular, you can skip this part. I wrote it to explain what the A3C algorithm is, and how and why it works, to people who are interested in or curious about A3C or my implementation but do not understand the mechanism behind it. Therefore, you do not need any prerequisite knowledge to read this part ☺️

If you search the internet, you will find numerous articles introducing or explaining A3C, and some even provide sample code. However, I would like to take another approach: break the name Asynchronous Advantage Actor-Critic down into smaller parts and explain them one on top of another.

Actor-Critic

Your agent has 2 parts called actor and critic, and its goal is to make both parts perform better over time by exploring and exploiting the environment. Let's imagine a small mischievous child (actor) discovering the amazing world around him, while his dad (critic) oversees him to make sure that he does not do anything dangerous. Whenever the kid does anything good, his dad will praise him and encourage him to repeat that action in the future. And of course, when the kid does anything harmful, he will get a warning from his dad. The more the kid interacts with the world and takes different actions, the more feedback, both positive and negative, he gets from his dad. The goal of the kid is to collect as much positive feedback as possible from his dad, while the goal of the dad is to evaluate his son's actions better. In other words, we have a win-win relationship between the kid and his dad, or equivalently between actor and critic.
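To make the kid-and-dad picture concrete, here is a minimal sketch of an actor and a critic learning together on a toy two-action problem. It is not taken from my training code: the function name train_actor_critic, the reward values and the learning rates are all assumptions chosen for illustration, and it uses plain Python instead of a neural network.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_actor_critic(rewards=(1.0, 0.0), steps=2000,
                       actor_lr=0.1, critic_lr=0.1, seed=0):
    """Toy actor-critic on a one-state, two-action problem (hypothetical setup)."""
    rng = random.Random(seed)
    prefs = [0.0, 0.0]  # actor: preference for each action (the kid)
    q = [0.0, 0.0]      # critic: estimated reward of each action (the dad)
    for _ in range(steps):
        probs = softmax(prefs)
        action = rng.choices(range(len(prefs)), weights=probs)[0]
        reward = rewards[action]
        # The critic improves its evaluation of the action that was taken.
        q[action] += critic_lr * (reward - q[action])
        # The actor follows the critic's feedback: policy-gradient step
        # weighted by how good the critic thinks the action is.
        for a in range(len(prefs)):
            grad_log_pi = (1.0 if a == action else 0.0) - probs[a]
            prefs[a] += actor_lr * q[action] * grad_log_pi
    return softmax(prefs), q
```

Calling train_actor_critic() returns the final action probabilities and the critic's estimates. With the assumed rewards, the actor ends up strongly preferring the first action while the critic's estimate for it approaches 1.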

Advantage Actor-Critic

To make the kid learn faster and more stably, the dad, instead of telling his son how good his action is, will tell him how much better or worse his action is compared to other actions (or to a "virtual" average action). An example is worth a thousand words. Let's compare 2 pairs of dad and son. The first dad gives his son 10 candies for grade 10 and 1 candy for grade 1 in school. The second dad, on the other hand, gives his son 5 candies for grade 10, and "punishes" his son by not allowing him to watch his favorite TV series for a day when he gets grade 1. What do you think? The second dad seems to be a little bit smarter, right? Indeed, you can rarely prevent bad actions if you still "encourage" them with a small reward.
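The idea of comparing an action to an average can be written down in a few lines. The sketch below assumes the n-step return form described in the A3C paper (discounted rewards plus a bootstrap value, minus the critic's estimate); the helper names discounted_returns and advantages are hypothetical, and my training code may use a different estimator.

```python
def discounted_returns(rewards, bootstrap_value, gamma=0.9):
    """Discounted return for every step of a rollout, computed backwards."""
    returns = []
    running = bootstrap_value  # critic's estimate for the state after the rollout
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(returns, values):
    """Advantage = what actually happened minus what the critic expected."""
    return [ret - v for ret, v in zip(returns, values)]


# The candy example: rewards of 10 and 1 compared to their average, 5.5.
candy_rewards = [10.0, 1.0]
baseline = sum(candy_rewards) / len(candy_rewards)
candy_advantages = advantages(candy_rewards, [baseline, baseline])
```

With the baseline subtracted, the good grade gets a positive signal (+4.5) and the bad grade a negative one (-4.5), which is the behavior of the second dad.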

Asynchronous Advantage Actor-Critic

If an agent explores the environment alone, the learning process will be slow. More seriously, the agent could become biased toward a particular suboptimal solution, which is undesirable. What happens if you have a bunch of agents that simultaneously explore different parts of the environment and periodically share their newly obtained knowledge with one another? That is exactly the idea of Asynchronous Advantage Actor-Critic. Now the kid and his mates in kindergarten go on a trip to a beautiful beach (with their teacher, of course). Their task is to build a great sand castle. Different children will build different parts of the castle, supervised by the teacher. Each of them has a different task, but they share the same final goal: a strong and eye-catching castle. Certainly, the role of the teacher now is the same as that of the dad in the previous example. The only difference is that the former is busier 😅
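The sand castle can also be sketched in code. The example below is a simplified illustration under stated assumptions, not my training loop: it uses Python threads and a lock where the paper uses lock-free updates, and the SharedModel class, the worker function and the noisy target are hypothetical stand-ins for the shared network and the game environments.

```python
import random
import threading


class SharedModel:
    """A single shared parameter that every worker reads and updates."""

    def __init__(self):
        self.weight = 0.0
        self.updates = 0
        self.lock = threading.Lock()


def worker(shared, seed, steps=500, lr=0.05, target=3.0):
    """One child on the beach: explores its own samples, updates the shared model."""
    rng = random.Random(seed)  # each worker sees different experience
    for _ in range(steps):
        local_weight = shared.weight              # copy the latest shared knowledge
        sample = target + rng.uniform(-1.0, 1.0)  # noisy local observation
        grad = 2.0 * (local_weight - sample)      # gradient of (w - sample)^2
        with shared.lock:                         # push the update to the shared model
            shared.weight -= lr * grad
            shared.updates += 1


def train_async(num_workers=4):
    shared = SharedModel()
    threads = [threading.Thread(target=worker, args=(shared, seed))
               for seed in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared
```

After train_async() finishes, the shared weight has been pulled close to the assumed target of 3.0 by all workers together, even though each worker only ever saw its own noisy samples and may have computed its gradient from a slightly stale copy.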

How to use my code

With my code, you can:

Train your model by running python train.py
Test your trained model by running python test.py

Trained models

You can find some of my trained models in Super Mario Bros A3C trained models

Requirements

python 3.6
gym
cv2
pytorch
numpy
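Before running the scripts, you may want to check that the packages above are importable. The snippet below is a small sketch using only the standard library; the helper name missing_modules is hypothetical, and the import names listed (gym, cv2, torch, numpy) are the usual module names for these packages.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # pytorch is imported as "torch"; cv2 is provided by the OpenCV bindings.
    missing = missing_modules(["gym", "cv2", "torch", "numpy"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```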

Acknowledgements

At the beginning, I could only train my agent to complete 9 stages. Then @davincibj pointed out that 19 stages could be completed and sent me the trained weights. Thanks a lot for the finding!

Stars: ✭ 775
