
uvipen / Street-fighter-A3C-ICM-pytorch

License: MIT
Curiosity-driven Exploration by Self-supervised Prediction for Street Fighter III Third Strike


Projects that are alternatives of or similar to Street-fighter-A3C-ICM-pytorch

Easy Rl
A reinforcement learning tutorial in Chinese; read online at https://datawhalechina.github.io/easy-rl/
Stars: ✭ 3,004 (+1916.11%)
Mutual labels:  a3c
A3c continuous
A continuous action space version of A3C LSTM in pytorch plus A3G design
Stars: ✭ 223 (+49.66%)
Mutual labels:  a3c
reinforcement learning with Tensorflow
Minimal implementations of reinforcement learning algorithms by Tensorflow
Stars: ✭ 28 (-81.21%)
Mutual labels:  a3c
Reinforcementlearning Atarigame
PyTorch LSTM RNN for reinforcement learning to play Atari games from OpenAI Universe. We also use Google DeepMind's Asynchronous Advantage Actor-Critic (A3C) algorithm, which is much more efficient than DQN and supersedes it. Can play many games
Stars: ✭ 118 (-20.81%)
Mutual labels:  a3c
Tensorflow Rl
Implementations of deep RL papers and random experimentation
Stars: ✭ 176 (+18.12%)
Mutual labels:  a3c
Material-of-MCM-ICM
LaTeX template, Outstanding papers of last years and some useful books about MATLAB
Stars: ✭ 57 (-61.74%)
Mutual labels:  icm
Reinforcement learning
Reinforcement learning tutorials
Stars: ✭ 82 (-44.97%)
Mutual labels:  a3c
random-network-distillation-pytorch
Random Network Distillation pytorch
Stars: ✭ 185 (+24.16%)
Mutual labels:  curiosity-driven
Rlcycle
A library for ready-made reinforcement learning agents and reusable components for neat prototyping
Stars: ✭ 184 (+23.49%)
Mutual labels:  a3c
yarll
Combining deep learning and reinforcement learning.
Stars: ✭ 84 (-43.62%)
Mutual labels:  a3c
Baby A3c
A high-performance Atari A3C agent in 180 lines of PyTorch
Stars: ✭ 144 (-3.36%)
Mutual labels:  a3c
Minimalrl
Implementations of basic RL algorithms with minimal lines of codes! (pytorch based)
Stars: ✭ 2,051 (+1276.51%)
Mutual labels:  a3c
ppo-pytorch
Proximal Policy Optimization (PPO) with Intrinsic Curiosity Module (ICM)
Stars: ✭ 83 (-44.3%)
Mutual labels:  icm
A3c Pytorch
PyTorch implementation of the Advantage Async Actor-Critic algorithm (A3C)
Stars: ✭ 108 (-27.52%)
Mutual labels:  a3c
Deep-Reinforcement-Learning-With-Python
Master classic RL, deep RL, distributional RL, inverse RL, and more using OpenAI Gym and TensorFlow with extensive Math
Stars: ✭ 222 (+48.99%)
Mutual labels:  a3c
Deep Reinforcement Learning With Pytorch
PyTorch implementation of DQN, AC, ACER, A2C, A3C, PG, DDPG, TRPO, PPO, SAC, TD3 and ....
Stars: ✭ 1,345 (+802.68%)
Mutual labels:  a3c
Reinforcement Learning
Minimal and Clean Reinforcement Learning Examples
Stars: ✭ 2,863 (+1821.48%)
Mutual labels:  a3c
deep rl acrobot
TensorFlow A2C to solve Acrobot, with synchronized parallel environments
Stars: ✭ 32 (-78.52%)
Mutual labels:  a3c
a3c-super-mario-pytorch
Reinforcement Learning for Super Mario Bros using A3C on GPU
Stars: ✭ 35 (-76.51%)
Mutual labels:  a3c
A3C-Crypto-Trader
Deep Reinforcement Learning A3C Crypto Trading Bot
Stars: ✭ 34 (-77.18%)
Mutual labels:  a3c

[PYTORCH] Curiosity-driven Exploration by Self-supervised Prediction for playing Street Fighter

Introduction

Here is my Python source code for training an agent to play Street Fighter III Third Strike. It uses the Asynchronous Advantage Actor-Critic (A3C) algorithm introduced in the paper Asynchronous Methods for Deep Reinforcement Learning, combined with the Intrinsic Curiosity Module introduced in the paper Curiosity-driven Exploration by Self-supervised Prediction (note: the latter paper also re-introduces the former).


Sample result

Motivation

Before I implemented this project, there were already several repositories reproducing the paper's results quite well, in common deep learning frameworks such as TensorFlow, Keras, and PyTorch. In my opinion, most of them are great. However, they tend to be overly complicated in many parts, including image pre-processing, environment setup, and weight initialization, which distracts the reader's attention from more important matters. Therefore, I decided to write cleaner code that simplifies the unimportant parts while still following the paper closely. As you can see, with minimal setup and simple network initialization, as long as you implement the algorithm correctly, an agent will teach itself how to interact with the environment and gradually find its way to the final goal.

Explanation of A3C in layman's terms

If you are already familiar with reinforcement learning in general and A3C in particular, you can skip this part. I wrote it to explain what the A3C algorithm is and how and why it works, for people who are interested in or curious about A3C or my implementation but do not yet understand the mechanism behind it. You therefore do not need any prerequisite knowledge to read this part ☺️

If you search on the internet, there are numerous articles introducing or explaining A3C, and some even provide sample code. However, I would like to take another approach: break the name Asynchronous Advantage Actor-Critic down into smaller parts, explain each one, and then put them back together.

Actor-Critic

Your agent has two parts, called the actor and the critic, and its goal is to make both parts perform better over time by exploring and exploiting the environment. Imagine a small mischievous kid (the actor) discovering the amazing world around him, while his dad (the critic) oversees him to make sure he does not do anything dangerous. Whenever the kid does something good, his dad praises him and encourages him to repeat that action in the future. And of course, when the kid does something harmful, he gets a warning from his dad. The more the kid interacts with the world and the more different actions he takes, the more feedback, both positive and negative, he gets from his dad. The goal of the kid is to collect as much positive feedback as possible from his dad, while the goal of the dad is to evaluate his son's actions better. In other words, we have a win-win relationship between the kid and his dad, or equivalently, between the actor and the critic.
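
In code, the actor and the critic are typically two heads on one shared network: the actor head outputs a probability distribution over actions (the policy), and the critic head outputs a single number estimating how good the current state is. Below is a minimal, hypothetical PyTorch sketch of that structure; the layer sizes are made up and it is not the exact architecture used in this repository.

```python
import torch.nn as nn
import torch.nn.functional as F

class ActorCritic(nn.Module):
    """Minimal actor-critic: one shared body, a policy head and a value head."""
    def __init__(self, num_inputs, num_actions):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(num_inputs, 256), nn.ReLU())
        self.actor = nn.Linear(256, num_actions)   # the "kid": logits over actions
        self.critic = nn.Linear(256, 1)            # the "dad": how good is this state?

    def forward(self, state):
        features = self.body(state)
        policy = F.softmax(self.actor(features), dim=-1)  # probability of each action
        value = self.critic(features)                     # estimated state value
        return policy, value
```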

Advantage Actor-Critic

To make the kid learn faster and more stably, the dad, instead of telling his son how good his action is, tells him how much better or worse the action is compared to other actions (or to a "virtual" average action). An example is worth a thousand words. Let's compare two pairs of dads and sons. The first dad gives his son 10 candies for a grade of 10 and 1 candy for a grade of 1 at school. The second dad, on the other hand, gives his son 5 candies for a grade of 10, and "punishes" him by not allowing him to watch his favorite TV series for a day when he gets a grade of 1. What do you think? The second dad seems to be a little bit smarter, right? Indeed, you can rarely discourage bad actions if you still "encourage" them with a small reward.
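
Concretely, this "how much better or worse" signal is the advantage: the discounted return the agent actually collected minus the critic's value estimate of the state, A(s_t, a_t) = R_t - V(s_t). A positive advantage reinforces the action, a negative one discourages it. Here is a tiny illustration with made-up numbers (the discount factor gamma = 0.9 is an assumption for the example, not a value taken from this repository):

```python
# Made-up numbers, only to illustrate the advantage signal.
gamma = 0.9                    # assumed discount factor
rewards = [1.0, 0.0, 2.0]      # rewards collected after taking the action
value_estimate = 1.5           # critic's estimate V(s_t) for the starting state

# Discounted return: R_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ...
discounted_return = sum(gamma ** i * r for i, r in enumerate(rewards))

advantage = discounted_return - value_estimate
print(discounted_return)   # ~2.62
print(advantage)           # ~1.12 -> positive, so this action gets reinforced
```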

Asynchronous Advantage Actor-Critic

If an agent explores the environment alone, the learning process is slow. More seriously, the agent could become biased toward a particular suboptimal solution, which is undesirable. What happens if you have a bunch of agents that simultaneously explore different parts of the environment and periodically share their newly obtained knowledge with one another? That is exactly the idea of Asynchronous Advantage Actor-Critic. Now the kid and his friends from kindergarten take a trip to a beautiful beach (with their teacher, of course). Their task is to build a great sand castle. Each child builds a different part of the castle, supervised by the teacher. They all have different tasks, but the same final goal: a strong and eye-catching castle. Certainly, the role of the teacher is now the same as that of the dad in the previous example. The only difference is that the former is busier 😅
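
In PyTorch, this "many kids, one teacher" setup is usually implemented with torch.multiprocessing: a global model lives in shared memory, and each worker process runs its own copy of the environment, computes gradients locally, and applies them to the global model. The sketch below only shows the skeleton of that pattern; the model, sizes, and worker body are placeholders, not the actual code of train.py in this repository.

```python
import torch.multiprocessing as mp
import torch.nn as nn

def make_model():
    # Stand-in for the actor-critic network (hypothetical sizes).
    return nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 8))

def worker(rank, global_model):
    # Each worker would build its own environment and a local copy of the model,
    # roll out episodes, compute the A3C loss, and push its gradients to global_model.
    local_model = make_model()
    local_model.load_state_dict(global_model.state_dict())
    # ... interact with the environment, call loss.backward(), copy the local
    # gradients into global_model's parameters, then step a shared optimizer.

if __name__ == "__main__":
    global_model = make_model()
    global_model.share_memory()              # weights live in shared memory
    processes = []
    for rank in range(4):                    # 4 asynchronous workers
        p = mp.Process(target=worker, args=(rank, global_model))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
```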

Explanation of ICM

The idea of the Intrinsic Curiosity Module (ICM) is to build a reward signal that is intrinsic to the agent (generated by the agent itself). This makes the agent a self-learner: he is the son, but also the father.
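
More concretely, the paper trains a feature encoder phi together with two small models: a forward model that predicts phi(s_{t+1}) from phi(s_t) and the action, and an inverse model that predicts the action from phi(s_t) and phi(s_{t+1}), which keeps the features focused on what the agent can actually influence. The intrinsic reward is the forward model's prediction error, roughly r_intrinsic = (eta / 2) * ||predicted phi(s_{t+1}) - phi(s_{t+1})||^2. Below is a heavily simplified, hypothetical sketch of such a module, with linear layers instead of the paper's convolutional encoder; it is not the exact module used in this repository.

```python
import torch
import torch.nn as nn

class ICM(nn.Module):
    """Simplified curiosity module: feature encoder plus forward and inverse models."""
    def __init__(self, num_inputs, num_actions, feature_dim=256):
        super().__init__()
        self.encoder = nn.Linear(num_inputs, feature_dim)                      # phi(s)
        self.forward_model = nn.Linear(feature_dim + num_actions, feature_dim)
        self.inverse_model = nn.Linear(2 * feature_dim, num_actions)

    def forward(self, state, next_state, action_onehot):
        phi, phi_next = self.encoder(state), self.encoder(next_state)
        # Forward model: predict next-state features from current features + action.
        phi_next_pred = self.forward_model(torch.cat([phi, action_onehot], dim=-1))
        # Inverse model: predict which action led from state to next_state.
        action_logits = self.inverse_model(torch.cat([phi, phi_next], dim=-1))
        # Intrinsic reward: scaled prediction error of the forward model.
        intrinsic_reward = 0.5 * (phi_next_pred - phi_next).pow(2).sum(dim=-1)
        return intrinsic_reward, action_logits
```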

How to use my code

With my code, you can:

  • Train your model by running python train.py
  • Test your trained model by running python test.py

Trained models

You can find some models I have trained in Street Fighter A3C-ICM trained models

Requirements

  • python 3.6
  • cv2
  • pytorch
  • numpy
  • MAMEToolkit
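
For reference, the Python dependencies can typically be installed with pip install torch numpy opencv-python MAMEToolkit (assuming the usual PyPI package names; opencv-python is the package that provides cv2).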

As stated in MAMEToolkit, I cannot provide the game ROM. It is the user's own legal responsibility to acquire the game ROM for emulation.
