
marlbenchmark / On Policy

License: MIT
This is the official implementation of Multi-Agent PPO.

Programming Languages

Python

Projects that are alternatives of or similar to On Policy

Competitive Programming
Repository of all my submissions to some competitive programming websites (online judges), as well as implementations of some data structures and algorithms.
Stars: ✭ 53 (-15.87%)
Mutual labels:  algorithms
Leetcode Python
Leetcode Python Solution and Explanation. Also a Guide to Prepare for Software Engineer Interview.
Stars: ✭ 1,088 (+1626.98%)
Mutual labels:  algorithms
Complete Placement Preparation
This repository consists of all the material required for cracking the coding rounds and technical interviews during placements.
Stars: ✭ 1,114 (+1668.25%)
Mutual labels:  algorithms
Lc Java
Clean Leetcode solutions in Java
Stars: ✭ 54 (-14.29%)
Mutual labels:  algorithms
Javascript
Implementation of All ▲lgorithms in Javascript Programming Language
Stars: ✭ 56 (-11.11%)
Mutual labels:  algorithms
Awesome Java Leetcode
👑 LeetCode algorithm problems with Java solutions (updating).
Stars: ✭ 8,297 (+13069.84%)
Mutual labels:  algorithms
Leetcode
🕵️‍♂️ leetcode practice
Stars: ✭ 52 (-17.46%)
Mutual labels:  algorithms
Leetcode
👏🏻 leetcode solutions for Humans™
Stars: ✭ 1,129 (+1692.06%)
Mutual labels:  algorithms
Algo
Algorithms in Go
Stars: ✭ 56 (-11.11%)
Mutual labels:  algorithms
Fromscratch
Stars: ✭ 61 (-3.17%)
Mutual labels:  algorithms
Dart Algorithms
Data structures and algorithms with Dart.
Stars: ✭ 53 (-15.87%)
Mutual labels:  algorithms
Reinforcement Learning
Implementation of reinforcement learning algorithms in Python, based on Sutton & Barto's book (2nd ed.).
Stars: ✭ 55 (-12.7%)
Mutual labels:  algorithms
Data Structures C
A collection of algorithms for data structure manipulation in C
Stars: ✭ 59 (-6.35%)
Mutual labels:  algorithms
Algorithm Guide
BITLIU's tutorials on algorithms and data structures 🚀🚀🚀
Stars: ✭ 1,068 (+1595.24%)
Mutual labels:  algorithms
Coding Interview
😀 A collection of coding interview problems, including 剑指 Offer (Sword Offer), 编程之美 (Beauty of Programming), and more.
Stars: ✭ 1,111 (+1663.49%)
Mutual labels:  algorithms
Java
Repository for Java code and algorithms. Star the repo too.
Stars: ✭ 53 (-15.87%)
Mutual labels:  algorithms
Learning2run
Source code for our NIPS 2017 Learning to Run entry.
Stars: ✭ 57 (-9.52%)
Mutual labels:  ppo
Leetcode
This repository contains solutions and explanations for algorithm problems on LeetCode. Only medium or above are included. All are written in C++/Python and implemented by me. Problems attempted multiple times are labelled with hyperlinks.
Stars: ✭ 1,130 (+1693.65%)
Mutual labels:  algorithms
Datastructures
🚀 Implementation of core data structures for R
Stars: ✭ 64 (+1.59%)
Mutual labels:  algorithms
Mario rl
Stars: ✭ 60 (-4.76%)
Mutual labels:  ppo

MAPPO

Chao Yu*, Akash Velu*, Eugene Vinitsky, Yu Wang, Alexandre Bayen, and Yi Wu.

Website: https://sites.google.com/view/mappo

This repository implements MAPPO, a multi-agent variant of PPO. The implementation in this repository is used in the paper "The Surprising Effectiveness of MAPPO in Cooperative Multi-Agent Games" (https://arxiv.org/abs/2103.01955). This repository is heavily based on https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail.

Environments supported:

  • StarCraft II Multi-Agent Challenge (SMAC)
  • Hanabi
  • Multi-agent Particle-World Environments (MPEs)

1. Usage

All core code is located within the onpolicy folder.

  • The algorithms/ subfolder contains algorithm-specific code for MAPPO.

  • The envs/ subfolder contains environment wrapper implementations for the MPEs, SMAC, and Hanabi.

  • Code to perform training rollouts and policy updates is contained within the runner/ folder - there is a runner for each environment.

  • Executable scripts for training with default hyperparameters can be found in the scripts/ folder. The files are named in the following manner: train_algo_environment.sh. Within each file, the map name (in the case of SMAC and the MPEs) can be altered.

  • Python training scripts for each environment can be found in the scripts/train/ folder.

  • The config.py file contains relevant hyperparameter and environment settings. Most hyperparameters default to the values used in the paper; however, please refer to the paper's appendix for the full list of hyperparameters used. An example of how these pieces fit together is sketched below.
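For illustration, here is what a script following the train_algo_environment.sh convention might look like, passing a few config.py hyperparameters as command-line flags. The variable and flag names below are assumptions for illustration, not copied from the repository:

#!/bin/sh
# hypothetical excerpt in the style of scripts/train_smac.sh
env="StarCraft2"
map="3m"        # the map name can be altered here, as noted above
algo="mappo"
seed=1
python ../train/train_smac.py --env_name ${env} --map_name ${map} \
  --algorithm_name ${algo} --seed ${seed}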

2. Installation

Here we give an example installation with CUDA 10.1. For a non-GPU setup or other CUDA versions, please refer to the PyTorch website.

# create conda environment
conda create -n marl python==3.6.1
conda activate marl
pip install torch==1.5.1+cu101 torchvision==0.6.1+cu101 -f https://download.pytorch.org/whl/torch_stable.html
# install on-policy package
cd on-policy
pip install -e .

Even though we provide requirement.txt, it may contain redundant packages. We recommend installing any remaining dependencies on demand: run the code and install whichever required packages are reported missing.
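As a quick sanity check that the CUDA-enabled PyTorch build was picked up (optional; uses only standard torch calls):

# verify the installed torch version and CUDA availability
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"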

2.1 Install StarCraftII 4.10

unzip SC2.4.10.zip
# password is iagreetotheeula
# append (>>) rather than overwrite (>) your ~/.bashrc
echo "export SC2PATH=~/StarCraftII/" >> ~/.bashrc

2.2 Hanabi

Environment code for Hanabi is derived from the open-source environment code, but has been slightly modified to fit the algorithms used here.
To install, execute the following:

pip install cffi
cd envs/hanabi
# && ensures cd runs only after mkdir succeeds (a single & would background mkdir)
mkdir build && cd build
cmake ..
make -j

2.3 Install MPE

# install this package first
pip install seaborn

There are 3 cooperative scenarios in MPE (a scenario-selection sketch follows this list):

  • simple_spread
  • simple_speaker_listener, which is the 'Comm' scenario in the paper
  • simple_reference
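The scenario is typically chosen via flags in the training script; a minimal sketch, assuming --scenario_name and --num_agents flags exist in config.py (flag names are assumptions, mirroring the earlier sketch):

# hypothetical excerpt in the style of scripts/train_mpe.sh
scenario="simple_speaker_listener"   # the 'Comm' scenario from the paper
num_agents=2
python ../train/train_mpe.py --env_name MPE --scenario_name ${scenario} \
  --num_agents ${num_agents} --algorithm_name mappo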

3. Train

Here we use train_mpe.sh as an example:

cd onpolicy/scripts
chmod +x ./train_mpe.sh
./train_mpe.sh

Local results are stored in the scripts/results subfolder. Note that we use Weights & Biases as the default visualization platform; to use Weights & Biases, please register and log in to the platform first. More instructions for using Weights & Biases can be found in the official documentation. Adding the --use_wandb flag on the command line or in the .sh file switches logging from Weights & Biases to TensorBoard.
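If you switch to TensorBoard, the resulting event files can be inspected with the standard tensorboard CLI; the log directory below is an assumption based on the results path mentioned above:

# launch TensorBoard against the local results (path is an assumption)
tensorboard --logdir onpolicy/scripts/results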

We additionally provide ./eval_hanabi_forward.sh for evaluating the Hanabi score over 100k trials.

4. Publication

If you find this repository useful, please cite our paper:

@misc{yu2021surprising,
      title={The Surprising Effectiveness of MAPPO in Cooperative Multi-Agent Games}, 
      author={Chao Yu and Akash Velu and Eugene Vinitsky and Yu Wang and Alexandre Bayen and Yi Wu},
      year={2021},
      eprint={2103.01955},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}