
marlbenchmark / On Policy

License: MIT
This is the official implementation of Multi-Agent PPO.

Programming Languages

Python

Projects that are alternatives of or similar to On Policy

Competitive Programming
Repository of all my submissions to some competitive programming websites (online judges), as well as implementations of some data structures and algorithms.
Stars: ✭ 53 (-15.87%)
Mutual labels:  algorithms
Leetcode Python
Leetcode Python Solution and Explanation. Also a Guide to Prepare for Software Engineer Interview.
Stars: ✭ 1,088 (+1626.98%)
Mutual labels:  algorithms
Complete Placement Preparation
This repository consists of all the material required for cracking the coding rounds and technical interviews during placements.
Stars: ✭ 1,114 (+1668.25%)
Mutual labels:  algorithms
Lc Java
Clean Leetcode solutions in Java
Stars: ✭ 54 (-14.29%)
Mutual labels:  algorithms
Javascript
Implementation of All ▲lgorithms in Javascript Programming Language
Stars: ✭ 56 (-11.11%)
Mutual labels:  algorithms
Awesome Java Leetcode
👑 LeetCode algorithm problems with Java solutions (updating).
Stars: ✭ 8,297 (+13069.84%)
Mutual labels:  algorithms
Leetcode
🕵️‍♂️ leetcode practice
Stars: ✭ 52 (-17.46%)
Mutual labels:  algorithms
Leetcode
👏🏻 leetcode solutions for Humans™
Stars: ✭ 1,129 (+1692.06%)
Mutual labels:  algorithms
Algo
Algorithms in Go
Stars: ✭ 56 (-11.11%)
Mutual labels:  algorithms
Fromscratch
Stars: ✭ 61 (-3.17%)
Mutual labels:  algorithms
Dart Algorithms
Data structures and algorithms with Dart.
Stars: ✭ 53 (-15.87%)
Mutual labels:  algorithms
Reinforcement Learning
Implementation of reinforcement learning algorithms in Python, based on Sutton & Barto's book (2nd ed.).
Stars: ✭ 55 (-12.7%)
Mutual labels:  algorithms
Data Structures C
A collection of algorithms for data structure manipulation in C
Stars: ✭ 59 (-6.35%)
Mutual labels:  algorithms
Algorithm Guide
BITLIU's tutorials on algorithms and data structures 🚀🚀🚀
Stars: ✭ 1,068 (+1595.24%)
Mutual labels:  algorithms
Coding Interview
😀 A collection of coding interview problems, including 剑指 Offer (Sword Offer), 编程之美 (Beauty of Programming), and more.
Stars: ✭ 1,111 (+1663.49%)
Mutual labels:  algorithms
Java
Repository for Java code and algorithms. Star the repo too.
Stars: ✭ 53 (-15.87%)
Mutual labels:  algorithms
Learning2run
Source code for our NIPS 2017 Learning to Run entry.
Stars: ✭ 57 (-9.52%)
Mutual labels:  ppo
Leetcode
This repository contains solutions and explanations for algorithm problems on LeetCode. Only medium or above are included. All are written in C++/Python and implemented by me. Problems attempted multiple times are labelled with hyperlinks.
Stars: ✭ 1,130 (+1693.65%)
Mutual labels:  algorithms
Datastructures
🚀 Implementation of core data structures for R
Stars: ✭ 64 (+1.59%)
Mutual labels:  algorithms
Mario rl
Stars: ✭ 60 (-4.76%)
Mutual labels:  ppo

MAPPO

Chao Yu*, Akash Velu*, Eugene Vinitsky, Yu Wang, Alexandre Bayen, and Yi Wu.

Website: https://sites.google.com/view/mappo

This repository implements MAPPO, a multi-agent variant of PPO. The implementation in this repository is used in the paper "The Surprising Effectiveness of MAPPO in Cooperative Multi-Agent Games" (https://arxiv.org/abs/2103.01955). This repository is heavily based on https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail.

Environments supported:

  • StarCraft II Multi-Agent Challenge (SMAC)
  • Hanabi
  • Multi-agent Particle-World Environments (MPEs)

1. Usage

All core code is located within the onpolicy folder.

  • The algorithms/ subfolder contains algorithm-specific code for MAPPO.

  • The envs/ subfolder contains environment wrapper implementations for the MPEs, SMAC, and Hanabi.

  • Code to perform training rollouts and policy updates is contained within the runner/ folder - there is a runner for each environment.

  • Executable scripts for training with default hyperparameters can be found in the scripts/ folder. The files are named in the following manner: train_algo_environment.sh. Within each file, the map name (in the case of SMAC and the MPEs) can be altered.

  • Python training scripts for each environment can be found in the scripts/train/ folder.

  • The config.py file contains relevant hyperparameter and environment settings. Most hyperparameters default to the values used in the paper; however, please refer to the paper's appendix for the full list of hyperparameters used. An example of how these pieces fit together is sketched below.
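For illustration, here is what a script following the train_algo_environment.sh convention might look like, passing a few config.py hyperparameters as command-line flags. The variable and flag names below are assumptions for illustration, not copied from the repository:

#!/bin/sh
# hypothetical excerpt in the style of scripts/train_smac.sh
env="StarCraft2"
map="3m"        # the map name can be altered here, as noted above
algo="mappo"
seed=1
python ../train/train_smac.py --env_name ${env} --map_name ${map} \
  --algorithm_name ${algo} --seed ${seed}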

2. Installation

Here we give an example installation with CUDA 10.1. For a non-GPU setup or other CUDA versions, please refer to the PyTorch website.

# create conda environment
conda create -n marl python==3.6.1
conda activate marl
pip install torch==1.5.1+cu101 torchvision==0.6.1+cu101 -f https://download.pytorch.org/whl/torch_stable.html
# install on-policy package
cd on-policy
pip install -e .

Even though we provide requirement.txt, it may contain redundant packages. We recommend installing any remaining dependencies on demand: run the code and install whichever required packages are reported missing.
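As a quick sanity check that the CUDA-enabled PyTorch build was picked up (optional; uses only standard torch calls):

# verify the installed torch version and CUDA availability
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"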

2.1 Install StarCraftII 4.10

unzip SC2.4.10.zip
# password is iagreetotheeula
# append (>>) rather than overwrite (>) your ~/.bashrc
echo "export SC2PATH=~/StarCraftII/" >> ~/.bashrc

2.2 Hanabi

Environment code for Hanabi is derived from the open-source environment code, but has been slightly modified to fit the algorithms used here.
To install, execute the following:

pip install cffi
cd envs/hanabi
# && ensures cd runs only after mkdir succeeds (a single & would background mkdir)
mkdir build && cd build
cmake ..
make -j

2.3 Install MPE

# install this package first
pip install seaborn

There are 3 cooperative scenarios in MPE (a scenario-selection sketch follows this list):

  • simple_spread
  • simple_speaker_listener, which is the 'Comm' scenario in the paper
  • simple_reference
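The scenario is typically chosen via flags in the training script; a minimal sketch, assuming --scenario_name and --num_agents flags exist in config.py (flag names are assumptions, mirroring the earlier sketch):

# hypothetical excerpt in the style of scripts/train_mpe.sh
scenario="simple_speaker_listener"   # the 'Comm' scenario from the paper
num_agents=2
python ../train/train_mpe.py --env_name MPE --scenario_name ${scenario} \
  --num_agents ${num_agents} --algorithm_name mappo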

3. Train

Here we use train_mpe.sh as an example:

cd onpolicy/scripts
chmod +x ./train_mpe.sh
./train_mpe.sh

Local results are stored in the scripts/results subfolder. Note that we use Weights & Biases as the default visualization platform; to use Weights & Biases, please register and log in to the platform first. More instructions for using Weights & Biases can be found in the official documentation. Adding the --use_wandb flag on the command line or in the .sh file switches logging from Weights & Biases to TensorBoard.
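If you switch to TensorBoard, the resulting event files can be inspected with the standard tensorboard CLI; the log directory below is an assumption based on the results path mentioned above:

# launch TensorBoard against the local results (path is an assumption)
tensorboard --logdir onpolicy/scripts/results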

We additionally provide ./eval_hanabi_forward.sh for evaluating the Hanabi score over 100k trials.

4. Publication

If you find this repository useful, please cite our paper:

@misc{yu2021surprising,
      title={The Surprising Effectiveness of MAPPO in Cooperative Multi-Agent Games}, 
      author={Chao Yu and Akash Velu and Eugene Vinitsky and Yu Wang and Alexandre Bayen and Yi Wu},
      year={2021},
      eprint={2103.01955},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}