
hijkzzz / pymarl2

License: Apache-2.0
Fine-tuned MARL algorithms on SMAC (100% win rates on most scenarios)

Programming Languages

python
139335 projects - #7 most used programming language
shell
77523 projects

Projects that are alternatives of or similar to pymarl2

SKU110K-DenseDet
A state of art detector for densely packed scenes dataset SKU-110K
Stars: ✭ 101 (-67.52%)
Mutual labels:  sota
ZZZKBot
ZZZKBot is a bot (AI) for Starcraft: Broodwar. It is designed to compete against other bots. It is not designed to compete against humans. It uses BWAPI as an API for interacting with Starcraft: Broodwar. I am not intending to support/maintain/develop ZZZKBot in future, although I haven't ruled it out either.
Stars: ✭ 57 (-81.67%)
Mutual labels:  starcraft
MIRNet-Keras
Keras Implementation of MIRNet - SoTA in Image Denoising, Super Resolution and Image Enhancement - CVPR 2020
Stars: ✭ 21 (-93.25%)
Mutual labels:  sota
CDS
[NeurIPS 2021] CDS achieves remarkable success in challenging benchmarks SMAC and GRF by balancing sharing and diversity.
Stars: ✭ 55 (-82.32%)
Mutual labels:  marl
best AI papers 2021
A curated list of the latest breakthroughs in AI (in 2021) by release date with a clear video explanation, link to a more in-depth article, and code.
Stars: ✭ 2,740 (+781.03%)
Mutual labels:  sota
mpyq
Python library for reading MPQ archives.
Stars: ✭ 86 (-72.35%)
Mutual labels:  starcraft
MARL-resources-collection
A Collection of Multi-Agent Reinforcement Learning (MARL) Resources
Stars: ✭ 96 (-69.13%)
Mutual labels:  marl
CoachAI
BWAPI AI that helps you play/analyze StarCraft v1.16 game/replay with more eyes/ears/brains, get ready for a 3rd eye/ear and a 2nd brain operation !
Stars: ✭ 21 (-93.25%)
Mutual labels:  starcraft
ModelZoo.pytorch
Hands on Imagenet training. Unofficial ModelZoo project on Pytorch. MobileNetV3 Top1 75.64🌟 GhostNet1.3x 75.78🌟
Stars: ✭ 42 (-86.5%)
Mutual labels:  sota
Mava
A library of multi-agent reinforcement learning components and systems
Stars: ✭ 355 (+14.15%)
Mutual labels:  marl
Face-Renovation
Official repository of the paper "HiFaceGAN: Face Renovation via Collaborative Suppression and Replenishment".
Stars: ✭ 245 (-21.22%)
Mutual labels:  sota
image-classification
A collection of SOTA Image Classification Models in PyTorch
Stars: ✭ 70 (-77.49%)
Mutual labels:  sota
Transformer-in-Transformer
An Implementation of Transformer in Transformer in TensorFlow for image classification, attention inside local patches
Stars: ✭ 40 (-87.14%)
Mutual labels:  sota
ResidualAttentionNetwork
A Gluon implement of Residual Attention Network. Best acc on cifar10-97.78%.
Stars: ✭ 104 (-66.56%)
Mutual labels:  sota
SimpleView
Official Code for ICML 2021 paper "Revisiting Point Cloud Shape Classification with a Simple and Effective Baseline"
Stars: ✭ 95 (-69.45%)
Mutual labels:  sota
screp
StarCraft - Brood War replay parser
Stars: ✭ 71 (-77.17%)
Mutual labels:  starcraft
Overseer
Tool for analyzing Starcraft 2 maps by region decomposition
Stars: ✭ 13 (-95.82%)
Mutual labels:  starcraft
bncsutil
The Classic Battle.net™ client library
Stars: ✭ 19 (-93.89%)
Mutual labels:  starcraft
SMAC
StarCraft II Multi Agent Challenge : QMIX, COMA, LIIR, QTRAN, Central V, ROMA, RODE, DOP, Graph MIX
Stars: ✭ 40 (-87.14%)
Mutual labels:  smac
sc2gears
The COMPLETE (!) source code of the Sc2gears universe (Sc2gears app + Sc2gears Database + web-based parsing engine - bundled in an Eclipse project).
Stars: ✭ 30 (-90.35%)
Mutual labels:  starcraft

PyMARL2

Open-source code for the paper "Rethinking the Implementation Tricks and Monotonicity Constraint in Cooperative Multi-Agent Reinforcement Learning".

This repository is fine-tuned for the StarCraft Multi-Agent Challenge (SMAC). For other multi-agent tasks, we also recommend an optimized implementation of QMIX: https://github.com/marlbenchmark/off-policy.

StarCraft II version: SC2.4.10. Difficulty: 7.

2021.10.28 update: added the Google Football environment (vdn_gfootball.yaml), using `simple115` features.

2021.10.4 update: added QMIX with attention (qmix_att.yaml) as a baseline for communication tasks.

Finetuned-QMIX

There are many code-level tricks in multi-agent reinforcement learning (MARL), such as the following (a short sketch of a few of them appears after the list):

  • Value function clipping (clip max Q-values for QMIX)
  • Value normalization
  • Reward scaling
  • Orthogonal initialization and layer scaling
  • Adam optimizer
  • Neural network hidden size
  • Learning rate annealing
  • Reward clipping
  • Observation normalization
  • Gradient clipping
  • Large batch size
  • N-step returns (including GAE($\lambda$), Q($\lambda$), ...)
  • Rollout process number
  • $\epsilon$-greedy annealing steps
  • Death agent masking
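
As a concrete illustration, here is a minimal PyTorch-style sketch of three of these tricks: Q-value clipping, gradient-norm clipping, and linear $\epsilon$-greedy annealing. The function names and numeric bounds are illustrative assumptions, not the exact code used in this repository.

import torch

def clip_max_q_values(agent_qs: torch.Tensor, q_max: float = 10.0) -> torch.Tensor:
    # Value function clipping: cap per-agent Q-values before they enter the mixer,
    # so a single over-estimated action cannot dominate the joint value.
    return torch.clamp(agent_qs, max=q_max)

def epsilon(t: int, anneal_time: int = 100_000, eps_start: float = 1.0, eps_end: float = 0.05) -> float:
    # Epsilon-greedy annealing: decay exploration linearly over `anneal_time` steps.
    frac = min(t / anneal_time, 1.0)
    return eps_start + frac * (eps_end - eps_start)

def optimiser_step(loss: torch.Tensor, params, optimiser, grad_norm_clip: float = 10.0) -> None:
    # Gradient clipping: rescale gradients whose global norm exceeds the bound,
    # which keeps updates stable with large batch sizes.
    optimiser.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(params, grad_norm_clip)
    optimiser.step()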

Related Work

  • Implementation Matters in Deep RL: A Case Study on PPO and TRPO
  • What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study
  • The Surprising Effectiveness of MAPPO in Cooperative, Multi-Agent Games

Using a few of the tricks above, we enabled QMIX (qmix.yaml) to solve almost all of the hard scenarios in SMAC (with hyperparameters fine-tuned for each scenario).

| Scenarios | Difficulty | QMIX (batch_size=128) | Finetuned-QMIX |
| --- | --- | --- | --- |
| 8m | Easy | - | 100% |
| 2c_vs_1sc | Easy | - | 100% |
| 2s3z | Easy | - | 100% |
| 1c3s5z | Easy | - | 100% |
| 3s5z | Easy | - | 100% |
| 8m_vs_9m | Hard | 84% | 100% |
| 5m_vs_6m | Hard | 84% | 90% |
| 3s_vs_5z | Hard | 96% | 100% |
| bane_vs_bane | Hard | 100% | 100% |
| 2c_vs_64zg | Hard | 100% | 100% |
| corridor | Super Hard | 0% | 100% |
| MMM2 | Super Hard | 98% | 100% |
| 3s5z_vs_3s6z | Super Hard | 3% | 93% (hidden_size=256, qmix_large.yaml) |
| 27m_vs_30m | Super Hard | 56% | 100% |
| 6h_vs_8z | Super Hard | 0% | 93% ($\lambda$=0.3) |
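
The $\lambda$ values above refer to the TD($\lambda$)/Q($\lambda$) return target. Below is a minimal sketch of a backward $\lambda$-return computation for a single episode; the helper name and tensor layout are assumptions for illustration, not the repository's exact implementation.

import torch

def lambda_return(rewards, target_qs, terminated, gamma=0.99, td_lambda=0.3):
    # rewards[t]: reward after step t; target_qs[t]: target-network value of the
    # state reached after step t; terminated[t]: 1.0 if the episode ended at step t.
    T = rewards.shape[0]
    returns = torch.zeros_like(rewards)
    next_return = 0.0  # G_T = 0
    for t in reversed(range(T)):
        # G_t = r_t + gamma * (1 - done_t) * ((1 - lambda) * Q_target(s_{t+1}) + lambda * G_{t+1})
        next_return = rewards[t] + gamma * (1.0 - terminated[t]) * (
            (1.0 - td_lambda) * target_qs[t] + td_lambda * next_return
        )
        returns[t] = next_return
    return returns

With a small value such as td_lambda=0.3 (as in the 6h_vs_8z entry above), the target leans more heavily on the 1-step bootstrap than on the full Monte-Carlo return.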

Re-Evaluation

Afterwards, we re-evaluated numerous QMIX variants with the tricks normalized (a common set of hyperparameters across algorithms), and found that QMIX achieves state-of-the-art performance.

QMIX, VDNs, Qatten, QPLEX, and WQMIX are value-based; LICA, VMIX, DOP, and RIIT are policy-based.

| Scenarios | Difficulty | QMIX | VDNs | Qatten | QPLEX | WQMIX | LICA | VMIX | DOP | RIIT |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2c_vs_64zg | Hard | 100% | 100% | 100% | 100% | 100% | 100% | 98% | 84% | 100% |
| 8m_vs_9m | Hard | 100% | 100% | 100% | 95% | 95% | 48% | 75% | 96% | 95% |
| 3s_vs_5z | Hard | 100% | 100% | 100% | 100% | 100% | 96% | 96% | 100% | 96% |
| 5m_vs_6m | Hard | 90% | 90% | 90% | 90% | 90% | 53% | 9% | 63% | 67% |
| 3s5z_vs_3s6z | S-Hard | 75% | 43% | 62% | 68% | 56% | 0% | 56% | 0% | 75% |
| corridor | S-Hard | 100% | 98% | 100% | 96% | 96% | 0% | 0% | 0% | 100% |
| 6h_vs_8z | S-Hard | 84% | 87% | 82% | 78% | 75% | 4% | 80% | 0% | 19% |
| MMM2 | S-Hard | 100% | 96% | 100% | 100% | 96% | 0% | 70% | 3% | 100% |
| 27m_vs_30m | S-Hard | 100% | 100% | 100% | 100% | 100% | 9% | 93% | 0% | 93% |
| Discrete PP | - | 40 | 39 | - | 39 | 39 | 30 | 39 | 38 | 38 |
| Avg. Score | Hard+ | 94.9% | 91.2% | 92.7% | 92.5% | 90.5% | 29.2% | 67.4% | 44.1% | 84.0% |

Communication

We also tested our QMIX-with-attention (qmix_att.yaml, $\lambda=0.3$, attention_heads=4) on some maps (from NDQ) that require communication.

| Scenarios (2M steps) | Difficulty | Finetuned-QMIX (no communication) | QMIX-with-attention (communication) |
| --- | --- | --- | --- |
| 1o_10b_vs_1r | - | 56% | 87% |
| 1o_2r_vs_4r | - | 50% | 95% |
| bane_vs_hM | - | 0% | 0% |
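
The attention variant lets each agent condition its value on the hidden states of the other agents, which acts as a learned communication channel. Below is a minimal sketch of such a multi-head attention block; the module name, hidden size, and shapes are illustrative assumptions rather than the exact architecture behind qmix_att.yaml.

import torch
import torch.nn as nn

class AgentAttention(nn.Module):
    # Self-attention across the agent dimension: each agent attends to the
    # hidden states of all agents before its Q-values are computed.
    def __init__(self, hidden_dim: int = 64, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden_dim, n_heads, batch_first=True)

    def forward(self, agent_hidden: torch.Tensor) -> torch.Tensor:
        # agent_hidden: [batch, n_agents, hidden_dim]
        out, _ = self.attn(agent_hidden, agent_hidden, agent_hidden)
        return out

# Example: mixed = AgentAttention()(torch.randn(32, 10, 64))  # 32 episodes, 10 agents

This kind of information exchange is what the no-communication baseline lacks on the NDQ maps above, which is consistent with the gap between the two columns.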

Google Football

We also tested VDN (vdn_gfootball.yaml) on some Google Football scenarios. Specifically, we use the `simple115` features to train the model (the original Google Football paper uses CNN features). We did not test QMIX because this environment does not provide global state information.

| Scenarios | Difficulty | VDN ($\lambda=1.0$) |
| --- | --- | --- |
| academy_counterattack_hard | - | 0.71 (Test Score) |
| academy_counterattack_easy | - | 0.87 (Test Score) |
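
The reason VDN still works here is its mixing rule: VDN simply sums the per-agent Q-values, whereas QMIX generates its mixing weights from the global state via hypernetworks. A rough sketch of the two mixers (simplified, not the repository's exact modules):

import torch
import torch.nn as nn

def vdn_mix(agent_qs: torch.Tensor) -> torch.Tensor:
    # VDN: Q_tot is the plain sum of per-agent Q-values; no global state needed.
    return agent_qs.sum(dim=-1, keepdim=True)

class QMixer(nn.Module):
    # QMIX: mixing weights are produced by hypernetworks conditioned on the
    # global state, so it cannot be used when no global state is available.
    def __init__(self, n_agents: int, state_dim: int, embed_dim: int = 32):
        super().__init__()
        self.n_agents, self.embed_dim = n_agents, embed_dim
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Linear(state_dim, 1)

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: [batch, n_agents], state: [batch, state_dim]
        w1 = torch.abs(self.hyper_w1(state)).view(-1, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(-1, 1, self.embed_dim)
        hidden = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(-1, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(-1, 1, 1)
        return (torch.bmm(hidden, w2) + b2).view(-1, 1)

Because the QMIX mixer requires `state` as input, the additive VDN mixer is the natural choice when the environment only exposes per-agent observations.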

Usage

PyMARL is WhiRL's framework for deep multi-agent reinforcement learning and includes implementations of the following algorithms:

Value-based Methods: QMIX, VDN, Qatten, QPLEX, WQMIX

Actor Critic Methods: LICA, VMIX, DOP, RIIT

Installation instructions

Install Python packages

# require Anaconda 3 or Miniconda 3
bash install_dependecies.sh

Set up StarCraft II (2.4.10) and SMAC:

bash install_sc2.sh

This will download SC2.4.10 into the 3rdparty folder and copy the maps necessary to run the experiments.

Set up Google Football:

bash install_gfootball.sh

Command Line Tool

Run an experiment

# For SMAC
python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=corridor
# For Difficulty-Enhanced Predator-Prey
python3 src/main.py --config=qmix_predator_prey --env-config=stag_hunt with env_args.map_name=stag_hunt
# For Communication tasks
python3 src/main.py --config=qmix_att --env-config=sc2 with env_args.map_name=1o_10b_vs_1r
# For Google Football (Insufficient testing)
# map_name: academy_counterattack_easy, academy_counterattack_hard, five_vs_five...
python3 src/main.py --config=vdn_gfootball --env-config=gfootball with env_args.map_name=academy_counterattack_hard env_args.num_agents=4

The config files act as defaults for an algorithm or environment. They are all located in src/config: --config refers to the config files in src/config/algs, and --env-config refers to the config files in src/config/envs.
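
Conceptually, the environment config and the algorithm config are merged into a single experiment configuration, and each `with key=value` argument overrides one entry (PyMARL builds on the Sacred experiment framework for this). The following is a rough Python sketch of that merge behaviour, not the actual loader in src/main.py:

import copy
import yaml  # PyYAML

def apply_overrides(config: dict, overrides: dict) -> dict:
    # Overrides use dotted keys, mirroring e.g. `with env_args.map_name=corridor`.
    out = copy.deepcopy(config)
    for dotted, value in overrides.items():
        node = out
        *parents, leaf = dotted.split(".")
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return out

def load_experiment_config(alg_yaml: str, env_yaml: str, overrides: dict) -> dict:
    # Environment config supplies environment defaults, algorithm config supplies
    # training defaults; algorithm keys win on collisions in this sketch.
    with open(env_yaml) as f:
        config = yaml.safe_load(f)
    with open(alg_yaml) as f:
        config.update(yaml.safe_load(f))
    return apply_overrides(config, overrides)

# Example, mirroring `--config=qmix --env-config=sc2 with env_args.map_name=corridor`:
cfg = load_experiment_config("src/config/algs/qmix.yaml", "src/config/envs/sc2.yaml",
                             {"env_args.map_name": "corridor"})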

Run n parallel experiments

# bash run.sh config_name env_config_name map_name_list (arg_list threads_num gpu_list experiments_num)
bash run.sh qmix sc2 6h_vs_8z epsilon_anneal_time=500000,td_lambda=0.3 2 0 5

Each xxx_list argument is comma-separated.

All results will be stored in the Results folder and named with map_name.

Kill all training processes

# All Python and game processes of the current user will be killed.
bash clean.sh

Citation

@article{hu2021rethinking,
      title={Rethinking the Implementation Tricks and Monotonicity Constraint in Cooperative Multi-Agent Reinforcement Learning}, 
      author={Jian Hu and Siyang Jiang and Seth Austin Harding and Haibin Wu and Shih-wei Liao},
      year={2021},
      eprint={2102.03479},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}