
cshenton / atari-leaderboard

Licence: MIT license
A leaderboard of human and machine performance on the Arcade Learning Environment (ALE).

Projects that are alternatives of or similar to atari-leaderboard

Pytorch A2c Ppo Acktr Gail
PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
Stars: ✭ 2,632 (+11863.64%)
Mutual labels:  atari, ale
games services
A Flutter plugin to support game center and google play games services.
Stars: ✭ 67 (+204.55%)
Mutual labels:  leaderboard
dqn-pytorch
DQN to play Atari Pong
Stars: ✭ 77 (+250%)
Mutual labels:  atari
cx leaderboard
Elixir library for fast, customizable leaderboards
Stars: ✭ 18 (-18.18%)
Mutual labels:  leaderboard
Provenance
iOS & tvOS multi-emulator frontend, supporting various Atari, Bandai, NEC, Nintendo, Sega, SNK and Sony console systems… Get Started: https://wiki.provenance-emu.com
Stars: ✭ 4,732 (+21409.09%)
Mutual labels:  atari
MSMARCO-MRC-Analysis
Analysis on the MS-MARCO leaderboard regarding the machine reading comprehension task.
Stars: ✭ 20 (-9.09%)
Mutual labels:  leaderboard
Deep-Q-Networks
Implementation of Deep/Double Deep/Dueling Deep Q networks for playing Atari games using Keras and OpenAI gym
Stars: ✭ 38 (+72.73%)
Mutual labels:  atari
RPC-Leaderboard
RPC Dataset Leaderboard
Stars: ✭ 63 (+186.36%)
Mutual labels:  leaderboard
laravel-gamify
Laravel Gamify: Gamification System with Points & Badges support
Stars: ✭ 35 (+59.09%)
Mutual labels:  leaderboard
InstanceRefer
[ICCV 2021] InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds through Instance Multi-level Contextual Referring
Stars: ✭ 64 (+190.91%)
Mutual labels:  leaderboard
rclc
Rich Context leaderboard competition, including the corpus and current SOTA for required tasks.
Stars: ✭ 20 (-9.09%)
Mutual labels:  leaderboard
Opensource-Contribution-Leaderboard
Open Source project contributors tracking leaderboard built with ❤️ in NodeJS 😉
Stars: ✭ 30 (+36.36%)
Mutual labels:  leaderboard
discord-paginationembed
A pagination utility for MessageEmbed in Discord.JS
Stars: ✭ 93 (+322.73%)
Mutual labels:  leaderboard
ubeswitch
PCB for multisync switch for Atari
Stars: ✭ 20 (-9.09%)
Mutual labels:  atari
gci-leaders
A website showing Google Code-in information 🏆
Stars: ✭ 40 (+81.82%)
Mutual labels:  leaderboard
Galaxy-Attack
An inspiration of the original Atari Space Invaders game built in pygame 👾 🎮
Stars: ✭ 32 (+45.45%)
Mutual labels:  atari
Scavenger
A virtual "scavenger hunt" game for mobile devices using Unity, Azure, and PlayFab
Stars: ✭ 19 (-13.64%)
Mutual labels:  leaderboard
Quizzie
Open Sourced Quiz Portal which can be used for any event / competition with a custom leaderboard.
Stars: ✭ 31 (+40.91%)
Mutual labels:  leaderboard
Highway
✈️~🎢A Java game to drive your truck against other 2 trucks throughout the highway and compete to stay on the top of the leaderboard
Stars: ✭ 13 (-40.91%)
Mutual labels:  leaderboard
TheGame
The platform that MetaGame will be played on aka MetaOS - an open source framework for running decentralized societies. Currently featuring MetaSys, MyMeta Profiles, Dashboard, MetaMenu & Quests
Stars: ✭ 100 (+354.55%)
Mutual labels:  leaderboard

Atari Reinforcement Learning Leaderboard

Any scores out of date? Make a Pull Request.

This is a leaderboard comparing world record human performance to state-of-the-art machine performance in the Arcade Learning Environment (ALE).

| Game | Top Human Score | Top Machine Score | Best | Best Machine | Learning Type | Notes |
|---|---|---|---|---|---|---|
| Alien | 103583 | 9491 | Human | Rainbow | Q-gradient | |
| Amidar | 71529 | 5131 | Human | Rainbow | Q-gradient | |
| Assault | 8647 | 14497 | Machine | A3C | Policy-gradient | |
| Asterix | 1000000 | 428200 | Human | Rainbow | Q-gradient | |
| Asteroids | 57340 | 5093 | Human | A3C | Policy-gradient | * |
| Atlantis | 10604840 | 2311815 | Human | PPO | Policy-gradient | |
| Bank Heist | 45899 | 1611 | Human | Dueling DDQN | Q-gradient | |
| Battlezone | 98000 | 62010 | Human | Rainbow | Q-gradient | |
| Beamrider | 52866 | 26172 | Human | Prioritized DDQN | Q-gradient | 1B |
| Berzerk | 1057940 | 2545 | Human | Rainbow | Q-gradient | |
| Bowling | 279 | 135 | Human | HyperNEAT | Genetic Policy | J |
| Boxing | 99 | 99 | Draw | Rainbow, ACER | Q-gradient, Policy-gradient | |
| Breakout | 864 | 766 | Human | A3C | Policy-gradient | |
| Centipede | 453916 | 25275 | Human | HyperNEAT | Genetic Policy | |
| Chopper Command | 999999 | 16654 | Human | Rainbow | Q-gradient | |
| Crazy Climber | 219900 | 183135 | Human | Prioritized DDQN | Q-gradient | |
| Defender | 5443150 | 233021 | Human | A3C | Policy-gradient | N |
| Demon Attack | 100100 | 115201 | Machine | A3C | Policy-gradient | + |
| Enduro | 1666 | 2260 | Machine | Distribution DQN | Q-gradient | |
| Fishing Derby | 51 | 46 | Human | Dueling DDQN | Q-gradient | |
| Freeway | 38 | 34 | Human | Rainbow | Q-gradient | 1B |
| Frostbite | 248460 | 9590 | Human | Rainbow | Q-gradient | |
| Gopher | 30240 | 70354 | Machine | Rainbow | Q-gradient | |
| Gravitar | 39100 | 1419 | Human | Rainbow | Q-gradient | |
| HERO | 257310 | 55887 | Human | Rainbow | Q-gradient | J |
| Ice Hockey | 25 | 10 | Human | HyperNEAT | Genetic Policy | |
| Kangaroo | 1424600 | 14854 | Human | Dueling DDQN | Q-gradient | N |
| Krull | 104100 | 12601 | Human | HyperNEAT | Genetic Policy | N |
| Kung Fu Master | 79360 | 52181 | Human | Rainbow | Q-gradient | |
| Montezumas Revenge | 400000 | 384 | Human | Rainbow | Q-gradient | |
| Ms Pacman | 211480 | 6283 | Human | Dueling DDQN | Q-gradient | J |
| Name This Game | 21210 | 13439 | Human | Prioritized DDQN | Q-gradient | |
| Phoenix | 251180 | 108528 | Human | Rainbow | Q-gradient | |
| Pitfall | 114000 | 0 | Human | Several | Q-gradient | |
| Pong | 21 | 21 | Draw | Several | Several | E |
| Private Eye | 101800 | 15172 | Human | Distribution DQN | Q-gradient | ** |
| Qbert | 2400000 | 33817 | Human | Rainbow | Q-gradient | N |
| Road Runner | 210200 | 73949 | Human | A3C | Policy-gradient | |
| Robot Tank | 68 | 65 | Human | Dueling DDQN | Q-gradient | |
| Seaquest | 294940 | 50254 | Human | Dueling DDQN | Q-gradient | |
| Skiing | -3272 | -6522 | Human | Vanilla GA | Genetic Policy | |
| Space Invaders | 43710 | 23864 | Human | A3C | Policy-gradient | 1B |
| Star Gunner | 77400 | 164766 | Machine | A3C | Policy-gradient | N |
| Time Pilot | 34400 | 27202 | Human | A3C | Policy-gradient | |
| Tutankham | 2026 | 280 | Human | ACER | Policy-gradient | |
| Venture | 38900 | 1107 | Human | Distribution DQN | Q-gradient | N |
| Video Pinball | 3523988 | 533936 | Human | Rainbow | Q-gradient | 1B |
| Wizard of Wor | 129500 | 18082 | Human | A3C | Policy-gradient | |
| Yars Revenge | 2011099 | 102557 | Human | Rainbow | Q-gradient | ++ |
| Zaxxon | 83700 | 24622 | Human | A3C | Q-gradient | |
  • N: NTSC, no emulator results available
  • J: Score from jvgs.net
  • E: Game is so easy there's no world record category
  • 1B: Game 1, Difficulty B
  • *: Game 6, Difficulty B
  • +: Game 7, Difficulty B
  • **: Game 1, Points
  • ++: Game 2, Difficulty A

What's the point of this?

I decided to put this together after noticing two trends in reinforcement learning papers:

  • Not comparing to the state of the art.
  • Comparing an algorithm with thousands of hours of playtime to a human who played for a few hours.

Respectively, these make it hard to see the relative progress of the field from paper to paper, and the absolute progress compared to human-level game playing.

Though RL papers routinely quote >100% normalized human performance, the reality is that machine learning algorithms beat humans on only 5 of the 50 games listed here (with two draws), and humans hold a substantial lead on most of the rest. We have a long way to go.
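
For context, the ">100%" figures quoted in papers are usually the human-normalized score from the DQN line of work, which is measured against a professional tester's baseline rather than the world records used in this leaderboard. A minimal sketch, with purely illustrative numbers:

```python
# Human-normalized score as commonly reported in Atari RL papers:
#   normalized = (agent - random) / (human - random)
# "human" here is a professional tester's average score, not a world record,
# which is why >100% can coexist with a large gap to the records above.
def human_normalized(agent_score, random_score, human_score):
    return (agent_score - random_score) / (human_score - random_score)

# Illustrative numbers only: an agent well past the tester baseline (>100%)
# can still sit far below the human world record for the same game.
print(human_normalized(agent_score=400.0, random_score=1.7, human_score=30.5))  # ~13.8, i.e. ~1380%
```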

Performance Among Machines

When we exclude human scores, the per-algorithm win counts are as follows (a two-way tie counts as a win for both algorithms; a tie of three or more counts for none); a short tallying sketch follows the table:

| Algorithm | Type | Wins |
|---|---|---|
| Rainbow | Q-gradient | 18 |
| A3C (FF and LSTM) | Policy-gradient | 11 |
| Dueling DDQN | Q-gradient | 6 |
| HyperNEAT | Genetic Policy | 4 |
| Distribution DQN | Q-gradient | 3 |
| Prioritized DDQN | Q-gradient | 3 |
| ACER | Policy-gradient | 2 |
| PPO | Policy-gradient | 1 |
| Vanilla GA | Genetic Policy | 1 |
| Noisy DQN | Q-gradient | 0 |
| Vanilla ES | Genetic Policy | 0 |
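
The tally above follows the tie rule stated in words; here is a rough sketch of the same bookkeeping, where `rows` is a hypothetical stand-in for the Best Machine column rather than code from this repository:

```python
# Tie rule from the text: a two-way tie credits both algorithms with a win,
# while a tie of three or more (listed as "Several") credits none.
from collections import Counter

rows = [
    ("Assault", ["A3C"]),               # outright machine-best score
    ("Boxing",  ["Rainbow", "ACER"]),   # two-way tie: both credited
    ("Pong",    ["Several"]),           # 3+ way tie: nobody credited
]

wins = Counter()
for game, algos in rows:
    if "Several" in algos or len(algos) >= 3:
        continue                        # unfriendly tie, no credit
    for algo in algos:
        wins[algo] += 1                 # solo win or friendly two-way tie

print(wins.most_common())               # [('A3C', 1), ('Rainbow', 1), ('ACER', 1)]
```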

Methodology

Human Scores

Since the ALE uses the Stella Atari emulator, the Top Human Score is the top human score achieved on an emulator. Atari (and other) game releases tend to vary across regions, so this is the only way to ensure that both human and machine have, for example, equal access to game-breaking bugs.

Where possible, scores are taken from Twin Galaxies, which is the Guinness source for video game world records; otherwise, links are provided to the score sources.

Machine Scores

A valid machine score is one achieved by a reinforcement learning algorithm trained directly on pixels and raw rewards, for example one that can be trained against common ALE wrappers / forks such as gym or xitari. This means that algorithms like this one, which use hand-engineered intermediate rewards, do not qualify.
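
As a rough illustration of that interface (not code from this repository, and assuming the classic pre-0.26 gym Atari API with ROMs installed), an eligible agent only ever sees the raw frames and the raw score deltas:

```python
# Minimal sketch of the "pixels and raw rewards only" setup, using the
# classic gym API (4-tuple step returns); the env id and random policy
# are illustrative placeholders.
import gym

env = gym.make("BreakoutNoFrameskip-v4")   # raw frames, no reward shaping
obs = env.reset()                          # obs: 210x160x3 uint8 pixel array

episode_return, done = 0.0, False
while not done:
    action = env.action_space.sample()     # a real agent would act on pixels alone
    obs, reward, done, info = env.step(action)
    episode_return += reward               # raw game score delta, no hand-engineered bonus

print("episode score:", episode_return)
```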

Reference papers vary in:

  • Start type (no-op, random-op, human-op)
  • Number of test trials (from 30 to 200)

I take the approach here of favouring no-op starts over random ones (they usually have higher scores anyway), and treating all sample sizes equally.
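
A hedged sketch of that evaluation protocol, again assuming the classic gym API and a placeholder `agent.act` method; the specific constants (up to 30 no-ops, 30 trials) are common defaults in the papers rather than a rule of this leaderboard:

```python
# No-op start evaluation: each test episode begins with a random number of
# no-op actions (ALE action 0) before the agent takes over; the reported
# score is the mean over a fixed number of trials.
import random
import gym

NOOP_ACTION = 0
MAX_NOOPS = 30
N_TRIALS = 30            # papers use anywhere from 30 to 200 trials

def evaluate(agent, env_id="BreakoutNoFrameskip-v4", trials=N_TRIALS):
    env = gym.make(env_id)
    scores = []
    for _ in range(trials):
        obs = env.reset()
        for _ in range(random.randint(1, MAX_NOOPS)):   # random-length no-op start
            obs, _, done, _ = env.step(NOOP_ACTION)
            if done:
                obs = env.reset()
        total, done = 0.0, False
        while not done:
            obs, reward, done, _ = env.step(agent.act(obs))
            total += reward
        scores.append(total)
    return sum(scores) / len(scores)
```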

References
