richemslie / galvanise_zero

Licence: MIT
Learning from zero (mostly based on AlphaZero) in General Game Playing.


  • What? galvanise is a [[https://en.wikipedia.org/wiki/General_game_playing][General Game Player]], where games are written in [[https://en.wikipedia.org/wiki/Game_Description_Language][GDL]]. The original galvanise code was converted to a library, [[https://github.com/richemslie/ggplib][ggplib]], and galvanise_zero adds AlphaZero-style learning on top of it. Much inspiration came from DeepMind's related papers and from the excellent Expert Iteration [[https://arxiv.org/abs/1705.08439][paper]]. A number of Alpha*Zero open source projects were also inspirational, notably [[https://github.com/leela-zero/leela-zero][Leela Zero]] and [[https://github.com/lightvector/KataGo][KataGo]].

  • Features

    • There is no game-specific code beyond the GDL description of the game and a high-level Python configuration file describing the GDL-symbols-to-state mapping and symmetries (see [[https://github.com/richemslie/galvanise_zero/issues/1][here]] for more information).
    • Multiple policies, so asymmetric games can be trained.
    • Fully automated: put it in the oven and a strong model is baked.
    • The network is replaced during self-play games.
    • Training is very fast, using proper coroutines at the C level; thousands of concurrent games are played, feeding large batch sizes to the GPU (for small networks).
    • Uses a post-processed replay buffer, built on the excellent [[https://github.com/Blosc/bcolz][bcolz]] project. Training can sample arbitrarily from the buffer, giving emphasis to the most recent data (a sketch of such recency-weighted sampling follows this list).
    • Initially the project used Expert Iteration; this was deprecated in favour of oscillating sampling (similar to KataGo).
    • Three value heads for games with draws.
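
    As an illustration of the recency-weighted buffer sampling mentioned above, here is a minimal sketch. It assumes a simple NumPy-indexed buffer; the exponential decay and the recency_half_life parameter are hypothetical, not galvanise_zero's actual scheme.

    #+BEGIN_SRC python
    import numpy as np

    def sample_batch(buffer_size, batch_size, recency_half_life=50000):
        # Sample indices from a replay buffer, biased towards recent data.
        # Illustrative only: weights decay exponentially with sample age,
        # so the newest positions are drawn most often.
        ages = np.arange(buffer_size)[::-1]   # age 0 = newest sample
        weights = 0.5 ** (ages / float(recency_half_life))
        weights /= weights.sum()
        return np.random.choice(buffer_size, size=batch_size, p=weights)

    # e.g. draw a training batch of 1024 positions from a 1M-position buffer
    indices = sample_batch(1000000, 1024)
    #+END_SRC
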
  • Training

    • The same settings were used for training all game types (cpuct 0.85, fpu 0.25; a PUCT selection sketch follows this list).
    • Uses a much smaller number of evaluations (200) than AlphaZero, with oscillating sampling during training (75% of moves are skipped, using far fewer evaluations for those).
    • Policy squashing and extra noise are used to prevent overfitting.
    • Models use dropout, global average pooling, and squeeze-excite blocks (all optional; a generic squeeze-excite sketch appears at the end of this section).
    • In general, it takes 3-5 days for many of the game types listed below to reach superhuman strength.
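
    As a concrete illustration of the selection constants and the oscillating sampling described above, here is a minimal sketch. The node and child attribute names are hypothetical, and the FPU treatment (parent value minus a fixed reduction for unvisited children) follows the common Leela-style convention; galvanise_zero's exact formulation may differ.

    #+BEGIN_SRC python
    import math
    import random

    def select_child(node, cpuct=0.85, fpu_reduction=0.25):
        # Pick the child maximising Q + U (PUCT).
        parent_visits = sum(c.visits for c in node.children)
        sqrt_n = math.sqrt(max(1, parent_visits))

        def score(child):
            if child.visits == 0:
                q = node.value_estimate - fpu_reduction  # first-play urgency
            else:
                q = child.total_value / child.visits
            u = cpuct * child.prior * sqrt_n / (1 + child.visits)
            return q + u

        return max(node.children, key=score)

    def moves_to_record(num_moves, keep_prob=0.25):
        # Oscillating-sampling sketch: ~75% of moves are skipped (played
        # with far fewer evaluations) and not kept as training samples.
        return [i for i in range(num_moves) if random.random() < keep_prob]
    #+END_SRC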

    See [[http://littlegolem.net/jsp/info/player.jsp?plid=58835][gzero_bot]] for how to play on Little Golem.
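
    Since the models mentioned above optionally use squeeze-excite blocks, here is a generic Keras sketch of one (Hu et al., 2018). This is not galvanise_zero's actual layer code; ratio is the usual bottleneck reduction factor.

    #+BEGIN_SRC python
    from tensorflow.keras import layers

    def squeeze_excite(x, ratio=4):
        # Squeeze: global average pool over spatial dims -> per-channel stats.
        channels = int(x.shape[-1])
        s = layers.GlobalAveragePooling2D()(x)
        # Excite: bottleneck MLP producing per-channel scaling factors.
        s = layers.Dense(channels // ratio, activation="relu")(s)
        s = layers.Dense(channels, activation="sigmoid")(s)
        s = layers.Reshape((1, 1, channels))(s)
        # Rescale the original feature map channel-wise.
        return layers.Multiply()([x, s])
    #+END_SRC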

  • Status

    Games with significant training, with links to Elo graphs and models:

    • [[https://github.com/richemslie/gzero_data/tree/master/data/chess][chess]]
    • [[https://github.com/richemslie/gzero_data/tree/master/data/connect6][connect6]]
    • [[https://github.com/richemslie/gzero_data/tree/master/data/hexLG13][hex13]]
    • [[https://github.com/richemslie/gzero_data/tree/master/data/reversi_10x10][reversi10]]
    • [[https://github.com/richemslie/gzero_data/tree/master/data/reversi_8x8][reversi8]]
    • [[https://github.com/richemslie/gzero_data/tree/master/data/amazons_10x10][amazons]]
    • [[https://github.com/richemslie/gzero_data/tree/master/data/breakthrough][breakthrough]]
    • [[https://github.com/richemslie/gzero_data/tree/master/data/hexLG11][hex11]]
    • [[https://github.com/richemslie/gzero_data/tree/master/data/draughts_killer][International Draughts (killer mode)]]
    • [[https://github.com/richemslie/gzero_data/tree/master/data/hex19][hex19]]

    gzero_bot was Little Golem champion in its last attempts at Connect6, Hex 13, Amazons, and Breakthrough, winning all of its matches; it has since retired from further championships. Its Connect6 and Hex 13 ratings currently rank 1st and 2nd, respectively, among active users.

    Amazons and Breakthrough won gold medals at the ICGA 2018 Computer Olympiad. 👏 👏

    Reversi is also strong relative to humans on Little Golem, yet performs a bit worse than the top alpha-beta programs (around ntest level 20 the last time I tested).

    Baduk 9x9 was also trained; it reached a rating of ~2900 Elo on CGOS after 2-3 weeks of training.

  • Running

    The code is in fairly good shape, but could do with some refactoring and documentation (especially a how-to guide on training a game). It would definitely be good to have an extra pair of eyes on it. I welcome and will support anyone willing to try training a game themselves. Some notes:

    1. Python version is 2.7.
    2. Requires a GPU and TensorFlow.
    3. A good starting point is https://github.com/richemslie/ggp-zero/blob/dev/src/ggpzero/defs

    How-to-run and installation instructions are coming soon!
