dylandjian / Supergo

A student implementation of AlphaGo Zero

SuperGo

A student implementation of the AlphaGo Zero paper, with documentation.

Ongoing project.

TODO (in order of priority)

  • Do something about the leaking processes
  • File of constants that match the paper's constants?
  • OGS / KGS API?
  • Use logging instead of prints?

CURRENTLY DOING

  • Optimizations
  • Clean code, create install script, write documentation
  • Trying to see if it learns something on my computer

DONE

  • Statistics (branch statistics)
  • Games that are longer than the move threshold are now used
  • MCTS
    • Tree search
    • Dirichlet noise added to the prior probabilities in the root node
    • Adaptive temperature (either take the max or sample proportionally)
    • Sample random rotation or reflection in the dihedral group
    • Multithreading of search
    • Batched evaluation to save computation
  • Dihedral group of board for more training samples
  • Learning without MCTS doesn't seem to work
  • Resume training
  • GTP on trained models (human.py, to plug with Sabaki)
  • Learning rate annealing (see this)
  • Better display for games (viewer.py, converting self-play games into GTP and then using Sabaki)
  • Make the 3 components (self-play, training, evaluation) asynchronous
  • Multiprocessing of games for self-play and evaluation
  • Models and training without MCTS
  • Evaluation
  • Tromp-Taylor scoring
  • Dataset ring buffer of self-play games
  • Loading saved models
  • Database for self-play games
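The two MCTS exploration items above (Dirichlet noise at the root node and temperature-based move selection) can be sketched as follows. This is a minimal illustration using only the standard library, not the code from this repository: the function names are hypothetical, and the defaults `alpha=0.03` and `epsilon=0.25` are the values reported in the AlphaGo Zero paper, which SuperGo may or may not use.

```python
import random


def add_dirichlet_noise(priors, alpha=0.03, epsilon=0.25, rng=random):
    """Mix Dirichlet noise into root priors: (1 - epsilon) * p + epsilon * noise.

    Hypothetical helper; alpha and epsilon default to the paper's values.
    """
    # A Dirichlet sample is a vector of Gamma(alpha, 1) draws normalised to sum to 1.
    gammas = [rng.gammavariate(alpha, 1.0) for _ in priors]
    total = sum(gammas)
    if total == 0.0:
        # Very small alpha can underflow every draw to 0; fall back to no noise.
        return list(priors)
    noise = [g / total for g in gammas]
    return [(1 - epsilon) * p + epsilon * n for p, n in zip(priors, noise)]


def select_move(visit_counts, temperature, rng=random):
    """Pick a move index from the root visit counts.

    temperature == 0 takes the most visited move; otherwise the move is
    sampled proportionally to visit_count ** (1 / temperature).
    """
    if temperature == 0:
        return visit_counts.index(max(visit_counts))
    weights = [n ** (1.0 / temperature) for n in visit_counts]
    return rng.choices(range(len(visit_counts)), weights=weights, k=1)[0]
```

In the paper, proportional sampling is used for the opening moves of a self-play game and the greedy choice afterwards; where this project switches between the two is not stated here.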
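The dihedral-group items above rely on the fact that a square board has eight symmetries (four rotations, each with a mirror image), so one position can yield eight training samples. A minimal sketch, assuming the board is a plain list of rows; the function names are hypothetical and the project itself may implement this with tensors instead:

```python
def rotate90(board):
    """Rotate a square board 90 degrees clockwise."""
    return [list(row) for row in zip(*board[::-1])]


def flip(board):
    """Mirror a board left to right."""
    return [row[::-1] for row in board]


def dihedral_symmetries(board):
    """Return the 8 symmetries of a square board: 4 rotations and their mirrors."""
    symmetries = []
    current = [list(row) for row in board]
    for _ in range(4):
        symmetries.append(current)
        symmetries.append(flip(current))
        current = rotate90(current)
    return symmetries
```

When a symmetry is applied to a training position, the board-shaped part of the policy target has to be transformed the same way so that move probabilities stay aligned with the stones.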
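The ring buffer of self-play games listed above keeps only the most recent games and drops the oldest once it is full. One simple way to get that behaviour is `collections.deque` with `maxlen`; the class below is a hypothetical sketch, not the data structure used in this repository:

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity buffer that discards the oldest game when full (hypothetical)."""

    def __init__(self, capacity):
        self._games = deque(maxlen=capacity)

    def add(self, game):
        # Appending to a full deque silently evicts the oldest entry.
        self._games.append(game)

    def sample(self, batch_size, rng=random):
        """Draw up to batch_size distinct games uniformly at random."""
        games = list(self._games)
        return rng.sample(games, min(batch_size, len(games)))

    def __len__(self):
        return len(self._games)
```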

LONG-TERM PLAN?

  • Compile my own version of Sabaki to watch games automatically while training
  • Resignation?
  • Training on a big computer / server once everything is ready?

Resources

Statistics, check branch stats

For a 10-layer-deep ResNet

9x9 board

soon

19x19 board

Differences with the official paper

  • No resignation
  • PyTorch instead of TensorFlow
  • Python instead of (probably) C++ / C