dylandjian / Supergo
A student implementation of AlphaGo Zero
SuperGo
A student implementation of the AlphaGo Zero paper, with documentation.
Ongoing project.
TODO (in order of priority)
- Do something about the process leaking
- File of constants that match the paper constants ?
- OGS / KGS API ?
- Use logging instead of prints ?
CURRENTLY DOING
- Optimizations
- Clean code, create install script, write documentation
- Trying to see if it learns something on my computer
DONE
- Statistics (branch statistics)
- Games that are longer than the move threshold are now used
- MCTS
- Tree search
- Dirichlet noise added to the prior probabilities in the root node
- Adaptive temperature (either take the max or sample proportionally)
- Sample random rotation or reflection in the dihedral group
- Multithreading of search
- Batch size evaluation to save computation
- Dihedral group of board for more training samples
- Learning without MCTS doesn't seem to work
- Resume training
- GTP on trained models (human.py, to plug with Sabaki)
- Learning rate annealing (see this)
- Better display for game (viewer.py, converting self-play games into GTP and then using Sabaki)
- Make the 3 components (self-play, training, evaluation) asynchronous
- Multiprocessing of games for self-play and evaluation
- Models and training without MCTS
- Evaluation
- Tromp Taylor scoring
- Dataset ring buffer of self-play games
- Loading saved models
- Database for self-play games
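Three of the MCTS items above (Dirichlet noise at the root node, temperature-based move selection, and the dihedral group of the board) can be sketched in plain Python as follows. This is a minimal illustration, not the code from this repository: the function names add_dirichlet_noise, select_move and dihedral_symmetries are hypothetical, and the defaults alpha=0.03 and epsilon=0.25 are the values reported in the AlphaGo Zero paper for 19x19, which may differ from what this project uses.

```python
import random


def add_dirichlet_noise(priors, alpha=0.03, epsilon=0.25, rng=random):
    """Return (1 - epsilon) * p + epsilon * eta, with eta ~ Dirichlet(alpha).

    alpha and epsilon default to the paper's 19x19 values (an assumption
    here, not necessarily this project's settings).
    """
    # A Dirichlet sample is a vector of Gamma(alpha, 1) draws, normalized.
    gammas = [rng.gammavariate(alpha, 1.0) for _ in priors]
    total = sum(gammas)
    if total == 0.0:
        # Extremely unlikely underflow case: fall back to uniform noise.
        noise = [1.0 / len(priors)] * len(priors)
    else:
        noise = [g / total for g in gammas]
    return [(1 - epsilon) * p + epsilon * n for p, n in zip(priors, noise)]


def select_move(visit_counts, temperature=1.0, rng=random):
    """Pick a move index from the root visit counts.

    temperature == 0 takes the most visited move; otherwise the move is
    sampled proportionally to N ** (1 / temperature).
    """
    indices = range(len(visit_counts))
    if temperature == 0:
        return max(indices, key=visit_counts.__getitem__)
    weights = [n ** (1.0 / temperature) for n in visit_counts]
    return rng.choices(indices, weights=weights, k=1)[0]


def dihedral_symmetries(board):
    """Return the 8 rotations and reflections of a square board.

    The board is a list of rows; the input is not modified.
    """
    def rotate90(b):
        # Clockwise rotation: reverse the row order, then transpose.
        return [list(row) for row in zip(*b[::-1])]

    def flip(b):
        # Mirror each row left to right.
        return [row[::-1] for row in b]

    symmetries = []
    current = [list(row) for row in board]
    for _ in range(4):
        symmetries.append(current)
        symmetries.append(flip(current))
        current = rotate90(current)
    return symmetries
```

When a symmetry is used to augment training data, the policy target has to be transformed with the same rotation or reflection as the board, otherwise the move probabilities no longer line up with the intersections.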
LONG TERM PLAN ?
- Compile my own version of Sabaki to watch games automatically while training
- Resignation ?
- Training on a big computer / server once everything is ready ?
Resources
- The article for this code
- Official AlphaGo Zero paper
- Custom environment implementation using pachi_py following the implementation that was originally made on OpenAI Gym
- Using PyTorch for the neural networks
- Using Sabaki for the GUI
- General scheme, cool design
- Monte Carlo tree search explanation
- Nice tree search implementation
Statistics, check branch stats
For a 10-layer-deep ResNet
9x9 board
soon
19x19 board
Differences with the official paper
- No resignation
- PyTorch instead of TensorFlow
- Python instead of (probably) C++ / C