
dylandjian / Supergo

A student implementation of AlphaGo Zero


Labels

machine-learning pytorch reinforcement-learning


SuperGo

A student implementation of the AlphaGo Zero paper, with documentation.

Ongoing project.

TODO (in order of priority)

Do something about the process leaking
File of constants that match the paper constants ?
OGS / KGS API ?
Use logging instead of prints ?

CURRENTLY DOING

Optimizations
Clean code, create install script, write documentation
Trying to see if it learns something on my computer

DONE

Statistics (branch statistics)
Games that are longer than the move threshold are now used
MCTS
- Tree search
- Dirichlet noise added to the prior probabilities in the root node
- Adaptive temperature (either take the max or sample proportionally)
- Sample random rotation or reflection in the dihedral group
- Multithreading of search
- Batch size evaluation to save computation
Dihedral group of board for more training samples
Learning without MCTS doesn't seem to work
Resume training
GTP on trained models (human.py, to plug with Sabaki)
Learning rate annealing (see this)
Better display for game (viewer.py, converting self-play games into GTP and then using Sabaki)
Make the 3 components (self-play, training, evaluation) asynchronous
Multiprocessing of games for self-play and evaluation
Models and training without MCTS
Evaluation
Tromp-Taylor scoring
Dataset ring buffer of self-play games
Loading saved models
Database for self-play games
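
Three of the items above (dihedral-group augmentation, Dirichlet noise at the root node, and temperature-based move selection) can be sketched in a few lines of standard-library Python. This is only an illustrative sketch, not the code used in this repository: the function names are hypothetical, boards are plain lists of lists, and the defaults alpha=0.03 and epsilon=0.25 are the values reported in the AlphaGo Zero paper for 19x19 Go, assumed here rather than read from this project's constants.

```python
import random


def dihedral_transforms(board):
    """Return the 8 symmetries (4 rotations, each with and without a mirror)."""
    def rotate(b):
        # Rotate 90 degrees clockwise: reverse the rows, then transpose.
        return [list(row) for row in zip(*b[::-1])]

    def mirror(b):
        # Flip each row left to right.
        return [row[::-1] for row in b]

    out = []
    current = [list(row) for row in board]
    for _ in range(4):
        out.append(current)
        out.append(mirror(current))
        current = rotate(current)
    return out


def add_dirichlet_noise(priors, alpha=0.03, epsilon=0.25, rng=random):
    """Mix root priors with Dirichlet noise: (1 - eps) * p + eps * eta."""
    # A Dirichlet sample is a set of independent Gamma(alpha, 1) draws, normalised.
    gammas = [rng.gammavariate(alpha, 1.0) for _ in priors]
    total = sum(gammas)
    if total == 0.0:
        # Every draw underflowed to zero (very unlikely): leave priors untouched.
        return list(priors)
    return [(1 - epsilon) * p + epsilon * g / total
            for p, g in zip(priors, gammas)]


def select_move(visit_counts, temperature, rng=random):
    """Pick a move index from MCTS visit counts (at least one must be > 0)."""
    if temperature == 0:
        # Greedy choice: the most visited move.
        return max(range(len(visit_counts)), key=visit_counts.__getitem__)
    # Otherwise sample proportionally to N ** (1 / temperature).
    weights = [n ** (1.0 / temperature) for n in visit_counts]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]
```

For example, dihedral_transforms(board) can be applied to each stored position to produce eight training samples, provided the policy target is transformed the same way; select_move(counts, 1.0) samples for exploration while select_move(counts, 0) plays the most visited move.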

LONG TERM PLAN ?

Compile my own version of Sabaki to watch games automatically while training
Resignation ?
Training on a big computer / server once everything is ready ?

Resources

The article for this code
Official AlphaGo Zero paper
Custom environment implementation using pachi_py following the implementation that was originally made on OpenAI Gym
Using PyTorch for the neural networks
Using Sabaki for the GUI
General scheme, cool design
Monte Carlo tree search explanation
Nice tree search implementation

Statistics, check branch stats

For a 10-layer-deep ResNet

9x9 board

soon

19x19 board

Differences with the official paper

No resignation
PyTorch instead of TensorFlow
Python instead of (probably) C++ / C
