mtrazzi / Rl Book Challenge

License: MIT
Self-studying the Sutton & Barto book the hard way

Programming Languages

Python
139,335 projects; the #7 most used programming language

Projects that are alternatives to or similar to Rl Book Challenge

Policy Gradient
Minimal Monte Carlo Policy Gradient (REINFORCE) Algorithm Implementation in Keras
Stars: ✭ 135 (-7.53%)
Mutual labels:  reinforcement-learning
Awesome Deep Learning Papers For Search Recommendation Advertising
Awesome Deep Learning papers for industrial Search, Recommendation, and Advertising, focusing on Embedding, Matching, Ranking (CTR and CVR prediction), Post Ranking, Transfer, Reinforcement Learning, Self-supervised Learning, and more.
Stars: ✭ 136 (-6.85%)
Mutual labels:  reinforcement-learning
Allenact
An open source framework for research in Embodied-AI from AI2.
Stars: ✭ 144 (-1.37%)
Mutual labels:  reinforcement-learning
Ai plays snake
AI trained using a genetic algorithm and deep learning to play the game of Snake
Stars: ✭ 137 (-6.16%)
Mutual labels:  reinforcement-learning
Safe learning
Safe reinforcement learning with stability guarantees
Stars: ✭ 140 (-4.11%)
Mutual labels:  reinforcement-learning
Data Analysis
Mainly a summary of web-scraping and data-analysis projects, plus modeling, machine learning, and model evaluation.
Stars: ✭ 142 (-2.74%)
Mutual labels:  matplotlib
Reinforcement learning in python
Implementing reinforcement learning, namely the Q-learning and Sarsa algorithms, for global path planning of a mobile robot in an unknown environment with obstacles. Comparative analysis of Q-learning and Sarsa.
Stars: ✭ 134 (-8.22%)
Mutual labels:  reinforcement-learning
Sumo Rl
A simple interface to instantiate Reinforcement Learning environments with SUMO for Traffic Signal Control. Compatible with Gym Env from OpenAI and MultiAgentEnv from RLlib.
Stars: ✭ 145 (-0.68%)
Mutual labels:  reinforcement-learning
Flappy Es
Flappy Bird AI using Evolution Strategies
Stars: ✭ 140 (-4.11%)
Mutual labels:  reinforcement-learning
Data Science Question Answer
A repo for data science related questions and answers
Stars: ✭ 2,000 (+1269.86%)
Mutual labels:  reinforcement-learning
Zhihu Spider
A multi-threaded Python crawler that collects profile information of Zhihu users.
Stars: ✭ 137 (-6.16%)
Mutual labels:  matplotlib
Complete Life Cycle Of A Data Science Project
Complete-Life-Cycle-of-a-Data-Science-Project
Stars: ✭ 140 (-4.11%)
Mutual labels:  reinforcement-learning
Cherry
A PyTorch Library for Reinforcement Learning Research
Stars: ✭ 143 (-2.05%)
Mutual labels:  reinforcement-learning
Savn
Learning to Learn how to Learn: Self-Adaptive Visual Navigation using Meta-Learning (https://arxiv.org/abs/1812.00971)
Stars: ✭ 135 (-7.53%)
Mutual labels:  reinforcement-learning
Machin
A reinforcement learning library (framework) designed for PyTorch; implements DQN, DDPG, A2C, PPO, SAC, MADDPG, A3C, APEX, IMPALA ...
Stars: ✭ 145 (-0.68%)
Mutual labels:  reinforcement-learning
Ml Cheatsheet
A constantly updated Python machine learning cheatsheet
Stars: ✭ 136 (-6.85%)
Mutual labels:  matplotlib
Deep Learning Resources
A Collection of resources I have found useful on my journey finding my way through the world of Deep Learning.
Stars: ✭ 141 (-3.42%)
Mutual labels:  matplotlib
Tensor2tensor
Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
Stars: ✭ 11,865 (+8026.71%)
Mutual labels:  reinforcement-learning
Articulations Robot Demo
Stars: ✭ 145 (-0.68%)
Mutual labels:  reinforcement-learning
Gb
A minimal C implementation of the Nintendo Game Boy: a fast research environment for reinforcement learning
Stars: ✭ 143 (-2.05%)
Mutual labels:  reinforcement-learning

In this repo

  1. Python replication of all the plots from Reinforcement Learning: An Introduction
  2. Solutions to all of the exercises
  3. Anki flashcard summary of the book

1. Replicate all the figures

To reproduce a figure, say Figure 2.2, do:

cd chapter2
python figures.py 2.2

Chapter 2

  1. Figure 2.2: Average performance of epsilon-greedy action-value methods on the 10-armed testbed
  2. Figure 2.3: Optimistic initial action-value estimates
  3. Figure 2.4: Average performance of UCB action selection on the 10-armed testbed
  4. Figure 2.5: Average performance of the gradient bandit algorithm
  5. Figure 2.6: A parameter study of the various bandit algorithms
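
For readers who want a feel for what Figure 2.2 replicates, here is a minimal sketch of sample-average epsilon-greedy action selection on a 10-armed testbed. It is not the chapter2/ code; the function name run_bandit and all parameter defaults are invented for this illustration.

import numpy as np

def run_bandit(eps, steps=1000, k=10, seed=0):
    """One run of sample-average epsilon-greedy on a k-armed testbed."""
    rng = np.random.default_rng(seed)
    q_true = rng.normal(0.0, 1.0, k)             # true action values q*(a)
    q_est = np.zeros(k)                          # sample-average estimates Q(a)
    counts = np.zeros(k)
    rewards = np.zeros(steps)
    for t in range(steps):
        a = rng.integers(k) if rng.random() < eps else int(np.argmax(q_est))
        r = rng.normal(q_true[a], 1.0)           # reward drawn from N(q*(a), 1)
        counts[a] += 1
        q_est[a] += (r - q_est[a]) / counts[a]   # incremental sample-average update
        rewards[t] = r
    return rewards

# Averaging many independent runs gives the shape of the epsilon = 0.1 curve in Figure 2.2.
avg_reward = np.mean([run_bandit(0.1, seed=i) for i in range(200)], axis=0)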

Chapter 4

  1. Figure 4.2: Jack’s car rental problem (value function, policy)
  2. Figure 4.3: The solution to the gambler’s problem (value function, policy)
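
As context for the gambler's-problem figure, the value function is typically computed with value iteration. Below is a rough, self-contained sketch under the book's setup (ph is the probability of the coin coming up heads); it is not the chapter4/ implementation, and gamblers_value_iteration is a name invented here.

import numpy as np

def gamblers_value_iteration(ph=0.4, goal=100, theta=1e-9):
    """Value iteration for the gambler's problem (Example 4.3)."""
    V = np.zeros(goal + 1)
    V[goal] = 1.0                                   # reaching the goal is worth 1
    while True:
        delta = 0.0
        for s in range(1, goal):
            stakes = range(1, min(s, goal - s) + 1)
            returns = [ph * V[s + a] + (1 - ph) * V[s - a] for a in stakes]
            best = max(returns)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break
    return V

V = gamblers_value_iteration(ph=0.25)               # the setting used in Exercise 4.9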

Chapter 5

  1. Figure 5.1: Approximate state-value functions for the blackjack policy
  2. Figure 5.2: The optimal policy and state-value function for blackjack found by Monte Carlo ES
  3. Figure 5.3: Weighted importance sampling
  4. Figure 5.4: Ordinary importance sampling with surprisingly unstable estimates
  5. Figure 5.5: A couple of right turns for the racetrack task (1, 2, 3)
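
Figures 5.3 and 5.4 contrast weighted and ordinary importance sampling. The sketch below shows the two estimators given per-episode importance ratios and returns; it is illustrative only, and is_estimates and its arguments are invented names rather than the chapter5/ API.

import numpy as np

def is_estimates(rhos, returns):
    """Ordinary and weighted importance-sampling estimates of a state's value.
    rhos:    per-episode importance ratios, the product of pi(A_t|S_t) / b(A_t|S_t)
    returns: the corresponding episode returns G"""
    rhos = np.asarray(rhos, dtype=float)
    returns = np.asarray(returns, dtype=float)
    ordinary = np.sum(rhos * returns) / len(returns)
    weighted = np.sum(rhos * returns) / np.sum(rhos) if np.sum(rhos) > 0 else 0.0
    return ordinary, weighted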

Chapter 6

  1. Figure 6.1: Changes recommended in the driving home example by Monte Carlo methods (left) and TD methods (right)
  2. Example 6.2: Random walk (comparison)
  3. Figure 6.2: Performance of TD(0) and constant-α MC under batch training on the random walk task
  4. Example 6.5: Windy Gridworld
  5. Example 6.6: Cliff Walking
  6. Figure 6.3: Interim and asymptotic performance of TD control methods (comparison)
  7. Figure 6.5: Comparison of Q-learning and Double Q-learning
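
The cliff-walking and double Q-learning figures are all built on the tabular Q-learning update. A generic sketch follows; it is not the chapter6/ code, and it assumes integer states, a 2-D Q table, and an env following the gym-like API described under Design choices below.

import numpy as np

def q_learning_episode(env, Q, alpha=0.5, gamma=1.0, eps=0.1, rng=np.random.default_rng()):
    """One episode of tabular Q-learning; updates the Q table in place."""
    s = env.reset()
    done = False
    while not done:
        # epsilon-greedy behavior policy over the actions (columns of Q)
        a = rng.integers(Q.shape[1]) if rng.random() < eps else int(np.argmax(Q[s]))
        s_p, r, done, _ = env.step(a)
        # off-policy target: bootstrap from the greedy (max) action value of the next state
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_p]) * (not done) - Q[s, a])
        s = s_p
    return Q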

Chapter 7

  1. Figure 7.2: Performance of n-step TD methods on 19-state random walk (comparison)
  2. Figure 7.4: Gridworld example of the speedup of policy learning due to the use of n-step methods
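
The n-step methods compared in this chapter all revolve around the n-step return. A tiny illustrative helper (the name and argument layout are made up, not how chapter7/ structures it):

def n_step_return(rewards, V, s_next, gamma=1.0):
    """n-step return: the next n rewards plus the bootstrapped value of the state n steps ahead.
    rewards: R_{t+1}..R_{t+n}; s_next: S_{t+n}, or None if the episode ended before then."""
    G = sum(gamma ** i * r for i, r in enumerate(rewards))
    if s_next is not None:
        G += gamma ** len(rewards) * V[s_next]
    return G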

Chapter 8

  1. Figure 8.2: Average learning curves for Dyna-Q agents varying in their number of planning steps
  2. Figure 8.3: Policies found by planning and nonplanning Dyna-Q agents
  3. Figure 8.4: Average performance of Dyna agents on a blocking task
  4. Figure 8.5: Average performance of Dyna agents on a shortcut task
  5. Example 8.4: Prioritized sweeping significantly shortens learning time on the Dyna maze task
  6. Figure 8.7: Comparison of efficiency of expected and sample updates
  7. Figure 8.8: Relative efficiency of different update distributions
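
A compressed sketch of the Dyna-Q loop behind Figures 8.2 through 8.5, shown only for orientation: it is not the chapter8/ code, and it assumes a tabular setting with integer states and the gym-like env API described under Design choices below.

import numpy as np

def dyna_q_step(env, Q, model, s, alpha=0.1, gamma=0.95, eps=0.1, n_planning=5,
                rng=np.random.default_rng()):
    """One real environment step followed by n_planning simulated (planning) updates."""
    a = rng.integers(Q.shape[1]) if rng.random() < eps else int(np.argmax(Q[s]))
    s_p, r, done, _ = env.step(a)
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_p]) * (not done) - Q[s, a])   # direct RL
    model[(s, a)] = (r, s_p, done)                                           # model learning
    for _ in range(n_planning):                                              # planning from the model
        (ps, pa), (pr, ps_p, pdone) = list(model.items())[rng.integers(len(model))]
        Q[ps, pa] += alpha * (pr + gamma * np.max(Q[ps_p]) * (not pdone) - Q[ps, pa])
    return s_p, done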

Chapter 9

  1. Figure 9.1: Gradient Monte Carlo algorithm on the 1000-state random walk task
  2. Figure 9.2: Semi-gradient n-step TD algorithm on the 1000-state random walk task
  3. Figure 9.5: Fourier basis vs. polynomials on the 1000-state random walk task (comparison)
  4. Figure 9.10: State aggregation vs. tile coding on the 1000-state random walk task (comparison)
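
The 1000-state random-walk figures use gradient Monte Carlo with state aggregation as a baseline. A minimal sketch of that update (illustrative only; names, state numbering, and defaults are assumptions, not the chapter9/ implementation):

import numpy as np

def gradient_mc_update(w, episode, alpha=2e-5, n_states=1000, n_groups=10):
    """One gradient Monte Carlo sweep with state aggregation.
    w: one weight per group (the aggregated value estimates);
    episode: list of (state, return) pairs with states numbered 1..n_states."""
    group_size = n_states // n_groups
    for state, G in episode:
        g = (state - 1) // group_size      # group index of the visited state
        w[g] += alpha * (G - w[g])         # SGD step; the feature vector is one-hot on the group
    return w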

Chapter 10

  1. Figure 10.1: The cost-to-go function for the Mountain Car task in one run (428 steps; 12, 104, 1000, and 9000 episodes)
  2. Figure 10.2: Learning curves for semi-gradient Sarsa on the Mountain Car task
  3. Figure 10.3: One-step vs. multi-step performance of semi-gradient Sarsa on the Mountain Car task
  4. Figure 10.4: Effect of alpha and n on the early performance of n-step semi-gradient Sarsa
  5. Figure 10.5: Differential semi-gradient Sarsa on the access-control queuing task

Chapter 11

  1. Figure 11.2: Demonstration of instability on Baird’s counterexample
  2. Figure 11.5: The behavior of the TDC algorithm on Baird’s counterexample
  3. Figure 11.6: The behavior of the ETD algorithm in expectation on Baird’s counterexample

Chapter 12

  1. Figure 12.3: Off-line λ-return algorithm on 19-state random walk
  2. Figure 12.6: TD(λ) algorithm on 19-state random walk
  3. Figure 12.8: True online TD(λ) algorithm on 19-state random walk
  4. Figure 12.10: Sarsa(λ) with replacing traces on Mountain Car
  5. Figure 12.11: Summary comparison of Sarsa(λ) algorithms on Mountain Car
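
For orientation, here is a tabular sketch of TD(λ) with accumulating eligibility traces, the core of the random-walk comparisons above. It is not the chapter12/ code (the figures also cover replacing traces and true online TD(λ)); the env is assumed to follow the gym-like API under Design choices, with integer states and a dummy action.

import numpy as np

def td_lambda_episode(env, V, alpha=0.1, gamma=1.0, lam=0.8):
    """One episode of tabular TD(lambda) with accumulating traces; V is a numpy array."""
    z = np.zeros_like(V)                       # eligibility traces
    s = env.reset()
    done = False
    while not done:
        s_p, r, done, _ = env.step(0)          # assumption: the random-walk env ignores the action
        delta = r + gamma * V[s_p] * (not done) - V[s]
        z *= gamma * lam                       # decay every trace
        z[s] += 1.0                            # accumulate the trace of the visited state
        V += alpha * delta * z                 # every state is updated in proportion to its trace
        s = s_p
    return V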

Chapter 13

  1. Figure 13.1: REINFORCE on the short-corridor gridworld
  2. Figure 13.2: REINFORCE with baseline on the short-corridor gridworld
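
For Figure 13.1 the policy-gradient update is plain REINFORCE. Below is a rough sketch of the Monte Carlo policy-gradient step with a softmax policy over action preferences, state-independent as in the short-corridor example; it is not the chapter13/ code, and reinforce_update and its episode format are assumptions.

import numpy as np

def softmax(h):
    e = np.exp(h - np.max(h))
    return e / e.sum()

def reinforce_update(theta, episode, alpha=2e-4, gamma=1.0):
    """REINFORCE step; theta is a numpy array of action preferences h(a) = theta[a].
    episode: list of (action, reward) pairs. gamma defaults to 1, as in the short corridor."""
    G = 0.0
    for a, r in reversed(episode):       # walk the episode backwards, accumulating returns
        G = r + gamma * G
        pi = softmax(theta)
        grad_log = -pi                   # gradient of log pi(a) for a softmax over preferences
        grad_log[a] += 1.0
        theta += alpha * G * grad_log
    return theta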

2. Solutions to all of the exercises (text answers)

To reproduce the results of an exercise, say Exercise 2.5, do:

cd chapter2
python figures.py ex2.5

Chapter 2

  1. Exercise 2.5: Difficulties that sample-average methods have for nonstationary problems

  2. Exercise 2.11: Figure analogous to Figure 2.6 for the nonstationary case

Chapter 4

  1. Exercise 4.7: Modified Jack's car rental problem (value function, policy)

  2. Exercise 4.9: Gambler’s problem with ph = 0.25 (value function, policy) and ph = 0.55 (value function, policy)

Chapter 5

  1. Exercise 5.14: Modified MC Control on the racetrack (1, 2)

Chapter 6

  1. Exercise 6.4: Wider range of alpha values
  2. Exercise 6.5: High alpha, effect of initialization
  3. Exercise 6.9: Windy Gridworld with King’s Moves
  4. Exercise 6.10: Stochastic Wind
  5. Exercise 6.13: Double Expected Sarsa vs. Expected Sarsa

Chapter 7

  1. Exercise 7.2: Sum of TD errors vs. n-step TD on the 19-state random walk
  2. Exercise 7.3: 19 states vs. 5 states, left-side outcome of -1
  3. Exercise 7.7: Off-policy action-value prediction on a not-so-random walk
  4. Exercise 7.10: Off-policy action-value prediction on a not-so-random walk

Chapter 8

  1. Exercise 8.1: n-step Sarsa on the maze task
  2. Exercise 8.4: Gridworld experiment to test the exploration bonus

Chapter 11

  1. Exercise 11.3: One-step semi-gradient Q-learning applied to Baird’s counterexample

3. Anki flashcards (cf. this blog)

Appendix

Dependencies

numpy
matplotlib
seaborn

Credits

All of the code and answers are mine, except for the Mountain Car tile coding (URL given in the book).

This README is inspired by ShangtongZhang's repo.

Design choices

  1. All of the chapters are self-contained.
  2. The environments use a gym-like API with the methods:
s = env.reset()                # reset the environment and return the initial state
s_p, r, d, info = env.step(a)  # next state, reward, done flag, and an info dict
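
To make that API concrete, here is a sketch of what an environment with this interface could look like. The ToyCorridor class is a made-up example, not one of the repo's environments.

class ToyCorridor:
    """Minimal environment following the gym-like API used in this repo:
    s = env.reset() and s_p, r, d, info = env.step(a)."""

    def __init__(self, length=5):
        self.length = length
        self.s = 0

    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):
        # a == 1 moves right, anything else moves left (clipped at 0)
        self.s = min(self.s + 1, self.length) if a == 1 else max(self.s - 1, 0)
        r = 1.0 if self.s == self.length else 0.0   # reward only at the goal
        d = self.s == self.length                   # episode ends at the goal
        return self.s, r, d, {}

env = ToyCorridor()
s = env.reset()
s_p, r, d, info = env.step(1)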

How long did it take?

The entire thing (plots, exercises, and Anki cards, including reviewing them) took about 400 hours of focused work.
