
LyWangPX / Reinforcement Learning 2nd Edition By Sutton Exercise Solutions

License: MIT
Solutions of Reinforcement Learning, An Introduction


Solutions of Reinforcement Learning, 2nd Edition (original book by Richard S. Sutton and Andrew G. Barto)

Chapter 12 updated. See the log below for details.

To students using this to complete their homework: stop. This is written to serve the many self-learners who have no official guide or proper learning environment. And, of course, as a personal project, it has ERRORS. (Open an issue if you find any.)

Welcome to this project. It is a small project where we don't do much coding (yet), but we cooperate to work through some tricky exercises from the famous RL book, Reinforcement Learning: An Introduction by Sutton. You may know that this book, especially the second edition, has no official solution manual. If you send your answers to the email address the author left, you get back an answer sheet that is incomplete and out of date. So, why don't we write our own? Most of the problems are mathematical proofs, through which one can learn the theoretical backbone nicely, but some of them are quite challenging coding problems. Both will be updated gradually, with the math going first.

The main author is me; the current main collaborator is Jean Wissam Dupin, preceded by Zhiqi Pan (who has since left).

Main Contributors for Error Fixing:

burmecia's Work (Error Fix and code contribution)

Chapter 3: Ex 3.4, 3.5, 3.6, 3.9, 3.19

Chapter 4: Ex 4.7 code (in Julia)

Jean's Work (Error Fix):

Chapter 3: Ex 3.8, 3.11, 3.14, 3.23, 3.24, 3.26, 3.28, 3.29, 4.5

QihuaZhong's Work (Error fix, analysis)

Ex 6.11, 5.11, 10.5, 10.6

luigift's Work (Error fix, algorithm contribution)

Ex 10.4 10.6 10.7 Ex 12.1 (alternative solution)

Other people (Error Fix):

Ex 10.2: SHITIANYU-hue; Ex 10.6, 10.7: Mohammad Salehi

ABOUT MISTAKES:

Don't expect the solutions to be perfect; there are always mistakes, especially in Chapter 3, where my mind was in a rush. And sometimes the problems are simply open-ended. Share your ideas and question them in 'issues' at any time!

Let's roll'n out!

UPDATE LOG:

Will update and revise this repo after April 2021.

[UPDATE APRIL 2020] After implementing Ape-X and D4PG in another project of mine, I will come back to this project and at least finish the policy gradient chapter.

[UPDATE MAR 2020] Chapter 12 is almost finished and updated, except for the last two questions: one on the dutch trace and one on double expected SARSA. They are trickier than the other exercises, and I will update them a little later. Please share your ideas by opening issues if you already have a valid solution.
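For anyone attacking those trace exercises, the mechanical difference between the two traces is small. Here is a minimal sketch of my own (illustrative names, not the book's code) of one linear TD(λ) step, with the dutch trace as described in Chapter 12:

```python
import numpy as np

def td_lambda_step(w, z, x, x_next, r, alpha, gamma, lam, dutch=False):
    """One linear TD(lambda) update (sketch).
    w: weight vector, z: eligibility trace, x/x_next: feature vectors.
    dutch=True replaces the accumulating trace with the dutch trace."""
    delta = r + gamma * w @ x_next - w @ x        # TD error
    if dutch:
        # dutch trace: shrinks the increment by how much z already covers x
        z = gamma * lam * z + (1 - alpha * gamma * lam * (z @ x)) * x
    else:
        z = gamma * lam * z + x                   # accumulating trace
    w = w + alpha * delta * z
    return w, z
```

(True online TD(λ) adds one more correction term to the weight update; this sketch shows only the trace itself.)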

[UPDATE MAR 2020] Due to multiple interviews (it is interview season in Japan, despite the virus!), I have to postpone the planned update to March or later, depending on how far I can get. (That means I am doing leetcode-ish stuff every day.)

[UPDATE JAN 2020] Future work will NOT be stopped. I will try to finish it in February 2020.

[UPDATE JAN 2020] Chapter 12's ideas are not so hard, but the questions are very difficult (the most challenging in this book). So far, I have finished up to Ex 12.5, and I think my answer to Ex 12.1 is the only valid one on the internet (or not; challenges welcome!). But because the latter half is even more challenging (tedious where it involves many infinite sums), I will release the final version a little later.

[UPDATE JAN 2020] Chapter 11 updated. One might have to read the referenced link to Sutton's paper to understand some parts, especially how and why Emphatic-TD works.
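As a companion to that paper, here is a rough sketch of one Emphatic TD(0) step as I understand it (my own illustrative code, not Sutton's): `rho` is the importance-sampling ratio, `interest` is the interest I, and F is the followon trace; with λ = 0 the emphasis M equals F.

```python
import numpy as np

def emphatic_td0_step(w, F, rho_prev, rho, interest, x, x_next, r, alpha, gamma):
    """One Emphatic TD(0) update for linear value estimation (sketch).
    F is the followon trace carried between steps; rho_prev is the
    importance-sampling ratio from the previous step."""
    F = gamma * rho_prev * F + interest           # followon trace
    M = F                                         # emphasis (lambda = 0)
    delta = r + gamma * w @ x_next - w @ x        # TD error
    w = w + alpha * rho * M * delta * x           # emphasis-weighted update
    return w, F
```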

[UPDATE JAN 2020] Chapter 10 is long but interesting! Move on!

[UPDATE DEC 2019] Chapter 9 takes a long time to read thoroughly, but the exercises are surprisingly few. After uploading the Chapter 9 PDF, I really do think I should go back to previous chapters to complete the programming exercises.

Chapter 12

[Updated March 27] Almost finished.

CHAPTER 12 SOLUTION PDF HERE

Chapter 11

Major challenges of off-policy learning. As in Chapter 9, the exercises are short.

CHAPTER 11 SOLUTION PDF HERE

Chapter 10

A substantial complement to Chapter 9. Still many open problems, which are very interesting.

CHAPTER 10 SOLUTION PDF HERE

Chapter 9

Long chapter, short practices.

CHAPTER 9 SOLUTION PDF HERE

Chapter 8

Finished without programming. I plan to create additional exercises for this chapter, because much of the material lacks practice problems.

CHAPTER 8 SOLUTION PDF HERE

Chapter 7

Finished without programming. Thanks to Zhiqi Pan for the help.

CHAPTER 7 SOLUTION PDF HERE

Chapter 6

Fully finished.

CHAPTER 6 SOLUTION PDF HERE

Chapter 5

Partially finished.

CHAPTER 5 SOLUTION PDF HERE

Chapter 4

Finished. Ex 4.7 partially finished. That DP question will burn my mind and my MacBook, but I encourage anyone undeterred by that to try it themselves. Running through it forces you to remember everything behind ordinary DP. :)
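The "ordinary DP" that Ex 4.7 drills is policy iteration. As a warm-up, here is a generic sketch on a toy tabular MDP of my own (the transition/reward encoding is an illustrative assumption, not the exercise's actual car-rental model):

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9, theta=1e-8):
    """Classic policy iteration (sketch).
    P[s][a] is a list of (prob, next_state) pairs; R[s][a] is the
    expected reward for taking action a in state s."""
    n_states = len(P)
    policy = np.zeros(n_states, dtype=int)
    V = np.zeros(n_states)
    while True:
        # Policy evaluation: sweep the Bellman expectation backup in place
        while True:
            delta = 0.0
            for s in range(n_states):
                a = policy[s]
                v = R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < theta:
                break
        # Policy improvement: act greedily with respect to V
        stable = True
        for s in range(n_states):
            q = [R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                 for a in range(len(P[s]))]
            best = int(np.argmax(q))
            if best != policy[s]:
                policy[s] = best
                stable = False
        if stable:
            return policy, V
```

For example, a two-state MDP where action 1 moves from state 0 to state 1 and state 1 pays reward 1 for staying converges to the policy "move, then stay".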

CHAPTER 4 SOLUTION PDF HERE

Chapter 3 (I was in a rush in this chapter; be wary of strange answers, if any.)

CHAPTER 3 SOLUTION PDF HERE
