
LyWangPX / Reinforcement Learning 2nd Edition By Sutton Exercise Solutions

License: MIT
Solutions of Reinforcement Learning, An Introduction


Solutions of Reinforcement Learning, 2nd Edition (original book by Richard S. Sutton and Andrew G. Barto)

Chapter 12 updated. See the log below for details.

To students using this to complete their homework: stop. This is written to serve the many self-learners who have no official guide or proper learning environment. And, of course, as a personal project, it has ERRORS. (Open an issue if you find any.)

Welcome to this project. It is a small project where we don't do much coding (yet), but we cooperate to work through some tricky exercises from the famous RL book, Reinforcement Learning: An Introduction by Sutton. You may know that this book, especially the second edition, has no official solution manual. If you send your answers to the email address the author left, you get back an answer sheet that is incomplete and out of date. So, why don't we write our own? Most of the problems are mathematical proofs, through which one can learn the theoretical backbone nicely, but some of them are quite challenging coding problems. Both will be updated gradually, with the math going first.

The main author is me; the current main collaborator is Jean Wissam Dupin, preceded by Zhiqi Pan (who has since left).

Main Contributors for Error Fixing:

burmecia's Work (Error Fix and code contribution)

Chapter 3: Ex 3.4, 3.5, 3.6, 3.9, 3.19

Chapter 4: Ex 4.7 code (in Julia)

Jean's Work (Error Fix):

Chapter 3: Ex 3.8, 3.11, 3.14, 3.23, 3.24, 3.26, 3.28, 3.29, 4.5

QihuaZhong's Work (Error fix, analysis)

Ex 6.11, 5.11, 10.5, 10.6

luigift's Work (Error fix, algorithm contribution)

Ex 10.4 10.6 10.7 Ex 12.1 (alternative solution)

Other people (Error Fix):

Ex 10.2: SHITIANYU-hue; Ex 10.6, 10.7: Mohammad Salehi

ABOUT MISTAKES:

Don't expect the solutions to be perfect; there are always mistakes, especially in Chapter 3, where my mind was in a rush. And sometimes the problems are simply open-ended. Share your ideas and question them in 'issues' at any time!

Let's roll'n out!

UPDATE LOG:

Will update and revise this repo after April 2021.

[UPDATE APRIL 2020] After implementing Ape-X and D4PG in another project of mine, I will come back to this project and at least finish the policy gradient chapter.

[UPDATE MAR 2020] Chapter 12 is almost finished and updated, except for the last two questions: one on the dutch trace and one on double expected SARSA. They are trickier than the other exercises, and I will update them a little later. Please share your ideas by opening issues if you already have a valid solution.
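For anyone attacking those trace exercises, the mechanical difference between the two traces is small. Here is a minimal sketch of my own (illustrative names, not the book's code) of one linear TD(λ) step, with the dutch trace as described in Chapter 12:

```python
import numpy as np

def td_lambda_step(w, z, x, x_next, r, alpha, gamma, lam, dutch=False):
    """One linear TD(lambda) update (sketch).
    w: weight vector, z: eligibility trace, x/x_next: feature vectors.
    dutch=True replaces the accumulating trace with the dutch trace."""
    delta = r + gamma * w @ x_next - w @ x        # TD error
    if dutch:
        # dutch trace: shrinks the increment by how much z already covers x
        z = gamma * lam * z + (1 - alpha * gamma * lam * (z @ x)) * x
    else:
        z = gamma * lam * z + x                   # accumulating trace
    w = w + alpha * delta * z
    return w, z
```

(True online TD(λ) adds one more correction term to the weight update; this sketch shows only the trace itself.)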

[UPDATE MAR 2020] Due to multiple interviews (it is interview season in Japan, despite the virus!), I have to postpone the planned update to March or later, depending on how far I can get. (That means I am doing leetcode-ish stuff every day.)

[UPDATE JAN 2020] Future work will NOT be stopped. I will try to finish it in February 2020.

[UPDATE JAN 2020] Chapter 12's ideas are not so hard, but the questions are very difficult (the most challenging in this book). So far, I have finished up to Ex 12.5, and I think my answer to Ex 12.1 is the only valid one on the internet (or not; challenges welcome!). But because the latter half is even more challenging (tedious where it involves many infinite sums), I will release the final version a little later.

[UPDATE JAN 2020] Chapter 11 updated. One might have to read the referenced link to Sutton's paper to understand some parts, especially how and why Emphatic-TD works.
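As a companion to that paper, here is a rough sketch of one Emphatic TD(0) step as I understand it (my own illustrative code, not Sutton's): `rho` is the importance-sampling ratio, `interest` is the interest I, and F is the followon trace; with λ = 0 the emphasis M equals F.

```python
import numpy as np

def emphatic_td0_step(w, F, rho_prev, rho, interest, x, x_next, r, alpha, gamma):
    """One Emphatic TD(0) update for linear value estimation (sketch).
    F is the followon trace carried between steps; rho_prev is the
    importance-sampling ratio from the previous step."""
    F = gamma * rho_prev * F + interest           # followon trace
    M = F                                         # emphasis (lambda = 0)
    delta = r + gamma * w @ x_next - w @ x        # TD error
    w = w + alpha * rho * M * delta * x           # emphasis-weighted update
    return w, F
```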

[UPDATE JAN 2020] Chapter 10 is long but interesting! Move on!

[UPDATE DEC 2019] Chapter 9 takes a long time to read thoroughly, but the exercises are surprisingly few. After uploading the Chapter 9 PDF, I really do think I should go back to previous chapters to complete the programming exercises.

Chapter 12

[Updated March 27] Almost finished.

CHAPTER 12 SOLUTION PDF HERE

Chapter 11

Major challenges of off-policy learning. As in Chapter 9, the exercises are short.

CHAPTER 11 SOLUTION PDF HERE

Chapter 10

A substantial complement to Chapter 9. Still many open problems, which are very interesting.

CHAPTER 10 SOLUTION PDF HERE

Chapter 9

Long chapter, short practices.

CHAPTER 9 SOLUTION PDF HERE

Chapter 8

Finished without programming. I plan to create additional exercises for this chapter, because much of the material lacks practice problems.

CHAPTER 8 SOLUTION PDF HERE

Chapter 7

Finished without programming. Thanks to Zhiqi Pan for the help.

CHAPTER 7 SOLUTION PDF HERE

Chapter 6

Fully finished.

CHAPTER 6 SOLUTION PDF HERE

Chapter 5

Partially finished.

CHAPTER 5 SOLUTION PDF HERE

Chapter 4

Finished. Ex 4.7 partially finished. That DP question will burn my mind and my MacBook, but I encourage anyone undeterred by that to try it themselves. Running through it forces you to remember everything behind ordinary DP. :)
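The "ordinary DP" that Ex 4.7 drills is policy iteration. As a warm-up, here is a generic sketch on a toy tabular MDP of my own (the transition/reward encoding is an illustrative assumption, not the exercise's actual car-rental model):

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9, theta=1e-8):
    """Classic policy iteration (sketch).
    P[s][a] is a list of (prob, next_state) pairs; R[s][a] is the
    expected reward for taking action a in state s."""
    n_states = len(P)
    policy = np.zeros(n_states, dtype=int)
    V = np.zeros(n_states)
    while True:
        # Policy evaluation: sweep the Bellman expectation backup in place
        while True:
            delta = 0.0
            for s in range(n_states):
                a = policy[s]
                v = R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < theta:
                break
        # Policy improvement: act greedily with respect to V
        stable = True
        for s in range(n_states):
            q = [R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                 for a in range(len(P[s]))]
            best = int(np.argmax(q))
            if best != policy[s]:
                policy[s] = best
                stable = False
        if stable:
            return policy, V
```

For example, a two-state MDP where action 1 moves from state 0 to state 1 and state 1 pays reward 1 for staying converges to the policy "move, then stay".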

CHAPTER 4 SOLUTION PDF HERE

Chapter 3 (I was in a rush in this chapter; be wary of strange answers, if any.)

CHAPTER 3 SOLUTION PDF HERE
