
ShangtongZhang / Reinforcement Learning An Introduction

License: MIT
Python Implementation of Reinforcement Learning: An Introduction

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Reinforcement Learning An Introduction

Learning2run
Source code for our NIPS 2017 Learning to Run entry
Stars: ✭ 57 (-99.48%)
Mutual labels:  artificial-intelligence, reinforcement-learning
Snake
Artificial intelligence for the Snake game.
Stars: ✭ 1,241 (-88.76%)
Mutual labels:  artificial-intelligence, reinforcement-learning
Data Science Best Resources
Carefully curated resource links for data science in one place
Stars: ✭ 1,104 (-90%)
Mutual labels:  artificial-intelligence, reinforcement-learning
Machine Learning From Scratch
Succinct Machine Learning algorithm implementations from scratch in Python, solving real-world problems (Notebooks and Book). Examples of Logistic Regression, Linear Regression, Decision Trees, K-means clustering, Sentiment Analysis, Recommender Systems, Neural Networks and Reinforcement Learning.
Stars: ✭ 42 (-99.62%)
Mutual labels:  artificial-intelligence, reinforcement-learning
Papers Literature Ml Dl Rl Ai
Highly cited and useful papers related to machine learning, deep learning, AI, game theory, reinforcement learning
Stars: ✭ 1,341 (-87.86%)
Mutual labels:  artificial-intelligence, reinforcement-learning
Deep traffic
MIT DeepTraffic top 2% solution (75.01 mph) 🚗.
Stars: ✭ 47 (-99.57%)
Mutual labels:  artificial-intelligence, reinforcement-learning
Ai Reading Materials
Some of the ML- and DL-related reading materials and research papers that I've read
Stars: ✭ 79 (-99.28%)
Mutual labels:  artificial-intelligence, reinforcement-learning
Gym
Seoul AI Gym is a toolkit for developing AI algorithms.
Stars: ✭ 27 (-99.76%)
Mutual labels:  artificial-intelligence, reinforcement-learning
60 days rl challenge
Chinese translation of the 60_Days_RL_Challenge
Stars: ✭ 92 (-99.17%)
Mutual labels:  artificial-intelligence, reinforcement-learning
Mapleai
A curated collection of learning materials across AI fields: the skills and knowledge needed to land an AI-related job offer, including online blogs, my personal blog posts, and electronic book copies.
Stars: ✭ 89 (-99.19%)
Mutual labels:  artificial-intelligence, reinforcement-learning
Awesome Mlss
List of summer schools in machine learning + related fields across the globe
Stars: ✭ 1,001 (-90.93%)
Mutual labels:  artificial-intelligence, reinforcement-learning
Chemgan Challenge
Code for the paper: Benhenda, M. 2017. ChemGAN challenge for drug discovery: can AI reproduce natural chemical diversity? arXiv preprint arXiv:1708.08227.
Stars: ✭ 98 (-99.11%)
Mutual labels:  artificial-intelligence, reinforcement-learning
Artificialintelligenceengines
Computer code collated for use with Artificial Intelligence Engines book by JV Stone
Stars: ✭ 35 (-99.68%)
Mutual labels:  artificial-intelligence, reinforcement-learning
Notebooks
Some notebooks
Stars: ✭ 53 (-99.52%)
Mutual labels:  artificial-intelligence, reinforcement-learning
Batch Ppo
Efficient Batched Reinforcement Learning in TensorFlow
Stars: ✭ 945 (-91.44%)
Mutual labels:  artificial-intelligence, reinforcement-learning
Awesome Decision Making Reinforcement Learning
A selection of state-of-the-art research materials on decision making and motion planning.
Stars: ✭ 68 (-99.38%)
Mutual labels:  artificial-intelligence, reinforcement-learning
Awesome Ai Books
Some awesome AI-related books and PDFs for learning and download, plus some playground models to experiment with
Stars: ✭ 855 (-92.26%)
Mutual labels:  artificial-intelligence, reinforcement-learning
Doyouevenlearn
Essential Guide to keep up with AI/ML/DL/CV
Stars: ✭ 913 (-91.73%)
Mutual labels:  artificial-intelligence, reinforcement-learning
Simulator
A ROS/ROS2 Multi-robot Simulator for Autonomous Vehicles
Stars: ✭ 1,260 (-88.59%)
Mutual labels:  artificial-intelligence, reinforcement-learning
Rlai Exercises
Exercise Solutions for Reinforcement Learning: An Introduction [2nd Edition]
Stars: ✭ 97 (-99.12%)
Mutual labels:  artificial-intelligence, reinforcement-learning

Reinforcement Learning: An Introduction


Python replication of Sutton & Barto's book Reinforcement Learning: An Introduction (2nd Edition)

If you have any questions about the code or want to report a bug, please open an issue instead of emailing me directly. Unfortunately, I do not have exercise answers for the book.

Contents

Chapter 1

  1. Tic-Tac-Toe
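The tic-tac-toe agent learns a tabular value function through self-play and temporal-difference backups. As a rough orientation only (the board keys below are hypothetical, not the repository's encoding), the core update looks like this:

values = {}          # state key -> estimated probability of winning
ALPHA = 0.1          # step size

def td_update(state, next_state, default=0.5):
    """Move V(state) toward V(next_state) by a fraction ALPHA."""
    v = values.setdefault(state, default)
    v_next = values.setdefault(next_state, default)
    values[state] = v + ALPHA * (v_next - v)

# Back up along one hypothetical greedy trajectory of board keys.
trajectory = ["empty board", "X center", "X center, O corner"]
for s, s_next in zip(trajectory, trajectory[1:]):
    td_update(s, s_next)
print(values)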

Chapter 2

  1. Figure 2.1: An example bandit problem from the 10-armed testbed
  2. Figure 2.2: Average performance of epsilon-greedy action-value methods on the 10-armed testbed
  3. Figure 2.3: Optimistic initial action-value estimates
  4. Figure 2.4: Average performance of UCB action selection on the 10-armed testbed
  5. Figure 2.5: Average performance of the gradient bandit algorithm
  6. Figure 2.6: A parameter study of the various bandit algorithms
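All of Chapter 2's figures are generated from the 10-armed testbed. For orientation, here is a minimal ε-greedy agent with sample-average estimates on a single randomly drawn testbed problem; it is a sketch, not the repository's implementation:

import numpy as np

# Minimal epsilon-greedy agent on one 10-armed testbed problem
# (incremental sample-average action-value estimates).
rng = np.random.default_rng(0)
K, STEPS, EPSILON = 10, 1000, 0.1

q_true = rng.normal(0, 1, K)            # true action values q*(a)
q_est = np.zeros(K)                     # sample-average estimates
counts = np.zeros(K)

rewards = []
for t in range(STEPS):
    if rng.random() < EPSILON:
        a = int(rng.integers(K))        # explore
    else:
        a = int(np.argmax(q_est))       # exploit
    r = rng.normal(q_true[a], 1)        # reward ~ N(q*(a), 1)
    counts[a] += 1
    q_est[a] += (r - q_est[a]) / counts[a]   # incremental mean update
    rewards.append(r)

print("average reward over the run:", np.mean(rewards))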

Chapter 3

  1. Figure 3.2: Grid example with random policy
  2. Figure 3.5: Optimal solutions to the gridworld example

Chapter 4

  1. Figure 4.1: Convergence of iterative policy evaluation on a small gridworld
  2. Figure 4.2: Jack’s car rental problem
  3. Figure 4.3: The solution to the gambler’s problem
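Figure 4.1 comes from iterative policy evaluation of the equiprobable random policy on a small gridworld. A compressed sketch of that sweep (4x4 grid, terminal corners, reward -1 per step), separate from the repository's code:

import numpy as np

# Iterative policy evaluation on the 4x4 gridworld of Figure 4.1:
# equiprobable random policy, reward -1 per step, terminal corner states.
N = 4
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]
TERMINALS = {(0, 0), (N - 1, N - 1)}

def step(state, action):
    i, j = state
    ni, nj = i + action[0], j + action[1]
    if not (0 <= ni < N and 0 <= nj < N):
        ni, nj = i, j                   # bumping the wall leaves the state unchanged
    return (ni, nj), -1.0

V = np.zeros((N, N))
while True:
    new_V = np.zeros_like(V)
    for i in range(N):
        for j in range(N):
            if (i, j) in TERMINALS:
                continue
            for a in ACTIONS:           # expectation over the random policy
                (ni, nj), r = step((i, j), a)
                new_V[i, j] += 0.25 * (r + V[ni, nj])
    if np.max(np.abs(new_V - V)) < 1e-4:
        break
    V = new_V
print(np.round(V, 1))                   # converges to the values shown in Figure 4.1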

Chapter 5

  1. Figure 5.1: Approximate state-value functions for the blackjack policy
  2. Figure 5.2: The optimal policy and state-value function for blackjack found by Monte Carlo ES
  3. Figure 5.3: Weighted importance sampling
  4. Figure 5.4: Ordinary importance sampling with surprisingly unstable estimates
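Figures 5.3 and 5.4 contrast weighted and ordinary importance sampling. The two estimators themselves are short; the sketch below applies them to a toy batch of episode returns and ratios (made-up numbers, not the blackjack experiment):

import numpy as np

def importance_sampling_estimates(returns, rhos):
    """Off-policy estimates of a state's value from first-visit returns.

    returns: per-episode returns G_i observed under the behavior policy b
    rhos:    per-episode importance-sampling ratios
             rho_i = prod_t pi(A_t|S_t) / b(A_t|S_t)
    """
    returns = np.asarray(returns, dtype=float)
    rhos = np.asarray(rhos, dtype=float)
    ordinary = np.sum(rhos * returns) / len(returns)        # ordinary importance sampling
    weighted = (np.sum(rhos * returns) / np.sum(rhos)       # weighted importance sampling
                if np.sum(rhos) > 0 else 0.0)
    return ordinary, weighted

# Toy usage with made-up returns and ratios.
print(importance_sampling_estimates([1.0, -1.0, 1.0], [2.0, 0.5, 0.0]))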

Chapter 6

  1. Example 6.2: Random walk
  2. Figure 6.2: Batch updating
  3. Figure 6.3: Sarsa applied to windy grid world
  4. Figure 6.4: The cliff-walking task
  5. Figure 6.6: Interim and asymptotic performance of TD control methods
  6. Figure 6.7: Comparison of Q-learning and Double Q-learning
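The control experiments of Chapter 6 (windy gridworld, cliff walking, Double Q-learning) are all built on one-step tabular TD control. Here is a bare Q-learning loop against a hypothetical tabular environment interface, shown only for orientation; the repository instead hard-codes each gridworld directly:

import random
from collections import defaultdict

def q_learning(env, actions, episodes=500, alpha=0.5, gamma=1.0, epsilon=0.1):
    """Tabular Q-learning. `env` is a hypothetical object with
    reset() -> state and step(action) -> (next_state, reward, done)."""
    Q = defaultdict(lambda: {a: 0.0 for a in actions})
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            if random.random() < epsilon:
                action = random.choice(actions)              # explore
            else:
                action = max(Q[state], key=Q[state].get)     # exploit
            next_state, reward, done = env.step(action)
            target = reward + (0.0 if done else gamma * max(Q[next_state].values()))
            Q[state][action] += alpha * (target - Q[state][action])
            state = next_state
    return Q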

Chapter 7

  1. Figure 7.2: Performance of n-step TD methods on the 19-state random walk
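The n-step methods of Figure 7.2 change only the return that is bootstrapped from. A small helper illustrating the n-step return, separate from the repository's random-walk code:

def n_step_return(rewards, values, t, n, gamma=1.0):
    """G_{t:t+n}: n rewards plus a bootstrapped tail value.

    rewards[k] is R_{k+1}; values[k] is the current estimate V(S_k).
    If the episode ends before t+n, the tail value is simply omitted.
    """
    T = len(rewards)                       # episode length
    horizon = min(t + n, T)
    g = sum(gamma ** (k - t) * rewards[k] for k in range(t, horizon))
    if t + n < T:
        g += gamma ** n * values[t + n]    # bootstrap from V(S_{t+n})
    return g

# Toy usage: 3-step return from time 0 in a 5-step episode with V = 0.5 everywhere.
print(n_step_return(rewards=[0, 0, 0, 0, 1], values=[0.5] * 6, t=0, n=3))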

Chapter 8

  1. Figure 8.2: Average learning curves for Dyna-Q agents varying in their number of planning steps
  2. Figure 8.4: Average performance of Dyna agents on a blocking task
  3. Figure 8.5: Average performance of Dyna agents on a shortcut task
  4. Example 8.4: Prioritized sweeping significantly shortens learning time on the Dyna maze task
  5. Figure 8.7: Comparison of efficiency of expected and sample updates
  6. Figure 8.8: Relative efficiency of different update distributions
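Dyna-Q (Figures 8.2, 8.4, 8.5) interleaves direct Q-learning updates with planning updates sampled from a learned model. The essential loop, again sketched against a hypothetical environment interface like the one in the Chapter 6 snippet:

import random
from collections import defaultdict

def dyna_q_episode(env, actions, Q, model, planning_steps=5,
                   alpha=0.1, gamma=0.95, epsilon=0.1):
    """One Dyna-Q episode.

    Q:     mapping state -> {action: value}
    model: dict mapping (state, action) -> (next_state, reward)
    env:   hypothetical object with reset() and step(action) -> (s', r, done)
    """
    state, done = env.reset(), False
    while not done:
        if random.random() < epsilon:
            action = random.choice(actions)
        else:
            action = max(Q[state], key=Q[state].get)
        next_state, reward, done = env.step(action)
        # (a) direct RL update
        target = reward + (0.0 if done else gamma * max(Q[next_state].values()))
        Q[state][action] += alpha * (target - Q[state][action])
        # (b) model learning: remember the observed (deterministic) transition
        model[(state, action)] = (next_state, reward)
        # (c) planning: replay randomly chosen remembered transitions
        for _ in range(planning_steps):
            (s, a), (s2, r) = random.choice(list(model.items()))
            Q[s][a] += alpha * (r + gamma * max(Q[s2].values()) - Q[s][a])
        state = next_state
    return Q

# Typical setup (with a concrete `env` in hand, run dyna_q_episode repeatedly):
actions = [0, 1, 2, 3]                   # e.g. up / down / left / right
Q = defaultdict(lambda: {a: 0.0 for a in actions})
model = {}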

Chapter 9

  1. Figure 9.1: Gradient Monte Carlo algorithm on the 1000-state random walk task
  2. Figure 9.2: Semi-gradient n-step TD algorithm on the 1000-state random walk task
  3. Figure 9.5: Fourier basis vs polynomials on the 1000-state random walk task
  4. Figure 9.8: Example of feature width’s effect on initial generalization and asymptotic accuracy
  5. Figure 9.10: Single tiling and multiple tilings on the 1000-state random walk task
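Figure 9.1 applies gradient Monte Carlo with state aggregation to the 1000-state random walk. The sketch below compresses that idea into a few lines; the step size is larger than the figure's so it settles quickly, and the episode generator is a simplified stand-in rather than the repository's:

import numpy as np

# Gradient Monte Carlo with state aggregation on the 1000-state random walk:
# states 1..1000 in 10 groups of 100, jumps of up to 100 states left or right,
# reward -1 at the left terminal and +1 at the right, no discounting.
rng = np.random.default_rng(0)
N_STATES, N_GROUPS = 1000, 10
ALPHA = 1e-3                              # larger step size than the figure's, for a quick demo
w = np.zeros(N_GROUPS)                    # one weight per group of 100 states

def group(state):
    return (state - 1) // (N_STATES // N_GROUPS)

def generate_episode(start=500):
    states, state = [], start
    while True:
        states.append(state)
        state += int(rng.integers(1, 101)) * int(rng.choice([-1, 1]))
        if state < 1:
            return states, -1.0           # fell off the left end
        if state > N_STATES:
            return states, 1.0            # fell off the right end

for _ in range(5000):
    states, G = generate_episode()        # undiscounted return = terminal reward
    for s in states:
        w[group(s)] += ALPHA * (G - w[group(s)])   # gradient MC with group features

print(np.round(w, 2))                     # roughly a ramp from about -0.9 to 0.9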

Chapter 10

  1. Figure 10.1: The cost-to-go function for the Mountain Car task in a single run
  2. Figure 10.2: Learning curves for semi-gradient Sarsa on the Mountain Car task
  3. Figure 10.3: One-step vs multi-step performance of semi-gradient Sarsa on the Mountain Car task
  4. Figure 10.4: Effect of alpha and n on the early performance of n-step semi-gradient Sarsa
  5. Figure 10.5: Differential semi-gradient Sarsa on the access-control queuing task

Chapter 11

  1. Figure 11.2: Baird's Counterexample
  2. Figure 11.6: The behavior of the TDC algorithm on Baird’s counterexample
  3. Figure 11.7: The behavior of the ETD algorithm in expectation on Baird’s counterexample

Chapter 12

  1. Figure 12.3: Off-line λ-return algorithm on the 19-state random walk
  2. Figure 12.6: TD(λ) algorithm on the 19-state random walk
  3. Figure 12.8: True online TD(λ) algorithm on the 19-state random walk
  4. Figure 12.10: Sarsa(λ) with replacing traces on Mountain Car
  5. Figure 12.11: Summary comparison of Sarsa(λ) algorithms on Mountain Car
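Figures 12.3 to 12.8 compare λ-return and eligibility-trace methods on the 19-state random walk. The heart of TD(λ) is the trace update; below is a minimal tabular version with accumulating traces, not the repository's parameterized one:

import numpy as np

# Tabular TD(lambda) with accumulating traces on the 19-state random walk
# (reward -1 / +1 at the two terminals, gamma = 1, start in the middle).
rng = np.random.default_rng(0)
N, GAMMA, LAM, ALPHA = 19, 1.0, 0.8, 0.1
V = np.zeros(N + 2)                       # indices 0 and N+1 are the terminals

def td_lambda_episode(V):
    z = np.zeros_like(V)                  # eligibility traces
    s = (N + 1) // 2                      # middle state
    while s not in (0, N + 1):
        s_next = s + int(rng.choice([-1, 1]))
        r = -1.0 if s_next == 0 else (1.0 if s_next == N + 1 else 0.0)
        delta = r + GAMMA * V[s_next] - V[s]   # one-step TD error
        z *= GAMMA * LAM                       # decay every trace
        z[s] += 1.0                            # accumulate the visited state
        V += ALPHA * delta * z                 # credit all recently visited states
        s = s_next

for _ in range(100):
    td_lambda_episode(V)
print(np.round(V[1:-1], 2))               # roughly a ramp from -0.9 to 0.9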

Chapter 13

  1. Example 13.1: Short corridor with switched actions
  2. Figure 13.1: REINFORCE on the short-corridor gridworld
  3. Figure 13.2: REINFORCE with baseline on the short-corridor gridworld
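Figure 13.1 runs REINFORCE on the short corridor, where the softmax policy deliberately ignores the state. A compact sketch of that setup; the hyperparameters here are illustrative rather than the figure's exact values:

import numpy as np

# REINFORCE on the short-corridor gridworld of Example 13.1: three non-terminal
# states, reward -1 per step, and action effects reversed in the middle state.
# The softmax policy ignores the state on purpose, since the states are
# indistinguishable to the function approximator.
rng = np.random.default_rng(0)
ALPHA, GAMMA = 2 ** -13, 1.0              # small step size; REINFORCE is high-variance
theta = np.zeros(2)                       # action preferences for [right, left]

def policy(theta):
    prefs = np.exp(theta - np.max(theta))
    return prefs / prefs.sum()

def run_episode(theta):
    s, rewards, actions = 0, [], []
    while s != 3:                         # state 3 is the goal
        a = int(rng.choice(2, p=policy(theta)))   # 0 = right, 1 = left
        move = 1 if a == 0 else -1
        if s == 1:
            move = -move                  # the reversed middle state
        s = max(0, s + move)              # bumping the left wall keeps s at 0
        rewards.append(-1.0)
        actions.append(a)
    return rewards, actions

for _ in range(1000):
    rewards, actions = run_episode(theta)
    G = sum(rewards)                      # return from t = 0
    for t, a in enumerate(actions):
        grad_ln_pi = -policy(theta)       # gradient of ln softmax ...
        grad_ln_pi[a] += 1.0              # ... is one_hot(a) - pi
        theta += ALPHA * (GAMMA ** t) * G * grad_ln_pi
        G -= rewards[t]                   # return from the next time step

print("P(right) after training:", float(policy(theta)[0]))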

Environment

Usage

All files are self-contained; run any of them directly:

python any_file_you_want.py
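For example, to reproduce a particular figure, run the script that generates it (the path below is only a hypothetical illustration; check the repository layout for the real file names):

python chapter02/ten_armed_testbed.py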

Contribution

If you want to contribute some missing examples or fix some bugs, feel free to open an issue or make a pull request.
