All Projects → ZhiqingXiao → Rl Book

ZhiqingXiao / Rl Book

Source codes for the book "Reinforcement Learning: Theory and Python Implementation"

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Rl Book

Pytorch sac
PyTorch implementation of Soft Actor-Critic (SAC)
Stars: ✭ 174 (-62.5%)
Mutual labels:  gym, jupyter-notebook, reinforcement-learning, deep-reinforcement-learning
Pytorch Rl
This repository contains model-free deep reinforcement learning algorithms implemented in Pytorch
Stars: ✭ 394 (-15.09%)
Mutual labels:  gym, reinforcement-learning, deep-reinforcement-learning, openai-gym
Reinforcementlearning Atarigame
Pytorch LSTM RNN for reinforcement learning to play Atari games from OpenAI Universe. We also use Google Deep Mind's Asynchronous Advantage Actor-Critic (A3C) Algorithm. This is much superior and efficient than DQN and obsoletes it. Can play on many games
Stars: ✭ 118 (-74.57%)
Mutual labels:  jupyter-notebook, reinforcement-learning, deep-reinforcement-learning, openai-gym
Deep Reinforcement Learning
Repo for the Deep Reinforcement Learning Nanodegree program
Stars: ✭ 4,012 (+764.66%)
Mutual labels:  jupyter-notebook, reinforcement-learning, deep-reinforcement-learning, openai-gym
Hands On Reinforcement Learning With Python
Master Reinforcement and Deep Reinforcement Learning using OpenAI Gym and TensorFlow
Stars: ✭ 640 (+37.93%)
Mutual labels:  jupyter-notebook, reinforcement-learning, deep-reinforcement-learning, openai-gym
Deterministic Gail Pytorch
PyTorch implementation of Deterministic Generative Adversarial Imitation Learning (GAIL) for Off Policy learning
Stars: ✭ 44 (-90.52%)
Mutual labels:  gym, reinforcement-learning, deep-reinforcement-learning, openai-gym
Drq
DrQ: Data regularized Q
Stars: ✭ 268 (-42.24%)
Mutual labels:  gym, jupyter-notebook, reinforcement-learning, deep-reinforcement-learning
Lagom
lagom: A PyTorch infrastructure for rapid prototyping of reinforcement learning algorithms.
Stars: ✭ 364 (-21.55%)
Mutual labels:  jupyter-notebook, reinforcement-learning, deep-reinforcement-learning
Dmc2gym
OpenAI Gym wrapper for the DeepMind Control Suite
Stars: ✭ 75 (-83.84%)
Mutual labels:  gym, reinforcement-learning, openai-gym
Stable Baselines
Mirror of Stable-Baselines: a fork of OpenAI Baselines, implementations of reinforcement learning algorithms
Stars: ✭ 115 (-75.22%)
Mutual labels:  gym, reinforcement-learning, openai-gym
Naf Tensorflow
"Continuous Deep Q-Learning with Model-based Acceleration" in TensorFlow
Stars: ✭ 192 (-58.62%)
Mutual labels:  gym, reinforcement-learning, deep-reinforcement-learning
Ma Gym
A collection of multi agent environments based on OpenAI gym.
Stars: ✭ 226 (-51.29%)
Mutual labels:  gym, reinforcement-learning, openai-gym
Gym Gazebo2
gym-gazebo2 is a toolkit for developing and comparing reinforcement learning algorithms using ROS 2 and Gazebo
Stars: ✭ 257 (-44.61%)
Mutual labels:  gym, reinforcement-learning, deep-reinforcement-learning
Pytorch sac ae
PyTorch implementation of Soft Actor-Critic + Autoencoder(SAC+AE)
Stars: ✭ 94 (-79.74%)
Mutual labels:  gym, reinforcement-learning, deep-reinforcement-learning
Rlenv.directory
Explore and find reinforcement learning environments in a list of 150+ open source environments.
Stars: ✭ 79 (-82.97%)
Mutual labels:  gym, reinforcement-learning, deep-reinforcement-learning
Muzero General
MuZero
Stars: ✭ 1,187 (+155.82%)
Mutual labels:  gym, reinforcement-learning, deep-reinforcement-learning
Mushroom Rl
Python library for Reinforcement Learning.
Stars: ✭ 442 (-4.74%)
Mutual labels:  reinforcement-learning, deep-reinforcement-learning, openai-gym
Reinforcement learning tutorial with demo
Reinforcement Learning Tutorial with Demo: DP (Policy and Value Iteration), Monte Carlo, TD Learning (SARSA, QLearning), Function Approximation, Policy Gradient, DQN, Imitation, Meta Learning, Papers, Courses, etc..
Stars: ✭ 442 (-4.74%)
Mutual labels:  jupyter-notebook, reinforcement-learning, deep-reinforcement-learning
Rl Portfolio Management
Attempting to replicate "A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem" https://arxiv.org/abs/1706.10059 (and an openai gym environment)
Stars: ✭ 447 (-3.66%)
Mutual labels:  jupyter-notebook, deep-reinforcement-learning, openai-gym
Drlkit
A High Level Python Deep Reinforcement Learning library. Great for beginners, prototyping and quickly comparing algorithms
Stars: ✭ 29 (-93.75%)
Mutual labels:  gym, reinforcement-learning, deep-reinforcement-learning

强化学习:原理与Python实现

全球第一本配套 TensorFlow 2 代码的强化学习教程书

中国第一本配套 TensorFlow 2 代码的纸质算法书

现已提供 TensorFlow 2 和 PyTorch 1 对照代码

Book

本书介绍强化学习理论及其 Python 实现。

  • 理论完备:全书用一套完整的数学体系,严谨地讲授强化学习的理论基础,主要定理均给出证明过程。各章内容循序渐进,覆盖了所有主流强化学习算法,包括资格迹等非深度强化学习算法和柔性执行者/评论者等深度强化学习算法。
  • 案例丰富:在您最爱的操作系统(包括 Windows、macOS、Linux)上,基于 Python 3.8(兼容 Python 3.9)、Gym 0.18 和 TensorFlow 2.4 / PyTorch 1.8,实现强化学习算法。全书实现统一规范,体积小、重量轻。第 1~9 章给出了算法的配套实现,环境部分只依赖于 Gym 的最小安装,在没有 GPU 的计算机上也可运行;第 10~12 章介绍了多个热门综合案例,涵盖 Gym 的完整安装和自定义扩展,在有普通 GPU 的计算机上即可运行。

2020年更新

本书深度强化学习部分新增 基于 TensorFlow 2 和 PyTorch 1 的算法对照实现。 两个版本实现均和正文伪代码严格对应,两个版本仅在智能体部分实现不同,程序结构和智能体参数完全相同。

目录

  1. 初识强化学习   查看代码:useGym
  2. Markov决策过程   查看代码:useBellman CliffWalking
  3. 有模型数值迭代   查看代码:FrozenLake
  4. 回合更新价值迭代   查看代码:Blackjack
  5. 时序差分价值迭代   查看代码:Taxi
  6. 函数近似方法   查看代码:MountainCar
  7. 回合更新策略梯度方法   查看代码:CartPole
  8. 执行者/评论者方法   查看代码:Acrobot
  9. 连续动作空间的确定性策略   查看代码:Pendulum
  10. 综合案例:电动游戏   查看代码:Breakout Pong Seaquest
  11. 综合案例:棋盘游戏   查看代码:TicTacToe Reversi boardgame2
  12. 综合案例:自动驾驶   查看代码:AirSimNH

QQ群

  • 主群:935702193(主群扩容中请多支持,勘误报错可发此群,其他问题提问前请先Google)
  • 二群:243613392(免费入群,勘误报错可发此群,其他问题提问前请先Google,群主和管理员不提供免费咨询服务)
  • 多任务群:696984257(免费入群,非小白群,多任务强化学习+强化元学习+终身强化学习+迁移强化学习,勘误报错勿发此群,提问前请先Google)
  • 关于入群验证问题:由于QQ的bug,即使正确输入答案,也可能会验证失败。这时更换设备重试、更换输入法重试、改日重试均可能解决问题。如果答案中有英文字母,清注意大小写。

书籍勘误与更新

判断纸质版书籍版次的方法 / 确定纸质书印刷时间的方法

  • “前言”之前有1页是“图书在版编目(CIP)数据”。这页下部的表格中有一项是“版次”,该项标明当前书是什么时候第几次印刷的。

本书数学符号表

本书电子版

本书不仅有纸质版销售,也有电子版销售。不过,电子版没有提供配套的勘误与更新资源,而且公式展示不美观,对阅读带来困难。所以推荐购买纸质版。电子版销售平台包括但不限于:

热心读者 Anesck 对本书知识点的梳理评注

第1章 第2章 第3章 第4章 第5章 第6章 第7章 第8章 第9章

初学者常见问题

  • 问:Windows系统下安装TensorFlow或PyTorch失败。答:请在Windows 10里安装Visual Studio 2019(如果有旧版本的Visual Studio请先彻底卸载)。更多细节和安装问题请自行Google。

  • 问:在Visual Studio或Visual Studio Code或PyCharm里面运行代码失败,比如找不到函数display()。答:本repo代码是配套Jupyter Notebook环境的,只能在Jupyter Notebook里运行。推荐您安装最新版本的Anaconda并直接运行下载来的Notebook。(display()函数是Jupyter Notebook里才有的函数。)不需要安装Visual Studio Code或PyCharm。更多细节或其他错误请自行Google。

  • 问:GPU运行的结果和repo里带的结果不完全一样。答:本repo附带的结果都是用CPU跑的。GPU运算本来就不能精确复现。更多细节请自行Google。

Reinforcement Learning: Theory and Python Implementation

The First Reinforcement Learning Tutorial Book with TensorFlow 2 Implementation

Codes with both TensorFlow 2 and PyTorch 1

This is a tutorial book on reinforcement learning, with explanation of theory and Python implementation.

  • Theory: Starting from a uniform mathematical framework, this book derives the theory and algorithms of reinforcement learning, including all major algorithms such as eligibility traces and soft actor-critic algorithms.
  • Practice: Every chapter is accompanied by high quality implementation based on Python 3.8, Gym 0.18, and TensorFlow 2.4 / PyTorch 1.8.

Please email me if you are interested in publishing this book in other languages.

Table of Codes

Chapter Environment Agent
2 CliffWalking-v0 Bellman
3 FrozenLake-v0 DP
4 Blackjack-v0 MC
5 Taxi-v3 SARSA, ExpectedSARSA, QL, DoubleQL, SARSA(λ)
6 MountainCar-v0 SARSA, SARSA(λ), DQN tf torch, DoubleDQN tf torch, DuelDQN tf torch
7 CartPole-0 VPG tf torch, VPGwBaseline tf torch, OffPolicyVPG tf torch, OffPolicyVPGwBaseline tf torch
8 Acrobot-v1 QAC tf torch, AdvantageAC tf torch, EligibilityTraceAC tf torch, PPO tf torch, NPG tf torch, TRPO tf torch, OffPAC tf torch
9 Pendulum-v0 DDPG tf torch, TD3 tf torch
10 LunarLander-v2 SQL tf torch, SAC tf torch, SACwA tf torch

Table of Contents

Detail

  1. Introduction of Reinforcement Learning
  2. Markov Decision Process
  3. Model-based Numeric Iteration
  4. Monte-Carlo Learning
  5. Temporal Difference Learning
  6. Function Approximation
  7. Policy Gradient
  8. Actor-Critic
  9. Deterministic Policy Gradient
  10. Case Study: Video Game
  11. Case Study: Board Game
  12. Case Study: Autonomous Driving

BibTeX

@book{xiao2019,
 title     = {Reinforcement Learning: Theory and {Python} Implementation},
 author    = {Zhiqing Xiao}
 year      = 2019,
 month     = 8,
 publisher = {China Machine Press},
}
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].