sudharsan13296 / Deep-Reinforcement-Learning-With-Python

Licence: other

Master classic RL, deep RL, distributional RL, inverse RL, and more using OpenAI Gym and TensorFlow with extensive Math

Programming Languages

Jupyter Notebook


Labels

reinforcement-learning deep-learning deep-reinforcement-learning openai-gym q-learning dqn policy-gradient a3c ddpg sac inverse-reinforcement-learning actor-critic bellman-equation double-dqn trpo c51 ppo a2c td3

Projects that are alternatives to or similar to Deep-Reinforcement-Learning-With-Python

Paddle-RLBooks is a reinforcement learning code study guide based on pure PaddlePaddle.

Stars: ✭ 113 (-49.1%)

Mutual labels: q-learning, dqn, policy-gradient, ddpg, sac, actor-critic, double-dqn, c51, td3

An elegant PyTorch deep reinforcement learning library.

Stars: ✭ 4,109 (+1750.9%)

Mutual labels: dqn, policy-gradient, ddpg, sac, double-dqn, trpo, ppo, a2c, td3

Reinforcement Learning With Tensorflow

Simple reinforcement learning tutorials, 莫烦Python (Morvan Python) Chinese-language AI tutorials

Stars: ✭ 6,948 (+3029.73%)

Mutual labels: q-learning, dqn, policy-gradient, a3c, ddpg, actor-critic, double-dqn, ppo

Modularized Implementation of Deep RL Algorithms in PyTorch

Stars: ✭ 2,640 (+1089.19%)

Mutual labels: deep-reinforcement-learning, dqn, ddpg, double-dqn, ppo, a2c, td3

A reinforcement learning tutorial in Chinese; read it online at https://datawhalechina.github.io/easy-rl/

Stars: ✭ 3,004 (+1253.15%)

Mutual labels: deep-reinforcement-learning, q-learning, dqn, policy-gradient, a3c, ddpg, ppo

☔ Deep RL agents with PyTorch☔

Stars: ✭ 39 (-82.43%)

Mutual labels: deep-reinforcement-learning, dqn, ddpg, sac, ppo, a2c, td3

ReinforcementLearningZoo.jl

juliareinforcementlearning.org/

Stars: ✭ 46 (-79.28%)

Mutual labels: dqn, ddpg, sac, c51, ppo, a2c, td3

Implementations of basic RL algorithms with minimal lines of code! (PyTorch based)

Stars: ✭ 2,051 (+823.87%)

Mutual labels: deep-reinforcement-learning, dqn, a3c, ddpg, sac, ppo, a2c

Deep Reinforcement Learning With Pytorch

PyTorch implementation of DQN, AC, ACER, A2C, A3C, PG, DDPG, TRPO, PPO, SAC, TD3 and ....

Stars: ✭ 1,345 (+505.86%)

Mutual labels: deep-reinforcement-learning, dqn, policy-gradient, a3c, actor-critic, trpo, ppo

Scalable and Elastic Deep Reinforcement Learning Using PyTorch. Please star. 🔥

Stars: ✭ 2,074 (+834.23%)

Mutual labels: dqn, ddpg, sac, ppo, a2c, td3

Deeprl Tensorflow2

🐋 Simple implementations of various popular Deep Reinforcement Learning algorithms using TensorFlow2

Stars: ✭ 319 (+43.69%)

Mutual labels: deep-reinforcement-learning, dqn, a3c, ddpg, trpo, ppo

Lightweight deep RL library for continuous control.

Stars: ✭ 14 (-93.69%)

Mutual labels: deep-reinforcement-learning, policy-gradient, ddpg, sac, ppo, td3

Reinforcement Learning Algorithms

This repository contains PyTorch implementations of most classic deep reinforcement learning algorithms, including DQN, DDQN, Dueling Network, DDPG, SAC, A2C, PPO, and TRPO. (More algorithms are still in progress.)

Stars: ✭ 426 (+91.89%)

Mutual labels: deep-reinforcement-learning, dqn, ddpg, actor-critic, trpo, ppo

Machine Learning Is All You Need

🔥🌟《Machine Learning 格物志》: ML + DL + RL basic codes and notes by sklearn, PyTorch, TensorFlow, Keras & the most important, from scratch!💪 This repository is ALL You Need!

Stars: ✭ 173 (-22.07%)

Mutual labels: deep-reinforcement-learning, dqn, ddpg, actor-critic, trpo, ppo

MXNet implementation of Deep Reinforcement Learning papers, such as DQN, PG, DDPG, PPO

Stars: ✭ 26 (-88.29%)

Mutual labels: deep-reinforcement-learning, dqn, policy-gradient, ddpg, a2c, td3

Hands On Reinforcement Learning With Python

Master Reinforcement and Deep Reinforcement Learning using OpenAI Gym and TensorFlow

Stars: ✭ 640 (+188.29%)

Mutual labels: deep-reinforcement-learning, openai-gym, q-learning, policy-gradient, trpo, ppo

Explorer is a PyTorch reinforcement learning framework for exploring new ideas.

Stars: ✭ 54 (-75.68%)

Mutual labels: deep-reinforcement-learning, q-learning, dqn, policy-gradient, actor-critic, ppo

rl implementations

No description or website provided.

Stars: ✭ 40 (-81.98%)

Mutual labels: deep-reinforcement-learning, dqn, policy-gradient, ddpg, actor-critic, a2c

Deep Reinforcement Learning

Repo for the Deep Reinforcement Learning Nanodegree program

Stars: ✭ 4,012 (+1707.21%)

Mutual labels: deep-reinforcement-learning, openai-gym, dqn, ddpg, ppo

This repository contains model-free deep reinforcement learning algorithms implemented in Pytorch

Stars: ✭ 394 (+77.48%)

Mutual labels: deep-reinforcement-learning, openai-gym, dqn, policy-gradient, ddpg


Deep Reinforcement Learning With Python


About the book


With significant enhancement in the quality and quantity of algorithms in recent years, this second edition of Hands-On Reinforcement Learning with Python has been completely revamped into an example-rich guide to learning state-of-the-art reinforcement learning (RL) and deep RL algorithms with TensorFlow and the OpenAI Gym toolkit.

In addition to exploring RL basics and foundational concepts such as the Bellman equation, Markov decision processes, and dynamic programming, this second edition dives deep into the full spectrum of value-based, policy-based, and actor-critic RL methods with detailed math. It explores state-of-the-art algorithms such as DQN, TRPO, PPO and ACKTR, DDPG, TD3, and SAC in depth, demystifying the underlying math and demonstrating implementations through simple code examples.
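As a rough illustration of how the Bellman optimality equation and dynamic programming fit together, here is a minimal value iteration sketch in plain Python. It is not taken from the book: the three-state chain MDP, the `transitions` table, and the helper names `q_value`, `value_iteration`, and `greedy_policy` are all hypothetical and chosen only to keep the example self-contained.

```python
# Hypothetical toy MDP (an assumption, not from the book):
# states 0 and 1 are ordinary states, state 2 is terminal.
# transitions[state][action] = list of (probability, next_state, reward)
GAMMA = 0.9  # discount factor

transitions = {
    0: {"left": [(1.0, 0, 0.0)], "right": [(1.0, 1, 0.0)]},
    1: {"left": [(1.0, 0, 0.0)], "right": [(1.0, 2, 1.0)]},
    2: {},  # terminal state: no actions available
}


def q_value(values, state, action):
    """Expected return of taking `action` in `state`, then following `values`."""
    return sum(
        prob * (reward + GAMMA * values[next_state])
        for prob, next_state, reward in transitions[state][action]
    )


def value_iteration(tol=1e-8):
    """Repeat the Bellman optimality backup until the values stop changing."""
    values = {state: 0.0 for state in transitions}
    while True:
        delta = 0.0
        for state, actions in transitions.items():
            if not actions:  # skip the terminal state
                continue
            best = max(q_value(values, state, a) for a in actions)
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            return values


def greedy_policy(values):
    """Pick, in every non-terminal state, the action with the highest Q value."""
    return {
        state: max(actions, key=lambda a: q_value(values, state, a))
        for state, actions in transitions.items()
        if actions
    }


values = value_iteration()
policy = greedy_policy(values)
print(values)
print(policy)
```

On this toy chain, moving right from state 1 earns the only reward, so the state values grow toward the terminal state and the greedy policy moves right in both non-terminal states.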

The book has several new chapters dedicated to new RL techniques including distributional RL, imitation learning, inverse RL, and meta RL. You will learn to leverage Stable Baselines, an improvement of OpenAI's baseline library, to implement popular RL algorithms effortlessly. The book concludes with an overview of promising research approaches such as meta-learning and imagination-augmented agents.

Get the book

O'Reilly Safari

Amazon

Packt

Google Books

Google Play

Table of Contents

Download the detailed and complete table of contents from here.

1. Fundamentals of Reinforcement Learning

2. A Guide to the Gym Toolkit

2.1. Setting Up our Machine
2.2. Creating our First Gym Environment
2.3. Generating an episode
2.4. Classic Control Environments
2.5. Cart Pole Balancing with Random Policy
2.6. Atari Game Environments
2.7. Agent Playing the Tennis Game
2.8. Recording the Game
2.9. Other environments
2.10. Environment Synopsis

3. Bellman Equation and Dynamic Programming

3.1. The Bellman Equation
3.2. Bellman Optimality Equation
3.3. Relation Between Value and Q Function
3.4. Dynamic Programming
3.5. Value Iteration
3.6. Solving the Frozen Lake Problem with Value Iteration
3.7. Policy iteration
3.8. Solving the Frozen Lake Problem with Policy Iteration
3.9. Is DP Applicable to all Environments?

4. Monte Carlo Methods

4.1. Understanding the Monte Carlo Method
4.2. Prediction and Control Tasks
4.3. Monte Carlo Prediction
4.4. Understanding the BlackJack Game
4.5. Every-visit MC Prediction with Blackjack Game
4.6. First-visit MC Prediction with Blackjack Game
4.7. Incremental Mean Updates
4.8. MC Prediction (Q Function)
4.9. Monte Carlo Control
4.10. On-Policy Monte Carlo Control
4.11. Monte Carlo Exploring Starts
4.12. Monte Carlo with Epsilon-Greedy Policy
4.13. Implementing On-Policy MC Control
4.14. Off-Policy Monte Carlo Control
4.15. Is MC Method Applicable to all Tasks?

5. Understanding Temporal Difference Learning

5.1. TD Learning
5.2. TD Prediction
5.3. Predicting the Value of States in a Frozen Lake Environment
5.4. TD Control
5.5. On-Policy TD Control - SARSA
5.6. Computing Optimal Policy using SARSA
5.7. Off-Policy TD Control - Q Learning
5.8. Computing the Optimal Policy using Q Learning
5.9. The Difference Between Q Learning and SARSA
5.10. Comparing DP, MC, and TD Methods

6. Case Study: The MAB Problem

6.1. The MAB Problem
6.2. Creating Bandit in the Gym
6.3. Epsilon-Greedy
6.4. Implementing Epsilon-Greedy
6.5. Softmax Exploration
6.6. Implementing Softmax Exploration
6.7. Upper Confidence Bound
6.8. Implementing UCB
6.9. Thompson Sampling
6.10. Implementing Thompson Sampling
6.11. Applications of MAB
6.12. Finding the Best Advertisement Banner using Bandits
6.13. Contextual Bandits

7. Deep Learning Foundations

7.1. Biological and artificial neurons
7.2. ANN and its layers
7.3. Exploring activation functions
7.4. Forward and backward propagation in ANN
7.5. Building neural network from scratch
7.6. Recurrent neural networks
7.7. LSTM-RNN
7.8. Convolutional neural networks
7.9. Generative adversarial networks

8. Getting to Know TensorFlow

8.1. What is TensorFlow?
8.2. Understanding Computational Graphs and Sessions
8.3. Variables, Constants, and Placeholders
8.4. Introducing TensorBoard
8.5. Handwritten digits classification using TensorFlow
8.6. Visualizing Computational graph in TensorBoard
8.7. Introducing Eager execution
8.8. Math operations in TensorFlow
8.9. TensorFlow 2.0 and Keras
8.10. MNIST digits classification in TensorFlow 2.0

9. Deep Q Network and its Variants

9.1. What is Deep Q Network?
9.2. Understanding DQN
9.3. Playing Atari Games using DQN
9.4. Double DQN
9.5. DQN with Prioritized Experience Replay
9.6. Dueling DQN
9.7. Deep Recurrent Q Network

10. Policy Gradient Method

10.1. Why Policy Based Methods?
10.2. Policy Gradient Intuition
10.3. Understanding the Policy Gradient
10.4. Deriving Policy Gradient
10.5. Variance Reduction Methods
10.6. Policy Gradient with Reward-to-go
10.7. Cart Pole Balancing with Policy Gradient
10.8. Policy Gradient with Baseline

11. Actor Critic Methods - A2C and A3C

11.1. Overview of Actor Critic Method
11.2. Understanding the Actor Critic Method
11.3. Advantage Actor Critic
11.4. Asynchronous Advantage Actor Critic
11.5. Mountain Car Climbing using A3C
11.6. A2C Revisited

12. Learning DDPG, TD3 and SAC

12.1. Deep Deterministic Policy Gradient
12.2. Components of DDPG
12.3. Putting it all together
12.4. Algorithm - DDPG
12.5. Swinging Up the Pendulum using DDPG
12.6. Twin Delayed DDPG
12.7. Components of TD3
12.8. Putting it all together
12.9. Algorithm - TD3
12.10. Soft Actor Critic
12.11. Components of SAC
12.12. Putting it all together
12.13. Algorithm - SAC

13. TRPO, PPO and ACKTR Methods

13.1. Trust Region Policy Optimization
13.2. Math Essentials
13.3. Designing the TRPO Objective Function
13.4. Solving the TRPO Objective Function
13.5. Algorithm - TRPO
13.6. Proximal Policy Optimization
13.7. PPO with Clipped Objective
13.9. Implementing PPO-Clipped Method
13.10. PPO with Penalized Objective
13.11. Actor Critic using Kronecker Factored Trust Region
13.12. Math Essentials
13.13. Kronecker-Factored Approximate Curvature (K-FAC)
13.14. K-FAC in Actor Critic

14. Distributional Reinforcement Learning

14.1. Why Distributional Reinforcement Learning?
14.2. Categorical DQN
14.3. Playing Atari games using Categorical DQN
14.4. Quantile Regression DQN
14.5. Math Essentials
14.6. Understanding QR-DQN
14.7. Distributed Distributional DDPG

15. Imitation Learning and Inverse RL

15.1. Supervised Imitation Learning
15.2. DAgger
15.3. Deep Q learning from Demonstrations
15.4. Inverse Reinforcement Learning
15.5. Maximum Entropy IRL
15.6. Generative Adversarial Imitation Learning

16. Deep Reinforcement Learning with Stable Baselines

16.1. Creating our First Agent with Baseline
16.2. Multiprocessing with Vectorized Environments
16.3. Integrating the Custom Environments
16.4. Playing Atari Games with DQN
16.5. Implement DQN variants
16.6. Lunar Lander using A2C
16.7. Creating a custom network
16.8. Swinging up a Pendulum using DDPG
16.9. Training an Agent to Walk using TRPO
16.10. Training Cheetah Bot to Run using PPO

17. Reinforcement Learning Frontiers

17.1. Meta Reinforcement Learning
17.2. Model Agnostic Meta Learning
17.3. Understanding MAML
17.4. MAML in the Supervised Learning Setting
17.5. Algorithm - MAML in Supervised Learning
17.6. MAML in the Reinforcement Learning Setting
17.7. Algorithm - MAML in Reinforcement Learning
17.8. Hierarchical Reinforcement Learning
17.9. MAXQ Value Function Decomposition
17.10. Imagination Augmented Agents

Stars: ✭ 222
