Deep Reinforcement Learning in TensorFlow2

DeepRL-TensorFlow2 implements a variety of popular Deep Reinforcement Learning algorithms using TensorFlow2, with a focus on easy-to-understand code. If you are a student or researcher studying Deep Reinforcement Learning, this repository is a good place to learn from. Each algorithm is contained in a single Python script, so you never have to jump between files to follow a specific algorithm. The repository is updated regularly, and new Deep Reinforcement Learning algorithms will continue to be added.

Algorithms


DQN

Paper Playing Atari with Deep Reinforcement Learning
Author Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller
Method OFF-Policy / Temporal-Difference / Model-Free
Action Discrete only

Core of Ideas

# idea01. Approximate the Q-function with a neural network
def create_model(self):
    model = tf.keras.Sequential([
        Input((self.state_dim,)),
        Dense(32, activation='relu'),
        Dense(16, activation='relu'),
        Dense(self.action_dim)
    ])
    model.compile(loss='mse', optimizer=Adam(args.lr))
    return model

# idea02. Use a separate target network to stabilize TD targets
self.target_model = ActionStateModel(self.state_dim, self.action_dim)
 
# idea03. Use a ReplayBuffer to decorrelate samples and improve data efficiency
class ReplayBuffer:
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)
    
    def put(self, state, action, reward, next_state, done):
        self.buffer.append([state, action, reward, next_state, done])
    
    def sample(self):
        sample = random.sample(self.buffer, args.batch_size)
        states, actions, rewards, next_states, done = map(np.asarray, zip(*sample))
        states = np.array(states).reshape(args.batch_size, -1)
        next_states = np.array(next_states).reshape(args.batch_size, -1)
        return states, actions, rewards, next_states, done
    
    def size(self):
        return len(self.buffer)
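The target network from idea02 only helps if it is periodically synchronized with the online network, and the replay buffer from idea03 is what feeds the TD-target computation. Below is a minimal sketch of how the two are typically wired together, assuming the `ActionStateModel` wrapper above exposes its Keras model as `.model` along with `predict`/`train` helpers (these names are illustrative assumptions, not a guaranteed match for the repository's exact API):

# Sketch: periodic hard update of the target network (assumed helper)
def target_update(self):
    weights = self.model.model.get_weights()
    self.target_model.model.set_weights(weights)

# Sketch: one training step built from the replay buffer and the target network
def replay(self):
    states, actions, rewards, next_states, done = self.buffer.sample()
    targets = self.model.predict(states)
    next_q_values = self.target_model.predict(next_states).max(axis=1)
    # TD target: r + gamma * max_a' Q_target(s', a'), cut off at terminal states
    targets[range(args.batch_size), actions] = (
        rewards + (1 - done) * next_q_values * args.gamma)
    self.model.train(states, targets)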

Getting Started

# Discrete Action Space Deep Q-Learning
$ python DQN/DQN_Discrete.py

DRQN

Paper Deep Recurrent Q-Learning for Partially Observable MDPs
Author Matthew Hausknecht, Peter Stone
Method OFF-Policy / Temporal-Difference / Model-Free
Action Discrete only

Core of Ideas

# idea01. Encode a window of previous states with an LSTM layer
def create_model(self):
    return tf.keras.Sequential([
        Input((args.time_steps, self.state_dim)),
        LSTM(32, activation='tanh'),
        Dense(16, activation='relu'),
        Dense(self.action_dim)
    ])
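Since this model consumes a (args.time_steps, state_dim) window rather than a single observation, the agent has to maintain a sliding window of recent states between steps. A minimal sketch of that bookkeeping (the attribute name `self.states` is an illustrative assumption):

import numpy as np

# Sketch: keep the last `args.time_steps` observations as the LSTM input window
def update_states(self, next_state):
    # drop the oldest observation and append the newest one
    self.states = np.roll(self.states, -1, axis=0)
    self.states[-1] = next_state

# At the start of an episode the window can simply be zero-initialized:
# self.states = np.zeros((args.time_steps, self.state_dim))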

Getting Started

# Discrete Action Space Deep Recurrent Q-Learning
$ python DRQN/DRQN_Discrete.py

DoubleDQN

Paper Deep Reinforcement Learning with Double Q-learning
Author Hado van Hasselt, Arthur Guez, David Silver
Method OFF-Policy / Temporal-Difference / Model-Free
Action Discrete only

Core of Ideas

# idea01. Reduce Q-value overestimation: the online network selects the next action, the target network evaluates it
on_action = np.argmax(self.model.predict(next_states), axis=1)
next_q_values = self.target_model.predict(next_states)[range(args.batch_size), on_action]
targets[range(args.batch_size), actions] = rewards + (1-done) * next_q_values * args.gamma
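For contrast, the vanilla DQN target lets the target network both select and evaluate the next action, which is where the overestimation comes from; the lines above instead let the online network pick the action. A one-line sketch of the vanilla target for comparison (same variable names as above):

# Vanilla DQN target: the target network both selects and evaluates the next action
next_q_values = self.target_model.predict(next_states).max(axis=1)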

Getting Started

# Discrete Action Space Double Deep Q-Learning
$ python DoubleDQN/DoubleDQN_Discrete.py

DuelingDQN

Paper Dueling Network Architectures for Deep Reinforcement Learning
Author Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, Nando de Freitas
Method OFF-Policy / Temporal-Difference / Model-Free
Action Discrete only

Core of Ideas

# idea01. Split the Q-function into separate Value and Advantage streams
def create_model(self):
    state_input = Input((self.state_dim,))
    backbone_1 = Dense(32, activation='relu')(state_input)
    backbone_2 = Dense(16, activation='relu')(backbone_1)
    value_output = Dense(1)(backbone_2)
    advantage_output = Dense(self.action_dim)(backbone_2)
    output = Add()([value_output, advantage_output])
    model = tf.keras.Model(state_input, output)
    model.compile(loss='mse', optimizer=Adam(args.lr))
    return model
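The snippet above combines the two streams with a plain sum; the Dueling DQN paper instead subtracts the mean advantage so the value/advantage decomposition is identifiable. A sketch of that variant as a drop-in replacement for the `Add()` line (an alternative, not the repository's code):

import tensorflow as tf
from tensorflow.keras.layers import Add, Lambda

# Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a)), as in the Dueling DQN paper
advantage_centered = Lambda(
    lambda a: a - tf.reduce_mean(a, axis=1, keepdims=True))(advantage_output)
output = Add()([value_output, advantage_centered])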

Getting Started

# Discrete Action Space Dueling Deep Q-Learning
$ python DuelingDQN/DuelingDQN_Discrete.py

A2C

Paper Actor-Critic Algorithms
Author Vijay R. Konda, John N. Tsitsiklis
Method ON-Policy / Temporal-Difference / Model-Free
Action Discrete, Continuous

Core of Ideas

# idea01. Use Advantage to reduce Variance
def advantage(self, td_targets, baselines):
    return td_targets - baselines
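The `td_targets` that feed the advantage come from the critic's value estimate of the next state. A minimal sketch of that computation for a single transition, assuming a `self.critic` model and the `args.gamma` discount used elsewhere (names are illustrative):

import numpy as np

# TD target: r + gamma * V(s'), truncated at terminal states
def td_target(self, reward, next_state, done):
    if done:
        return reward
    next_value = self.critic.model.predict(next_state[None, :])[0]
    return reward + args.gamma * next_value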

Getting Started

# Discrete Action Space Advantage Actor-Critic
$ python A2C/A2C_Discrete.py

# Continuous Action Space Advantage Actor-Critic
$ python A2C/A2C_Continuous.py

A3C

Paper Asynchronous Methods for Deep Reinforcement Learning
Author Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu
Method ON-Policy / Temporal-Difference / Model-Free
Action Discrete, Continuous

Core of Ideas

# idea01. Reduce the correlation of training data by running multiple workers asynchronously
def train(self, max_episodes=1000):
    workers = []

    for i in range(self.num_workers):
        env = gym.make(self.env_name)
        workers.append(WorkerAgent(
            env, self.global_actor, self.global_critic, max_episodes))

    for worker in workers:
        worker.start()

    for worker in workers:
        worker.join()

# idea02. Improve exploration through an entropy bonus
entropy_loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
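The entropy term is usually folded into the actor loss with a small coefficient so the policy keeps exploring instead of collapsing early. A sketch of how the `entropy_loss` object above can be used for that, assuming one-hot `actions`, raw policy `logits`, and precomputed `advantages` (the 0.01 coefficient and the exact wiring are illustrative assumptions):

ce_loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True)

def actor_loss(actions, logits, advantages):
    # policy-gradient term: cross-entropy of taken actions vs. the policy,
    # weighted by the advantages (no gradients flow through the advantages)
    policy_loss = ce_loss(actions, logits,
                          sample_weight=tf.stop_gradient(advantages))
    # policy entropy, computed as the cross-entropy of the logits with themselves
    probs = tf.nn.softmax(logits)
    entropy = entropy_loss(probs, logits)
    # subtracting the scaled entropy rewards more exploratory policies
    return policy_loss - 0.01 * entropy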

Getting Started

# Discrete Action Space Asynchronous Advantage Actor-Critic
$ python A3C/A3C_Discrete.py

# Continuous Action Space Asynchronous Advantage Actor-Critic
$ python A3C/A3C_Continuous.py

PPO

Paper Proximal Policy Optimization Algorithms
Author John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov
Method ON-Policy / Temporal-Difference / Model-Free
Action Discrete, Continuous

Core of Ideas

# idea01. Use importance sampling (the policy probability ratio) so collected data can be reused across update epochs, off-policy style
# idea02. Clip the ratio to prevent destructively large policy updates
def compute_loss(self, old_policy, new_policy, actions, gaes):
    gaes = tf.stop_gradient(gaes)
    old_log_p = tf.math.log(
        tf.reduce_sum(old_policy * actions))
    old_log_p = tf.stop_gradient(old_log_p)
    log_p = tf.math.log(tf.reduce_sum(
        new_policy * actions))
    ratio = tf.math.exp(log_p - old_log_p)
    clipped_ratio = tf.clip_by_value(
        ratio, 1 - args.clip_ratio, 1 + args.clip_ratio)
    surrogate = -tf.minimum(ratio * gaes, clipped_ratio * gaes)
    return tf.reduce_mean(surrogate)
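The `gaes` passed into `compute_loss` are Generalized Advantage Estimates computed over a rollout before the update. A minimal sketch of that computation, assuming `args.gamma` and a GAE decay parameter such as `args.gae_lambda` (the parameter name is an illustrative assumption):

import numpy as np

# Sketch: Generalized Advantage Estimation over one rollout
def gae_target(self, rewards, v_values, next_v_value, done):
    gaes = np.zeros_like(rewards, dtype=np.float32)
    td_targets = np.zeros_like(rewards, dtype=np.float32)
    gae_cumulative = 0.0
    forward_val = 0.0 if done else next_v_value

    # walk backwards through the rollout, accumulating discounted TD residuals
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + args.gamma * forward_val - v_values[t]
        gae_cumulative = args.gamma * args.gae_lambda * gae_cumulative + delta
        gaes[t] = gae_cumulative
        forward_val = v_values[t]
        td_targets[t] = gaes[t] + v_values[t]
    return gaes, td_targets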

Getting Started

# Discrete Action Space Proximal Policy Optimization
$ python PPO/PPO_Discrete.py

# Continuous Action Space Proximal Policy Optimization
$ python PPO/PPO_Continuous.py

DDPG

Paper Continuous control with deep reinforcement learning
Author Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra
Method OFF-Policy / Temporal-Difference / Model-Free
Action Continuous

Core of Ideas

# idea01. Use a deterministic actor network that maps states directly to actions
def create_model(self):
    return tf.keras.Sequential([
        Input((self.state_dim,)),
        Dense(32, activation='relu'),
        Dense(32, activation='relu'),
        Dense(self.action_dim, activation='tanh'),
        Lambda(lambda x: x * self.action_bound)
    ])

# idea02. Add exploration noise to the deterministic action
action = np.clip(action + noise, -self.action_bound, self.action_bound)
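The `noise` above is commonly drawn from a temporally correlated Ornstein-Uhlenbeck process rather than independent Gaussian noise, which tends to explore continuous action spaces more smoothly. A minimal sketch of such a process (the theta/sigma/dt defaults are illustrative, not necessarily the repository's values):

import numpy as np

# Sketch: one step of an Ornstein-Uhlenbeck noise process;
# feed the previous noise value back in to keep it temporally correlated
def ou_noise(x, mu=0.0, theta=0.15, sigma=0.2, dt=1e-1, dim=1):
    return x + theta * (mu - x) * dt + sigma * np.sqrt(dt) * np.random.randn(dim)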

Getting Started

# Continuous Action Space Deep Deterministic Policy Gradient
$ python DDPG/DDPG_Continuous.py

TRPO

Paper Trust Region Policy Optimization
Author John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, Pieter Abbeel
Method ON-Policy / Temporal-Difference / Model-Free
Action Discrete, Continuous

# NOTE: Not yet implemented!

TD3

Paper Addressing Function Approximation Error in Actor-Critic Methods
Author Scott Fujimoto, Herke van Hoof, David Meger
Method OFF-Policy / Temporal-Difference / Model-Free
Action Continuous

# NOTE: Not yet implemented!

SAC

Paper Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Author Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine
Method OFF-Policy / Temporal-Difference / Model-Free
Action Discrete, Continuous

# NOTE: Not yet implemented!
