
This repo is no longer maintained, please see HERE

TF-RL (Reinforcement Learning with TensorFlow: EAGER!!)

This is the repo for implementing and experimenting with a variety of RL algorithms using TensorFlow Eager Execution. Since Google generously lets us use their precious GPU resources on Google Colab with almost no restrictions, I have made most of the code runnable on Google Colab. So if you don't have a GPU, please feel free to try it out there.

Note: Eager mode is generally slower than Graph Execution, so in this repo I use Eager mode for debugging and Graph mode for training. This is where the beauty of Eager mode comes in: we can flexibly switch between Eager and Graph mode with minimal modification (@tf.contrib.eager.defun), please check the link.
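
For illustration, here is a minimal sketch (not code from this repo; it assumes TensorFlow 1.x with eager execution enabled) of how a single flag can switch a function between Eager and Graph execution via tf.contrib.eager.defun:

import tensorflow as tf

tf.enable_eager_execution()

def forward(x):
    # placeholder computation standing in for a model's forward pass
    return tf.reduce_sum(tf.square(x))

debug_flg = False  # True -> stay in pure Eager mode for step-by-step debugging
if not debug_flg:
    # compile the same Python function into a graph for faster execution during training
    forward = tf.contrib.eager.defun(forward)

print(forward(tf.constant([1.0, 2.0, 3.0])))  # tf.Tensor(14.0, ...) in either mode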

Installation

  • Install from PyPI (Test)
# this one
$ pip install --index-url https://test.pypi.org/simple/ --no-deps TF_RL
# or this one
$ pip install -i https://test.pypi.org/simple/ TF-RL
  • Install from GitHub source
$ git clone https://github.com/Rowing0914/TF_RL.git
$ cd TF_RL
$ python setup.py install
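
If the install succeeded, a quick sanity check (a minimal, hypothetical snippet; the tf_rl.common imports are the same ones used in the Game Envs examples further down) is:

# quick sanity check that the package resolves after installation
import tf_rl
from tf_rl.common.wrappers import make_atari, wrap_deepmind  # same imports as in the Atari example below
print(tf_rl.__file__)  # path of the installed package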

Features

  1. Real-time visualisation of an agent after training

  2. Comparison: performance of algorithms (*Dueling DQN is not working well on CartPole..)

$ cd examples
$ python3.6 comparisons.py
$ cd ../
# run Tensorboard
$ sh run_tensorboad.sh
  3. Unit test of a specific algorithm
# Graph Execution mode (default)
$ python3.6 examples/{model_name}/{model_name}_eager.py
$ python3.6 examples/{model_name}/{model_name}_eager_cartpole.py

# Eager Execution mode (--debug_flg=True switches to Eager for debugging)
$ python3.6 examples/{model_name}/{model_name}_eager.py --debug_flg=True
$ python3.6 examples/{model_name}/{model_name}_eager_cartpole.py --debug_flg=True
  4. Ready-to-run on Google Colab (Result of DQN)
# You can run this on Google Colab, but be aware of the session restrictions:
# 1. 90-minute idle timeout
# 2. 12-hour maximum session length
# Assuming you execute the commands below in a Google Colab notebook
!git clone https://github.com/Rowing0914/TF_RL.git
!pip install --index-url https://test.pypi.org/simple/ --no-deps TF_RL
%cd TF_RL
!python3.6 examples/{model_name}/{model_name}_eager_atari.py --mode Atari --env_name={env_name} --google_colab=True

# === Execute on your local machine ===
# My dirty workaround to keep the connection to Colab alive is to run the command below on a local PC
$ watch -n 3600 python3.6 {your_filename}.py

""" Save this code to {your_filename}.py
import pyautogui
import time

# terminal -> chrome or whatever
pyautogui.hotkey("alt", "tab")
time.sleep(0.5)
# refresh the page
pyautogui.hotkey("ctrl", "r")
time.sleep(1)
# say "YES" to the confirmation dialogue
pyautogui.press("enter")
time.sleep(1)
# next page
pyautogui.hotkey("ctrl", "tab")
# check that all pages reload properly
pyautogui.hotkey("ctrl", "tab")
time.sleep(1)
# switch back to terminal
pyautogui.hotkey("alt", "tab")
time.sleep(0.5)
"""

Implementations

  1. Playing Atari with Deep Reinforcement Learning, Mnih et al., 2013 [arxiv] [code]
  2. Deep Reinforcement Learning with Double Q-learning, van Hasselt et al., 2015 [arxiv] [code]
  3. Dueling Network Architectures for Deep Reinforcement Learning, Wang et al., 2016 [arxiv] [code]
  4. Prioritized Experience Replay, T. Schaul et al., 2015 [arxiv] [code]
  5. Asynchronous Methods for Deep Reinforcement Learning, Mnih et al., 2016 [arxiv] [code]
  6. Deep Q-learning from Demonstrations, T. Hester et al., 2017 [arxiv] [code]
  7. Actor-Critic Algorithms, VR Konda and JN Tsitsiklis, 2000 NIPS [arxiv] [code]
  8. Policy Gradient Methods for Reinforcement Learning with Function Approximation, RS Sutton et al., 2000 NIPS [arxiv] [code]
  9. Continuous Control with Deep Reinforcement Learning, TP Lillicrap et al., 2015 [arxiv] [code]
  10. Deep Recurrent Q-Learning for Partially Observable MDPs, M. Hausknecht et al., 2015 [arxiv] [code]
  11. Hindsight Experience Replay, M. Andrychowicz et al., 2017 [arxiv] [code] [video1] [video2]
  12. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, Haarnoja et al., 2018 [arxiv] [video] [code]
  13. Trust Region Policy Optimization, J. Schulman et al., 2015 [arxiv] [code]

Results

Future dev

  1. Noisy Networks for Exploration, M. Fortunato et al., 2017 [arxiv]
  2. Distributed DQN
  3. C51
  4. Rainbow etc...

Textbook implementations: R. Sutton's great book!

https://github.com/Rowing0914/TF_RL/tree/master/examples/Sutton_RL_Intro

Game Envs

Atari Envs

from tf_rl.common.wrappers import wrap_deepmind, make_atari
from tf_rl.common.params import ENV_LIST_NATURE, ENV_LIST_NIPS


# for env_name in ENV_LIST_NIPS:
for env_name in ENV_LIST_NATURE:
    env = wrap_deepmind(make_atari(env_name))
    state = env.reset()
    for t in range(10):
        # env.render()
        action = env.action_space.sample()
        next_state, reward, done, info = env.step(action)
        # print(reward, next_state)
        state = next_state
        if done:
            break
    print("{}: Episode finished after {} timesteps".format(env_name, t + 1))
    env.close()

Atari Env with Revertible Wrapper

[Youtube Demo]

import time, gym
from tf_rl.common.wrappers import wrap_deepmind, make_atari, ReplayResetEnv

env = wrap_deepmind(make_atari("PongNoFrameskip-v4"))
env = gym.wrappers.Monitor(env, "./video")
env = ReplayResetEnv(env)

state = env.reset()

for t in range(1, 1000):
    env.render()
    action = env.action_space.sample()
    next_state, reward, done, info = env.step(action)
    state = next_state

    if t == 300:
        time.sleep(0.5)
        recover_state = env.get_checkpoint_state()

    if (t > 300) and (t % 100 == 0):
        env.recover(recover_state)
        env.step(0)  # one extra step is required to write the recovered state into ALE's RAM!!
        env.render()
        time.sleep(0.5)

env.close()

CartPole-Pixel (Obs: raw pixels as a NumPy array)

import gym
from tf_rl.common.wrappers import CartPole_Pixel

env = CartPole_Pixel(gym.make('CartPole-v0'))
for ep in range(2):
	env.reset()
	for t in range(100):
		o, r, done, _ = env.step(env.action_space.sample())
		print(o.shape)
		if done:
			break
env.close()

MuJoCo (please check the official MuJoCo repo for more details...)

# run this from a terminal and make sure the appropriate environment variables are loaded
# $ echo $LD_LIBRARY_PATH

import gym
from tf_rl.common.params import DDPG_ENV_LIST

for env_name, goal_score in DDPG_ENV_LIST.items():
	env = gym.make(env_name)
	env.reset()
	for _ in range(100):
		env.render()
		env.step(env.action_space.sample()) # take a random action

MuJoCo Humanoid Maze

https://github.com/Rowing0914/MuJoCo_Humanoid_Maze

import gym
import humanoid_maze  # this is an external library (check the link above!!)

env = gym.make('HumanoidMaze-v0')

env.reset()
for _ in range(2000):
    env.render()
    env.step(env.action_space.sample()) # take a random action
env.close()

I have also contributed to this project: https://github.com/Breakend/gym-extensions

import gym
from gym_extensions.continuous import mujoco

# available env list: https://github.com/Rowing0914/gym-extensions/blob/mujoco200/tests/all_tests.py
env = gym.make("PusherMovingGoal-v1")

env.reset()
for _ in range(100):
    env.render()
    s, r, d, i = env.step(env.action_space.sample()) # take a random action
    print(s.shape, r, d, i)
env.close()

MuJoCo annotation flag: https://github.com/Rowing0914/gym_modified/blob/11696be8bba436db36c1caa0457040d06ca05b50/gym/envs/mujoco/mujoco_env.py#L129

PC Envs

  • OS: Ubuntu 16.04 LTS
  • Python: 2.7/3.6 (for MuJoCo envs, 2.7 might not work)
  • GPU: NVIDIA RTX 2080 Max-Q Design
  • TensorFlow: 1.14.0
  • CUDA: 10.0
  • libcudnn: 7.6.2

GPU Env Maintenance on Ubuntu 16.04 (CUDA 10)

Check this link as well: https://www.tensorflow.org/install/gpu

# Add NVIDIA package repositories
# Add HTTPS support for apt-key
sudo apt-get install gnupg-curl
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_10.0.130-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1604_10.0.130-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
sudo apt-get update
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/nvidia-machine-learning-repo-ubuntu1604_1.0.0-1_amd64.deb
sudo apt install ./nvidia-machine-learning-repo-ubuntu1604_1.0.0-1_amd64.deb
sudo apt-get update

# Install NVIDIA Driver
# Issue with driver install requires creating /usr/lib/nvidia
sudo mkdir /usr/lib/nvidia
sudo apt-get install --no-install-recommends nvidia-410
# Reboot. Check that GPUs are visible using the command: nvidia-smi

# Install development and runtime libraries (~4GB)
sudo apt-get install --no-install-recommends \
    cuda-10-0 \
    libcudnn7=7.6.2.24-1+cuda10.0  \
    libcudnn7-dev=7.6.2.24-1+cuda10.0


# Install TensorRT. Requires that libcudnn7 is installed above.
sudo apt-get install -y --no-install-recommends libnvinfer5=5.1.5-1+cuda10.0 \
    libnvinfer-dev=5.1.5-1+cuda10.0
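
Once the driver and libraries are installed, a quick way to confirm that TensorFlow can actually see the GPU (a minimal check, assuming TensorFlow 1.14 as listed above) is:

import tensorflow as tf

# both calls should report the GPU (e.g. '/device:GPU:0') if CUDA/cuDNN are set up correctly
print(tf.test.is_gpu_available())
print(tf.test.gpu_device_name())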

Disclaimer: you will see similar code repeated across different source files.

In this repo, I deliberately ignore development efficiency. Although there are many clean and neat implementations of DRL algorithms on the net, I think they sometimes over-modularise components by introducing a lot of extra parameters or flags that are not in the original papers; in other words, they are too professional for me to play with.

So, in this repo I do not hesitate to re-use the same code here and there. But I believe this way of organising the algorithms greatly enhances experimentability compared to trying to make them compact and professional.
