
0bserver07 / Study Reinforcement Learning

Licence: other
Studying Reinforcement Learning Guide

Projects that are alternatives to or similar to Study Reinforcement Learning

Ai plays snake
AI trained using Genetic Algorithm and Deep Learning to play the game of snake
Stars: ✭ 137 (-6.8%)
Mutual labels:  reinforcement-learning
Gb
A minimal C implementation of the Nintendo Gameboy - a fast research environment for Reinforcement Learning
Stars: ✭ 143 (-2.72%)
Mutual labels:  reinforcement-learning
Tensor2tensor
Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
Stars: ✭ 11,865 (+7971.43%)
Mutual labels:  reinforcement-learning
Safe learning
Safe reinforcement learning with stability guarantees
Stars: ✭ 140 (-4.76%)
Mutual labels:  reinforcement-learning
Studynotes.org
✏️ Learn faster. Study better.
Stars: ✭ 142 (-3.4%)
Mutual labels:  study
Allenact
An open source framework for research in Embodied-AI from AI2.
Stars: ✭ 144 (-2.04%)
Mutual labels:  reinforcement-learning
Policy Gradient
Minimal Monte Carlo Policy Gradient (REINFORCE) Algorithm Implementation in Keras
Stars: ✭ 135 (-8.16%)
Mutual labels:  reinforcement-learning
Chess Alpha Zero
Chess reinforcement learning by AlphaGo Zero methods.
Stars: ✭ 1,868 (+1170.75%)
Mutual labels:  reinforcement-learning
Cherry
A PyTorch Library for Reinforcement Learning Research
Stars: ✭ 143 (-2.72%)
Mutual labels:  reinforcement-learning
Sumo Rl
A simple interface to instantiate Reinforcement Learning environments with SUMO for Traffic Signal Control. Compatible with Gym Env from OpenAI and MultiAgentEnv from RLlib.
Stars: ✭ 145 (-1.36%)
Mutual labels:  reinforcement-learning
Flappy Es
Flappy Bird AI using Evolution Strategies
Stars: ✭ 140 (-4.76%)
Mutual labels:  reinforcement-learning
Big Data Study
🐳 big data study
Stars: ✭ 141 (-4.08%)
Mutual labels:  study
Machin
Reinforcement learning library (framework) designed for PyTorch, implementing DQN, DDPG, A2C, PPO, SAC, MADDPG, A3C, APEX, IMPALA ...
Stars: ✭ 145 (-1.36%)
Mutual labels:  reinforcement-learning
Complete Life Cycle Of A Data Science Project
Complete-Life-Cycle-of-a-Data-Science-Project
Stars: ✭ 140 (-4.76%)
Mutual labels:  reinforcement-learning
Rl Book Challenge
Self-studying Sutton & Barto the hard way
Stars: ✭ 146 (-0.68%)
Mutual labels:  reinforcement-learning
Savn
Learning to Learn how to Learn: Self-Adaptive Visual Navigation using Meta-Learning (https://arxiv.org/abs/1812.00971)
Stars: ✭ 135 (-8.16%)
Mutual labels:  reinforcement-learning
Data Science Question Answer
A repo for data science related questions and answers
Stars: ✭ 2,000 (+1260.54%)
Mutual labels:  reinforcement-learning
Rainbow
A PyTorch implementation of Rainbow DQN agent
Stars: ✭ 147 (+0%)
Mutual labels:  reinforcement-learning
Show Adapt And Tell
Code for "Show, Adapt and Tell: Adversarial Training of Cross-domain Image Captioner" in ICCV 2017
Stars: ✭ 146 (-0.68%)
Mutual labels:  reinforcement-learning
Articulations Robot Demo
Stars: ✭ 145 (-1.36%)
Mutual labels:  reinforcement-learning

Study Reinforcement Learning (& Deep RL) Guide:

  • A simple guide and collection of resources for studying RL/Deep RL in one to 2.5 months.

Talks to check out first:


  • Introduction to Reinforcement Learning by Joelle Pineau, McGill University:

    • Applications of RL.

    • When to use RL?

    • RL vs supervised learning

    • What is an MDP (Markov Decision Process)?

    • Components of an RL agent:

      • States
      • Actions (probabilistic effects)
      • Reward function
      • Initial state distribution
                    +-----------------+
         +--------> |                 |
         |  +-----> |      Agent      | ----------+
         |  |       |                 |           |
   state |  | reward                              | action
   S(t)  |  | r(t)                                | a(t)
         |  |       +-----------------+           |
         |  +------ |                 |           |
         +--------- |   Environment   | <---------+
          S(t+1),   |                 |
          r(t+1)    +-----------------+

          * Sutton and Barto (1998)
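
      A minimal sketch of this interaction loop in Python (not from the
      talk; the toy coin-flip environment and the random agent are made
      up for illustration, using a Gym-style reset/step API):

        import random

        class CoinFlipEnv:
            """Toy environment: reward +1 for guessing a hidden coin flip."""
            def reset(self):
                self.coin = random.choice([0, 1])
                return 0  # a single dummy state

            def step(self, action):
                reward = 1.0 if action == self.coin else 0.0  # r(t+1)
                self.coin = random.choice([0, 1])             # re-flip the coin
                return 0, reward, False                       # S(t+1), r(t+1), done

        env = CoinFlipEnv()
        state = env.reset()
        total = 0.0
        for t in range(100):
            action = random.choice([0, 1])           # agent picks a(t) given S(t)
            state, reward, done = env.step(action)   # env returns S(t+1), r(t+1)
            total += reward
        print("return:", total)
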
    • Explanation of the Markov property.

    • Why maximize utility in the following settings (returns written out below):

      • Episodic tasks
      • Continuing tasks
        • The discount factor, gamma γ
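
        In symbols (standard Sutton & Barto notation, added here as an
        aside rather than taken from the talk slides): the episodic return
        is a finite sum, while the continuing return needs the discount
        factor γ ∈ [0, 1) to keep it finite:

          % Episodic task: return over a finite horizon T
          G_t = r_{t+1} + r_{t+2} + \dots + r_T

          % Continuing task: infinite-horizon discounted return
          G_t = \sum_{k=0}^{\infty} \gamma^k \, r_{t+k+1}
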
    • What is a policy & what do we do with it?

      • A policy defines the action-selection strategy at every state:
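
      Formally (standard notation, added as an aside): a policy is either
      a deterministic map from states to actions or a distribution over
      actions in each state:

        % Deterministic policy
        a = \pi(s)

        % Stochastic policy
        \pi(a \mid s) = \Pr[a_t = a \mid s_t = s]
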
    • Value functions:

      • The value of a policy satisfies (two forms of) Bellman's equation.
      • Iterative Policy Evaluation (a dynamic programming algorithm):
        • Main idea: turn the Bellman equation into an update rule (written out below).
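
      Written out (in standard form, assuming a known transition model P
      and reward function R, which the talk notes do not spell out):

        % Bellman equation for the value of a fixed policy \pi
        V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)
                     \left[ R(s, a, s') + \gamma V^{\pi}(s') \right]

        % Iterative policy evaluation: the same equation as an update rule
        V_{k+1}(s) \leftarrow \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)
                     \left[ R(s, a, s') + \gamma V_k(s') \right]
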
    • Optimal policies and optimal value functions.

      • Finding a good policy: policy iteration (see the talk below by Pieter Abbeel).
      • Finding a good policy: value iteration (sketched below).
        • Asynchronous value iteration: instead of updating all states on every iteration, focus on important states.
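
      A minimal value-iteration sketch in Python (not from the talk; the
      dictionary MDP format P[s][a] = [(prob, next_state, reward), ...]
      and the 2-state chain are made up for illustration):

        GAMMA, THETA = 0.9, 1e-6

        # Made-up 2-state MDP: "stay" in state 1 pays 2.0 forever.
        P = {
            0: {"stay": [(1.0, 0, 0.0)], "go": [(1.0, 1, 1.0)]},
            1: {"stay": [(1.0, 1, 2.0)], "go": [(1.0, 0, 0.0)]},
        }

        def backup(V, s, a):
            """One-step lookahead: expected reward plus discounted next value."""
            return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])

        V = {s: 0.0 for s in P}
        while True:
            delta = 0.0
            for s in P:  # the asynchronous variant sweeps only "important" states
                v = max(backup(V, s, a) for a in P[s])
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < THETA:
                break

        greedy = {s: max(P[s], key=lambda a, s=s: backup(V, s, a)) for s in P}
        print(V, greedy)
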
    • Key challenges in RL:

      • Designing the problem domain: state representation, action choice, cost/reward signal.
      • Acquiring data for training: exploration vs. exploitation, high-cost actions, time-delayed cost/reward signals.
      • Function approximation
      • Validation / confidence measures
    • The RL lingo.

    • In large state spaces we need function approximation:

      • Fitted Q-iteration (a sketch follows these notes):
        • Use supervised learning to estimate the Q-function from a batch of training data.
        • Define the input, output, and loss.
          • e.g., on the Arcade Learning Environment.
    • Deep Q-network (DQN) and tips.
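
      A sketch of fitted Q-iteration on a batch of (s, a, r, s') transitions
      (not from the talk; the random-walk batch is made up, and the tabular
      per-cell-mean "regressor" stands in for the supervised learner; DQN
      replaces it with a neural network plus tricks such as experience
      replay and a target network):

        import numpy as np

        GAMMA = 0.99
        N_STATES, N_ACTIONS = 5, 2

        # Made-up batch: random-walk transitions, reward 1 for reaching the last state.
        rng = np.random.default_rng(0)
        s = rng.integers(0, N_STATES, size=200)
        a = rng.integers(0, N_ACTIONS, size=200)
        s2 = np.clip(s + np.where(a == 1, 1, -1), 0, N_STATES - 1)
        r = (s2 == N_STATES - 1).astype(float)

        Q = np.zeros((N_STATES, N_ACTIONS))
        for _ in range(50):
            # Output/targets: y = r + gamma * max_a' Q(s', a')
            y = r + GAMMA * Q[s2].max(axis=1)
            # "Fit" Q to inputs (s, a): tabular least squares = per-cell mean.
            for si in range(N_STATES):
                for ai in range(N_ACTIONS):
                    mask = (s == si) & (a == ai)
                    if mask.any():
                        Q[si, ai] = y[mask].mean()

        print(np.round(Q, 2))  # greedy policy: argmax over each row
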

  • Deep Reinforcement Learning by Pieter Abbeel, EE & CS, UC Berkeley

Books:


Courses:


  • Reinforcement Learning by David Silver.

    • Lecture 1: Introduction to Reinforcement Learning
    • Lecture 2: Markov Decision Processes
    • Lecture 3: Planning by Dynamic Programming
    • Lecture 4: Model-Free Prediction
    • Lecture 5: Model-Free Control
    • Lecture 6: Value Function Approximation
    • Lecture 7: Policy Gradient Methods
    • Lecture 8: Integrating Learning and Planning
    • Lecture 9: Exploration and Exploitation
    • Lecture 10: Case Study: RL in Classic Games
  • CS 294: Deep Reinforcement Learning, Spring 2017 by John Schulman and Pieter Abbeel.

    • Instructors: Sergey Levine, John Schulman, Chelsea Finn.
    • My Bad Notes

