
0bserver07 / Study Reinforcement Learning

Licence: other
Studying Reinforcement Learning Guide

Projects that are alternatives to or similar to Study Reinforcement Learning

Ai plays snake
AI trained using Genetic Algorithm and Deep Learning to play the game of snake
Stars: ✭ 137 (-6.8%)
Mutual labels:  reinforcement-learning
Gb
A minimal C implementation of the Nintendo Gameboy - a fast research environment for Reinforcement Learning
Stars: ✭ 143 (-2.72%)
Mutual labels:  reinforcement-learning
Tensor2tensor
Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
Stars: ✭ 11,865 (+7971.43%)
Mutual labels:  reinforcement-learning
Safe learning
Safe reinforcement learning with stability guarantees
Stars: ✭ 140 (-4.76%)
Mutual labels:  reinforcement-learning
Studynotes.org
✏️ Learn faster. Study better.
Stars: ✭ 142 (-3.4%)
Mutual labels:  study
Allenact
An open source framework for research in Embodied-AI from AI2.
Stars: ✭ 144 (-2.04%)
Mutual labels:  reinforcement-learning
Policy Gradient
Minimal Monte Carlo Policy Gradient (REINFORCE) Algorithm Implementation in Keras
Stars: ✭ 135 (-8.16%)
Mutual labels:  reinforcement-learning
Chess Alpha Zero
Chess reinforcement learning by AlphaGo Zero methods.
Stars: ✭ 1,868 (+1170.75%)
Mutual labels:  reinforcement-learning
Cherry
A PyTorch Library for Reinforcement Learning Research
Stars: ✭ 143 (-2.72%)
Mutual labels:  reinforcement-learning
Sumo Rl
A simple interface to instantiate Reinforcement Learning environments with SUMO for Traffic Signal Control. Compatible with Gym Env from OpenAI and MultiAgentEnv from RLlib.
Stars: ✭ 145 (-1.36%)
Mutual labels:  reinforcement-learning
Flappy Es
Flappy Bird AI using Evolution Strategies
Stars: ✭ 140 (-4.76%)
Mutual labels:  reinforcement-learning
Big Data Study
🐳 big data study
Stars: ✭ 141 (-4.08%)
Mutual labels:  study
Machin
Reinforcement learning library (framework) designed for PyTorch, implementing DQN, DDPG, A2C, PPO, SAC, MADDPG, A3C, APEX, IMPALA ...
Stars: ✭ 145 (-1.36%)
Mutual labels:  reinforcement-learning
Complete Life Cycle Of A Data Science Project
Complete-Life-Cycle-of-a-Data-Science-Project
Stars: ✭ 140 (-4.76%)
Mutual labels:  reinforcement-learning
Rl Book Challenge
Self-studying Sutton & Barto the hard way
Stars: ✭ 146 (-0.68%)
Mutual labels:  reinforcement-learning
Savn
Learning to Learn how to Learn: Self-Adaptive Visual Navigation using Meta-Learning (https://arxiv.org/abs/1812.00971)
Stars: ✭ 135 (-8.16%)
Mutual labels:  reinforcement-learning
Data Science Question Answer
A repo for data science related questions and answers
Stars: ✭ 2,000 (+1260.54%)
Mutual labels:  reinforcement-learning
Rainbow
A PyTorch implementation of Rainbow DQN agent
Stars: ✭ 147 (+0%)
Mutual labels:  reinforcement-learning
Show Adapt And Tell
Code for "Show, Adapt and Tell: Adversarial Training of Cross-domain Image Captioner" in ICCV 2017
Stars: ✭ 146 (-0.68%)
Mutual labels:  reinforcement-learning
Articulations Robot Demo
Stars: ✭ 145 (-1.36%)
Mutual labels:  reinforcement-learning

Study Reinforcement Learning (& Deep RL) Guide:

  • A simple guide and collection of resources for studying RL/Deep RL in one to 2.5 months.

Talks to check out first:


  • Introduction to Reinforcement Learning by Joelle Pineau, McGill University:

    • Applications of RL.

    • When to use RL?

    • RL vs supervised learning

    • What is an MDP (Markov Decision Process)?

    • Components of an RL agent:

      • States
      • Actions (probabilistic effects)
      • Reward function
      • Initial state distribution
                    +-----------------+
         +--------> |                 |
         |  +-----> |      Agent      | ----------+
         |  |       |                 |           |
   state |  | reward                              | action
   S(t)  |  | r(t)                                | a(t)
         |  |       +-----------------+           |
         |  +------ |                 |           |
         +--------- |   Environment   | <---------+
          S(t+1),   |                 |
          r(t+1)    +-----------------+

          * Sutton and Barto (1998)
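
      A minimal sketch of this interaction loop in Python (not from the
      talk; the toy coin-flip environment and the random agent are made
      up for illustration, using a Gym-style reset/step API):

        import random

        class CoinFlipEnv:
            """Toy environment: reward +1 for guessing a hidden coin flip."""
            def reset(self):
                self.coin = random.choice([0, 1])
                return 0  # a single dummy state

            def step(self, action):
                reward = 1.0 if action == self.coin else 0.0  # r(t+1)
                self.coin = random.choice([0, 1])             # re-flip the coin
                return 0, reward, False                       # S(t+1), r(t+1), done

        env = CoinFlipEnv()
        state = env.reset()
        total = 0.0
        for t in range(100):
            action = random.choice([0, 1])           # agent picks a(t) given S(t)
            state, reward, done = env.step(action)   # env returns S(t+1), r(t+1)
            total += reward
        print("return:", total)
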
    • Explanation of the Markov property.

    • Why maximize utility in the following settings (returns written out below):

      • Episodic tasks
      • Continuing tasks
        • The discount factor, gamma γ
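
        In symbols (standard Sutton & Barto notation, added here as an
        aside rather than taken from the talk slides): the episodic return
        is a finite sum, while the continuing return needs the discount
        factor γ ∈ [0, 1) to keep it finite:

          % Episodic task: return over a finite horizon T
          G_t = r_{t+1} + r_{t+2} + \dots + r_T

          % Continuing task: infinite-horizon discounted return
          G_t = \sum_{k=0}^{\infty} \gamma^k \, r_{t+k+1}
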
    • What is a policy & what do we do with it?

      • A policy defines the action-selection strategy at every state:
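
      Formally (standard notation, added as an aside): a policy is either
      a deterministic map from states to actions or a distribution over
      actions in each state:

        % Deterministic policy
        a = \pi(s)

        % Stochastic policy
        \pi(a \mid s) = \Pr[a_t = a \mid s_t = s]
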
    • Value functions:

      • The value of a policy satisfies (two forms of) Bellman's equation.
      • Iterative Policy Evaluation (a dynamic programming algorithm):
        • Main idea: turn the Bellman equation into an update rule (written out below).
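
      Written out (in standard form, assuming a known transition model P
      and reward function R, which the talk notes do not spell out):

        % Bellman equation for the value of a fixed policy \pi
        V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)
                     \left[ R(s, a, s') + \gamma V^{\pi}(s') \right]

        % Iterative policy evaluation: the same equation as an update rule
        V_{k+1}(s) \leftarrow \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)
                     \left[ R(s, a, s') + \gamma V_k(s') \right]
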
    • Optimal policies and optimal value functions.

      • Finding a good policy: policy iteration (see the talk below by Pieter Abbeel).
      • Finding a good policy: value iteration (sketched below).
        • Asynchronous value iteration: instead of updating all states on every iteration, focus on important states.
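
      A minimal value-iteration sketch in Python (not from the talk; the
      dictionary MDP format P[s][a] = [(prob, next_state, reward), ...]
      and the 2-state chain are made up for illustration):

        GAMMA, THETA = 0.9, 1e-6

        # Made-up 2-state MDP: "stay" in state 1 pays 2.0 forever.
        P = {
            0: {"stay": [(1.0, 0, 0.0)], "go": [(1.0, 1, 1.0)]},
            1: {"stay": [(1.0, 1, 2.0)], "go": [(1.0, 0, 0.0)]},
        }

        def backup(V, s, a):
            """One-step lookahead: expected reward plus discounted next value."""
            return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])

        V = {s: 0.0 for s in P}
        while True:
            delta = 0.0
            for s in P:  # the asynchronous variant sweeps only "important" states
                v = max(backup(V, s, a) for a in P[s])
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < THETA:
                break

        greedy = {s: max(P[s], key=lambda a, s=s: backup(V, s, a)) for s in P}
        print(V, greedy)
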
    • Key challenges in RL:

      • Designing the problem domain: state representation, action choice, cost/reward signal.
      • Acquiring data for training: exploration vs. exploitation, high-cost actions, time-delayed cost/reward signals.
      • Function approximation
      • Validation / confidence measures
    • The RL lingo.

    • In large state spaces we need function approximation:

      • Fitted Q-iteration (a sketch follows these notes):
        • Use supervised learning to estimate the Q-function from a batch of training data.
        • Define the input, output, and loss.
          • e.g., on the Arcade Learning Environment.
    • Deep Q-network (DQN) and tips.
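
      A sketch of fitted Q-iteration on a batch of (s, a, r, s') transitions
      (not from the talk; the random-walk batch is made up, and the tabular
      per-cell-mean "regressor" stands in for the supervised learner; DQN
      replaces it with a neural network plus tricks such as experience
      replay and a target network):

        import numpy as np

        GAMMA = 0.99
        N_STATES, N_ACTIONS = 5, 2

        # Made-up batch: random-walk transitions, reward 1 for reaching the last state.
        rng = np.random.default_rng(0)
        s = rng.integers(0, N_STATES, size=200)
        a = rng.integers(0, N_ACTIONS, size=200)
        s2 = np.clip(s + np.where(a == 1, 1, -1), 0, N_STATES - 1)
        r = (s2 == N_STATES - 1).astype(float)

        Q = np.zeros((N_STATES, N_ACTIONS))
        for _ in range(50):
            # Output/targets: y = r + gamma * max_a' Q(s', a')
            y = r + GAMMA * Q[s2].max(axis=1)
            # "Fit" Q to inputs (s, a): tabular least squares = per-cell mean.
            for si in range(N_STATES):
                for ai in range(N_ACTIONS):
                    mask = (s == si) & (a == ai)
                    if mask.any():
                        Q[si, ai] = y[mask].mean()

        print(np.round(Q, 2))  # greedy policy: argmax over each row
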

  • Deep Reinforcement Learning by Pieter Abbeel, EE & CS, UC Berkeley

Books:


Courses:


  • Reinforcement Learning by David Silver.

    • Lecture 1: Introduction to Reinforcement Learning
    • Lecture 2: Markov Decision Processes
    • Lecture 3: Planning by Dynamic Programming
    • Lecture 4: Model-Free Prediction
    • Lecture 5: Model-Free Control
    • Lecture 6: Value Function Approximation
    • Lecture 7: Policy Gradient Methods
    • Lecture 8: Integrating Learning and Planning
    • Lecture 9: Exploration and Exploitation
    • Lecture 10: Case Study: RL in Classic Games
  • CS 294: Deep Reinforcement Learning, Spring 2017 by John Schulman and Pieter Abbeel.

    • Instructors: Sergey Levine, John Schulman, Chelsea Finn.
    • My Bad Notes

