
JuliaML / Reinforce.jl

Licence: other
Abstractions, algorithms, and utilities for reinforcement learning in Julia

Programming Languages

julia
2034 projects

Projects that are alternatives of or similar to Reinforce.jl

Mindpark
Testbed for deep reinforcement learning
Stars: ✭ 163 (-8.43%)
Mutual labels:  reinforcement-learning
A2c
A Clearer and Simpler Synchronous Advantage Actor Critic (A2C) Implementation in TensorFlow
Stars: ✭ 169 (-5.06%)
Mutual labels:  reinforcement-learning
Atari
AI research environment for the Atari 2600 games 🤖.
Stars: ✭ 174 (-2.25%)
Mutual labels:  reinforcement-learning
Coach
Reinforcement Learning Coach by Intel AI Lab enables easy experimentation with state of the art Reinforcement Learning algorithms
Stars: ✭ 2,085 (+1071.35%)
Mutual labels:  reinforcement-learning
Acme
A library of reinforcement learning components and agents
Stars: ✭ 2,441 (+1271.35%)
Mutual labels:  reinforcement-learning
Gym Pybullet Drones
PyBullet Gym environments for single and multi-agent reinforcement learning of quadcopter control
Stars: ✭ 168 (-5.62%)
Mutual labels:  reinforcement-learning
Awesome Ai
A curated list of artificial intelligence resources (Courses, Tools, App, Open Source Project)
Stars: ✭ 161 (-9.55%)
Mutual labels:  reinforcement-learning
Deep Algotrading
A resource for learning about deep learning techniques from regression to LSTM and Reinforcement Learning using financial data and the fitness functions of algorithmic trading
Stars: ✭ 173 (-2.81%)
Mutual labels:  reinforcement-learning
2048 Deep Reinforcement Learning
Trained A Convolutional Neural Network To Play 2048 using Deep-Reinforcement Learning
Stars: ✭ 169 (-5.06%)
Mutual labels:  reinforcement-learning
Pytorch sac
PyTorch implementation of Soft Actor-Critic (SAC)
Stars: ✭ 174 (-2.25%)
Mutual labels:  reinforcement-learning
Coax
This project was moved to: https://github.com/coax-dev/coax
Stars: ✭ 166 (-6.74%)
Mutual labels:  reinforcement-learning
Awesome Ml Courses
Awesome free machine learning and AI courses with video lectures.
Stars: ✭ 2,145 (+1105.06%)
Mutual labels:  reinforcement-learning
Elf
An End-To-End, Lightweight and Flexible Platform for Game Research
Stars: ✭ 2,057 (+1055.62%)
Mutual labels:  reinforcement-learning
Rl Baselines3 Zoo
A collection of pre-trained RL agents using Stable Baselines3, training and hyperparameter optimization included.
Stars: ✭ 161 (-9.55%)
Mutual labels:  reinforcement-learning
Adeptrl
Reinforcement learning framework to accelerate research
Stars: ✭ 173 (-2.81%)
Mutual labels:  reinforcement-learning
Mjrl
Reinforcement learning algorithms for MuJoCo tasks
Stars: ✭ 162 (-8.99%)
Mutual labels:  reinforcement-learning
Data Science Toolkit
Collection of stats, modeling, and data science tools in Python and R.
Stars: ✭ 169 (-5.06%)
Mutual labels:  reinforcement-learning
Tensorflow Rl
Implementations of deep RL papers and random experimentation
Stars: ✭ 176 (-1.12%)
Mutual labels:  reinforcement-learning
Machine Learning And Reinforcement Learning In Finance
Machine Learning and Reinforcement Learning in Finance New York University Tandon School of Engineering
Stars: ✭ 173 (-2.81%)
Mutual labels:  reinforcement-learning
Jericho
A learning environment for man-made Interactive Fiction games.
Stars: ✭ 173 (-2.81%)
Mutual labels:  reinforcement-learning

Reinforce


Reinforce.jl is an interface for reinforcement learning in Julia. It is intended to connect modular environments, policies, and solvers through a simple API.


Packages which build on Reinforce:

Environment Interface

New environments are created by subtyping AbstractEnvironment and implementing a few methods:

  • reset!(env) -> env
  • actions(env, s) -> A
  • step!(env, s, a) -> (r, s′)
  • finished(env, s′) -> Bool

and optional overrides:

  • state(env) -> s
  • reward(env) -> r

which default to returning env.state and env.reward, respectively, when not overridden.

  • ismdp(env) -> Bool

An environment may be fully observable (an MDP) or partially observable (a POMDP). For a partially observable environment, the state s is really an observation o. To keep the interface consistent, we call everything a state and assume the environment is free to maintain additional (unobserved) internal state. The ismdp query returns true when the environment is an MDP and false otherwise.

  • maxsteps(env) -> Int

An episode terminates when either maxsteps(env) is reached or finished(env, s′) returns true. The default value is 0, which means the number of steps is unlimited.


A minimal example for testing purposes is test/foo.jl, and a hedged sketch of a custom environment follows below.
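
To make the interface concrete, here is a minimal sketch of a custom environment. The RandomWalkEnv type, its fields, and its dynamics are invented for illustration and are not part of Reinforce.jl; it simply implements the methods listed above.

using Reinforce
import Reinforce: reset!, actions, step!, finished, maxsteps

# Hypothetical toy environment: a 1-D random walk starting at position 0.
# An episode ends when the walker reaches ±5; the final step earns reward 1.
mutable struct RandomWalkEnv <: AbstractEnvironment
    state::Int
    reward::Float64
end
RandomWalkEnv() = RandomWalkEnv(0, 0.0)

function reset!(env::RandomWalkEnv)
    env.state = 0
    env.reward = 0.0
    env
end

actions(env::RandomWalkEnv, s) = [-1, 1]    # move left or right

function step!(env::RandomWalkEnv, s, a)
    env.state = s + a
    env.reward = abs(env.state) >= 5 ? 1.0 : 0.0
    env.reward, env.state                   # (r, s′)
end

finished(env::RandomWalkEnv, s′) = abs(s′) >= 5

maxsteps(env::RandomWalkEnv) = 100          # optional: cap episodes at 100 steps

Because the type stores state and reward fields, the default state(env) and reward(env) accessors work without explicit overrides.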

TODO: more details and examples

Policy Interface

Agents/policies are created by subtyping AbstractPolicy and implementing action. The built-in random policy is a short example:

struct RandomPolicy <: AbstractPolicy end
action(π::RandomPolicy, r, s, A) = rand(A)

where A is the action space. The action method maps the last reward and current state to the next chosen action: (r, s) -> a.

A policy may optionally implement:

  • reset!(π::AbstractPolicy) -> π
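
As a hedged sketch of a parameterized policy, here is an ε-greedy policy; the EpsilonGreedyPolicy type and its user-supplied score(s, a) function are illustrative, not part of the package:

using Reinforce
import Reinforce: action, reset!

# Hypothetical ε-greedy policy: with probability ε act randomly, otherwise
# choose the action ranked highest by a user-supplied score(s, a) function.
struct EpsilonGreedyPolicy{F} <: AbstractPolicy
    ε::Float64
    score::F
end

function action(π::EpsilonGreedyPolicy, r, s, A)
    rand() < π.ε && return rand(A)
    best_a, best_v = first(A), -Inf
    for a in A
        v = π.score(s, a)
        v > best_v && ((best_a, best_v) = (a, v))
    end
    best_a
end

reset!(π::EpsilonGreedyPolicy) = π   # no per-episode internal state to clear

reset! is a no-op here, but a learning policy could use it to clear per-episode state such as eligibility traces.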

Episode Iterator

Iterate through episodes using the Episode iterator. A 4-tuple (s,a,r,s′) is returned from each step of the episode:

ep = Episode(env, π)
for (s, a, r, s′) in ep
    # do some custom processing of the sars-tuple
end
R = ep.total_reward
T = ep.niter

There is also a convenience method run_episode. The following is equivalent to the example above:

R = run_episode(env, π) do
    # anything you want... this section is called after each step
end
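
Putting the pieces together, a minimal end-to-end loop might look like the following. RandomWalkEnv is the hypothetical environment sketched earlier; RandomPolicy and Episode come from the package.

using Reinforce

env = RandomWalkEnv()   # hypothetical environment from the earlier sketch
π = RandomPolicy()      # built-in random policy

for i in 1:10
    reset!(env)                     # start each episode from the initial state
    ep = Episode(env, π)
    for (s, a, r, s′) in ep
        # per-step learning or logging would go here
    end
    println("episode $i: reward = $(ep.total_reward) over $(ep.niter) steps")
end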

Author: Tom Breloff
