
danijar / Mindpark

Licence: gpl-3.0
Testbed for deep reinforcement learning


Projects that are alternatives to or similar to Mindpark

Lagom
lagom: A PyTorch infrastructure for rapid prototyping of reinforcement learning algorithms.
Stars: ✭ 364 (+123.31%)
Mutual labels:  artificial-intelligence, research, reinforcement-learning
Dreamerv2
Mastering Atari with Discrete World Models
Stars: ✭ 287 (+76.07%)
Mutual labels:  artificial-intelligence, research, reinforcement-learning
Free Ai Resources
🚀 FREE AI Resources - 🎓 Courses, 👷 Jobs, 📝 Blogs, 🔬 AI Research, and many more - for everyone!
Stars: ✭ 192 (+17.79%)
Mutual labels:  artificial-intelligence, research, reinforcement-learning
Pygame Learning Environment
PyGame Learning Environment (PLE) -- Reinforcement Learning Environment in Python.
Stars: ✭ 828 (+407.98%)
Mutual labels:  artificial-intelligence, research, reinforcement-learning
Rlai Exercises
Exercise Solutions for Reinforcement Learning: An Introduction [2nd Edition]
Stars: ✭ 97 (-40.49%)
Mutual labels:  artificial-intelligence, reinforcement-learning
Papers Literature Ml Dl Rl Ai
Highly cited and useful papers related to machine learning, deep learning, AI, game theory, reinforcement learning
Stars: ✭ 1,341 (+722.7%)
Mutual labels:  artificial-intelligence, reinforcement-learning
Research And Coding
Research resources list: a curated list of research resources
Stars: ✭ 100 (-38.65%)
Mutual labels:  artificial-intelligence, research
Toycarirl
Implementation of an inverse reinforcement learning algorithm for a toy car in a 2D world (Apprenticeship Learning via Inverse Reinforcement Learning, Abbeel & Ng, 2004)
Stars: ✭ 128 (-21.47%)
Mutual labels:  artificial-intelligence, reinforcement-learning
Snake
Artificial intelligence for the Snake game.
Stars: ✭ 1,241 (+661.35%)
Mutual labels:  artificial-intelligence, reinforcement-learning
Reinforcement Learning Cheat Sheet
Reinforcement Learning Cheat Sheet
Stars: ✭ 104 (-36.2%)
Mutual labels:  artificial-intelligence, reinforcement-learning
Top 10 Computer Vision Papers 2020
A list of the top 10 computer vision papers in 2020 with video demos, articles, code and paper reference.
Stars: ✭ 132 (-19.02%)
Mutual labels:  artificial-intelligence, research
60 days rl challenge
60_Days_RL_Challenge (Chinese version)
Stars: ✭ 92 (-43.56%)
Mutual labels:  artificial-intelligence, reinforcement-learning
Mapleai
A collection of learning materials across AI fields (the skills and knowledge needed for an AI-related job offer: online blogs, personal blogs, and electronic book copies).
Stars: ✭ 89 (-45.4%)
Mutual labels:  artificial-intelligence, reinforcement-learning
Chemgan Challenge
Code for the paper: Benhenda, M. 2017. ChemGAN challenge for drug discovery: can AI reproduce natural chemical diversity? arXiv preprint arXiv:1708.08227.
Stars: ✭ 98 (-39.88%)
Mutual labels:  artificial-intelligence, reinforcement-learning
Simulator
A ROS/ROS2 Multi-robot Simulator for Autonomous Vehicles
Stars: ✭ 1,260 (+673.01%)
Mutual labels:  artificial-intelligence, reinforcement-learning
Reinforcement Learning An Introduction
Python Implementation of Reinforcement Learning: An Introduction
Stars: ✭ 11,042 (+6674.23%)
Mutual labels:  artificial-intelligence, reinforcement-learning
Awesome Ai
A curated list of artificial intelligence resources (Courses, Tools, App, Open Source Project)
Stars: ✭ 161 (-1.23%)
Mutual labels:  artificial-intelligence, reinforcement-learning
Flappy Es
Flappy Bird AI using Evolution Strategies
Stars: ✭ 140 (-14.11%)
Mutual labels:  artificial-intelligence, reinforcement-learning
Deep Cfr
Scalable Implementation of Deep CFR and Single Deep CFR
Stars: ✭ 158 (-3.07%)
Mutual labels:  research, reinforcement-learning
Java Deep Learning Cookbook
Code for Java Deep Learning Cookbook
Stars: ✭ 156 (-4.29%)
Mutual labels:  artificial-intelligence, reinforcement-learning

Mindpark

Testbed for deep reinforcement learning algorithms.

[Animations: DQN playing Breakout, DQN playing Doom Health Gathering, DQN trying to play Doom Deathmatch]

Introduction

Reinforcement learning is a fundamental problem in artificial intelligence. In this setting, an agent interacts with an environment in order to maximize a reward. For example, we show our agent the pixel frames of a game and want it to choose actions that result in a high score.

Mindpark is an environment for prototyping, testing, and comparing reinforcement learning algorithms. The library makes it easy to reuse partial behaviors across algorithms and to monitor all kinds of metrics about them. It integrates well with TensorFlow, Theano, and other deep learning libraries, as well as with OpenAI's gym environments.
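As a minimal illustration of this interaction loop, here is a plain gym rollout with a random agent. It is independent of Mindpark and uses the classic gym step signature of the Breakout-v0 era; a learned agent would pick actions from the observation instead of sampling them.

import gym

# Classic agent-environment loop: observe, act, receive a reward.
# Mindpark wraps this loop for you; this snippet only shows the setting.
env = gym.make('Breakout-v0')
observation = env.reset()
done = False
episode_return = 0.0
while not done:
    action = env.action_space.sample()  # a trained agent would use the observation here
    observation, reward, done, info = env.step(action)
    episode_return += reward
print('Episode return:', episode_return)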

These are the algorithms that I have implemented so far (feel free to contribute to this list):

Algorithm | Publication | Status
Deep Q-Network (DQN) | Mnih et al. 2015 (PDF) | Working consistently.
Double Deep Q-Network (DDQN) | van Hasselt, Guez, and Silver 2015 (PDF) | Working consistently.
Asynchronous Advantage Actor-Critic (A3C) | Mnih et al. 2016 (PDF) | Partly working.
REINFORCE | Williams 1992 (PDF) | Currently being tested.
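For orientation, the practical difference between DQN and DDQN lies in how the bootstrapped learning target is computed. The sketch below states the standard formulas in generic terms and is not Mindpark's implementation; the Q-value arrays are assumed to come from an online network and a target network.

import numpy as np

def dqn_target(reward, next_q_target, discount=0.99, terminal=False):
    # DQN: bootstrap from the maximum Q-value of the target network.
    if terminal:
        return reward
    return reward + discount * float(np.max(next_q_target))

def ddqn_target(reward, next_q_online, next_q_target, discount=0.99, terminal=False):
    # Double DQN: select the action with the online network but evaluate
    # it with the target network, which reduces value overestimation.
    if terminal:
        return reward
    best_action = int(np.argmax(next_q_online))
    return reward + discount * float(next_q_target[best_action])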

Instructions

To get started, clone the repository and install dependencies:

git clone git@github.com:danijar/mindpark.git && cd mindpark
sudo -H pip3 install .

An experiment compares algorithms, hyperparameters, and environments. To start an experiment, run the following (the -O flag enables Python's optimizations):

python3 -O -m mindpark run definition/breakout.yaml

Videos and metrics are stored in a result directory, which is ~/experiment/mindpark/<timestamp>-breakout/ by default. You can plot statistics during or after the simulation by fuzzy matching on the folder name:

python3 -m mindpark stats breakout

Statistics

Let's take a look at what the previous command creates.

Experiments consist of interleaved phases of training and evaluation. For example, an algorithm might use a lower exploration rate in favor of exploitation while being evaluated. Therefore, we display the metrics in two rows:

[Figure: DQN statistics on Breakout]

This illustrates the metrics after only a few episodes of training, as you can see from the horizontal axes. This small example is convenient for explanation, but if you want to take a closer look, the metrics of a longer experiment are available as well.

Metric | Description
score | During the first 80 episodes of training (the point at which I ran mindpark stats), the algorithm manages to reach a score of 9 but usually gets scores around 3 or 4. Below it is the score during evaluation, which is lower because the algorithm has not learned much yet and performs worse than the random exploration done during training.
dqn/cost | The training cost of the neural network. It starts at episode 10, which is when training begins; before that, DQN only builds up its replay memory. The neural network is not trained during evaluation, so that plot is empty.
epsilon_greedy/values | The Q-values that the dqn behavior sends to epsilon_greedy to act greedily on. You can see that they evolve over time: action 4 seems to be quite good. But this is only a short run, so we should not read too much into it.
epsilon_greedy/random | A histogram of whether the current action was chosen randomly or greedily with respect to the predicted Q-values. During training, epsilon is annealed, so you can see a shift in the distribution; a small sketch of this annealing follows the table. During testing, epsilon is fixed at 0.05, so there are few random actions.
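As referenced in the table, here is a minimal sketch of an annealed epsilon-greedy action choice in plain Python. It is not Mindpark's EpsilonGreedy behavior, and the schedule values are illustrative assumptions only.

import random

def epsilon(step, start=1.0, end=0.05, anneal_steps=100000, testing=False):
    # Linearly anneal epsilon during training; keep it fixed while testing.
    if testing:
        return end
    fraction = min(step / anneal_steps, 1.0)
    return start + fraction * (end - start)

def choose_action(q_values, step, testing=False):
    # With probability epsilon pick a random action, otherwise act greedily
    # with respect to the predicted Q-values. Returns the action and whether
    # it was chosen randomly, mirroring the epsilon_greedy/random metric.
    if random.random() < epsilon(step, testing=testing):
        return random.randrange(len(q_values)), True
    greedy = max(range(len(q_values)), key=lambda a: q_values[a])
    return greedy, False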

The metric names are prefixed by the classes they come from. That's because algorithms are composed of reusable partial behaviors. See the Algorithms section for details.

Definitions

Definitions are YAML files that contain all you need to run or reproduce an experiment:

epochs: 100
test_steps: 4e4
repeats: 5
envs:
  - Pong-v0
  - Breakout-v0
algorithms:
  -
    name: LSTM-A3C (3 layers)
    type: A3C
    train_steps: 8e5
    config:
      network: lstm_three_layers
      initial_learning_rate: 2e-4
  -
    name: DQN (Mnih et al. 2015)
    type: DQN
    train_steps: 2e5
  -
    name: Random
    type: Random
    train_steps: 0

Each algorithm will be trained on each environment for the specified number of repeats. A simulation is divided into epochs that consist of a training and an evaluation phase.
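To make the structure of a definition concrete, the following sketch loads such a YAML file with PyYAML and prints the experiment grid it describes, assuming the file follows the format shown above. How train_steps relates to epochs is left to Mindpark itself, so the snippet deliberately just echoes the values.

import yaml

# Sketch: list the environment/algorithm combinations a definition describes.
# 'definition/breakout.yaml' is one of the files shipped with the repository.
with open('definition/breakout.yaml') as f:
    definition = yaml.safe_load(f)

print('epochs:', definition['epochs'], '| repeats:', definition['repeats'])
for env in definition['envs']:
    for algorithm in definition['algorithms']:
        print('{} on {} (train_steps={})'.format(
            algorithm['name'], env, algorithm['train_steps']))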

Algorithms

To implement your own algorithm, subclass mindpark.core.Algorithm. Please refer to the existing algorithms for details, and ask if you have questions. Algorithms are composed of partial behaviors that can do preprocessing, exploration, learning, and more. To create a reusable chunk of behavior, subclass mindpark.core.Partial.

There are quite a few existing behaviors that you can import from mindpark.step and reuse in your algorithms. For more details, please look at the corresponding Python files or open an issue. Current behaviors include: ActionMax, ActionSample, ClampReward, Delta, EpsilonGreedy, Experience, Filter, Grayscale, History, Identity, Maximum, Normalize, Random, RandomStart, Resize, Score, Skip, Subsample.
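To give a feel for how partial behaviors compose without pinning down Mindpark's exact interface, here is a Mindpark-independent analogy in which behaviors are wrappers around a policy function. The names and shapes below are made up for illustration; the real building blocks in mindpark.step have a richer interface.

import numpy as np

def grayscale(policy):
    # Preprocessing behavior: collapse the color channels before acting.
    def step(observation):
        return policy(observation.mean(axis=-1))
    return step

def normalize(policy):
    # Preprocessing behavior: scale pixel values into [0, 1].
    def step(observation):
        return policy(observation / 255.0)
    return step

def random_policy(observation):
    # Placeholder decision maker; pretend the environment has four actions.
    return np.random.randint(4)

# Compose the pieces: grayscale -> normalize -> action selection.
agent = grayscale(normalize(random_policy))
action = agent(np.zeros((84, 84, 3)))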

Dependencies

Mindpark is a Python 3 package, and there are no plans to support Python 2. Please install gym_doom manually:

sudo apt-get install -y gtk2.0-dev libsdl2-dev libfluidsynth-dev libopenal-dev libboost-all-dev
sudo -H python3 -c "import gym_pull; gym_pull.pull('github.com/ppaquette/gym-doom')"

TensorFlow is only needed for the existing algorithms. You are free to use your libraries of choice to implement your own algorithms.

Contributions

Pull requests are very welcome. If you contribute, I will set up a contributors file, and you can choose if and how you want to be listed.

Please follow the existing code style, and run unit tests and the integration test after changes:

python3 setup.py test
python3 -m mindpark run definition/test.yaml -x

Contact

Feel free to reach out at [email protected] or open an issue here on GitHub if you have any questions.
