
mtrazzi / Meta_rl

License: MIT
The TensorFlow code and a DeepMind Lab wrapper for my article "Meta-Reinforcement Learning" on FloydHub.

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Meta rl

Gym Dart
OpenAI Gym environments using DART
Stars: ✭ 20 (-44.44%)
Mutual labels:  reinforcement-learning
Gym Panda
An OpenAI Gym Env for Panda
Stars: ✭ 29 (-19.44%)
Mutual labels:  reinforcement-learning
Vowpal wabbit
Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.
Stars: ✭ 7,815 (+21608.33%)
Mutual labels:  reinforcement-learning
Doyouevenlearn
Essential Guide to keep up with AI/ML/DL/CV
Stars: ✭ 913 (+2436.11%)
Mutual labels:  reinforcement-learning
Batch Ppo
Efficient Batched Reinforcement Learning in TensorFlow
Stars: ✭ 945 (+2525%)
Mutual labels:  reinforcement-learning
Neuromatch Academy
Preparatory Materials, Self-guided Learning, and Project Management for Neuromatch Academy activities
Stars: ✭ 30 (-16.67%)
Mutual labels:  neuroscience
Acis
Actor-Critic Instance Segmentation (CVPR 2019)
Stars: ✭ 15 (-58.33%)
Mutual labels:  reinforcement-learning
Stock Price Trade Analyzer
This is a Python 3.0 project for analyzing stock prices and methods of stock trading. It uses native Python tools and Google TensorFlow machine learning.
Stars: ✭ 35 (-2.78%)
Mutual labels:  reinforcement-learning
Impala Distributed Tensorflow
Stars: ✭ 28 (-22.22%)
Mutual labels:  reinforcement-learning
Conversational Ai
Conversational AI Reading Materials
Stars: ✭ 34 (-5.56%)
Mutual labels:  reinforcement-learning
Awesome Ai In Finance
🔬 A curated list of awesome machine learning strategies & tools in financial market.
Stars: ✭ 910 (+2427.78%)
Mutual labels:  reinforcement-learning
Gym
Seoul AI Gym is a toolkit for developing AI algorithms.
Stars: ✭ 27 (-25%)
Mutual labels:  reinforcement-learning
Pokerrl Omaha
Omaha Poker functionality + some features for the PokerRL reinforcement learning card framework
Stars: ✭ 31 (-13.89%)
Mutual labels:  reinforcement-learning
Onix
ONI-compatible hardware, firmware, and host APIs for advanced neuroscience experiments.
Stars: ✭ 20 (-44.44%)
Mutual labels:  neuroscience
Left Shift
Using deep reinforcement learning to tackle the game 2048.
Stars: ✭ 35 (-2.78%)
Mutual labels:  reinforcement-learning
Udacity Deep Learning Nanodegree
This is just a collection of projects that I made during my Deep Learning Nanodegree by Udacity
Stars: ✭ 15 (-58.33%)
Mutual labels:  reinforcement-learning
Drlkit
A High Level Python Deep Reinforcement Learning library. Great for beginners, prototyping and quickly comparing algorithms
Stars: ✭ 29 (-19.44%)
Mutual labels:  reinforcement-learning
Rlcard
Reinforcement Learning / AI Bots in Card (Poker) Games - Blackjack, Leduc, Texas, DouDizhu, Mahjong, UNO.
Stars: ✭ 980 (+2622.22%)
Mutual labels:  reinforcement-learning
Artificialintelligenceengines
Computer code collated for use with Artificial Intelligence Engines book by JV Stone
Stars: ✭ 35 (-2.78%)
Mutual labels:  reinforcement-learning
Emdp
Easy MDPs and grid worlds with accessible transition dynamics to do exact calculations
Stars: ✭ 31 (-13.89%)
Mutual labels:  reinforcement-learning

Harlow Task

DISCLAIMER ⚠ This is the git submodule for the Harlow task from my article "Meta-Reinforcement Learning" on FloydHub.

  • For the main repository for the Harlow task (with more information about the task) see here.
  • For the two-step task see here.

To get started, check out the parent README.md.

Discussion

I answer questions and give more information here:

Directory structure

meta-rl
├── harlow.py                 # main file: initializes the DeepMind Lab environment, implements the wrapper, processes the frames, and runs the training.
└── meta_rl
    ├── worker.py             # implements the class `Worker`, that contains the method `work` to collect training data and `train` to train the networks on this training data.
    └── ac_network.py         # implements the class `AC_Network`, where we initialize all the networks & the loss function.
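
The split between the two files follows awjuliani's A3C pattern: `AC_Network` holds the graph, `Worker` drives data collection and updates. Below is a rough sketch of how the pieces fit together; the class and method names match the files above, but the shapes, hyper-parameters, and environment interface are illustrative assumptions, not the repository's actual code.

```python
# Structural sketch only -- assumed shapes, hyper-parameters, and env interface.
import numpy as np
import tensorflow as tf

class AC_Network:
    """Actor-critic network: LSTM trunk with policy and value heads (ac_network.py)."""
    def __init__(self, n_actions, obs_dim=2, n_units=48, scope="worker_0"):
        with tf.variable_scope(scope):
            self.inputs = tf.placeholder(tf.float32, [None, obs_dim])
            rnn_in = tf.expand_dims(self.inputs, 0)                  # [1, time, obs_dim]
            cell = tf.nn.rnn_cell.BasicLSTMCell(n_units)
            rnn_out, _ = tf.nn.dynamic_rnn(cell, rnn_in, dtype=tf.float32)
            rnn_out = tf.reshape(rnn_out, [-1, n_units])
            self.policy = tf.layers.dense(rnn_out, n_actions, activation=tf.nn.softmax)
            self.value = tf.layers.dense(rnn_out, 1)

class Worker:
    """Collects rollouts with `work` and updates the network with `train` (worker.py)."""
    def __init__(self, env, network, sess):
        self.env, self.network, self.sess = env, network, sess

    def work(self, episode_length):
        obs, rollout = self.env.reset(), []
        for _ in range(episode_length):
            probs = self.sess.run(self.network.policy,
                                  {self.network.inputs: [obs]})[0]
            action = np.random.choice(len(probs), p=probs)
            next_obs, reward, done = self.env.step(action)           # assumed env interface
            rollout.append((obs, action, reward))
            obs = next_obs
            if done:
                break
        return rollout

    def train(self, rollout, gamma=0.99):
        # In the real implementation, discounted returns and advantages computed
        # from the rollout are fed to the policy and value losses here.
        pass
```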

Branches

  • master: for this branch, the frames are pre-processed using a dataset of 42 pictures of students from the school 42 (cf. the FloydHub blog for more details). Our model achieved 40% performance on this simplified version of the Harlow task.

img/reward_cuve_5_seeds_42_images.png

img/conv_plus_stacked_lstm.png

  • monothread2pixel: here, our dataset consisted of only a black image and a white image. We pre-processed those two images so that the agent only sees a one-hot encoding, either [0,1] or [1,0] (a minimal sketch of this kind of pre-processing follows the reward curve below). Here is the resulting reward curve after training:

img/monothread2pixels.png
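
For reference, a minimal sketch of this kind of pre-processing (the function name and threshold are assumptions, not the repository's actual code): the frame is collapsed to a single black/white bit and encoded as a one-hot vector.

```python
import numpy as np

def frame_to_one_hot(frame):
    """Collapse an RGB frame to a black/white bit and encode it as a one-hot vector.

    `frame` is assumed to be a uint8 array of shape (height, width, 3).
    Returns [1, 0] for a (mostly) black frame and [0, 1] for a (mostly) white one.
    """
    is_white = frame.mean() > 127                 # assumed threshold
    return np.array([0.0, 1.0]) if is_white else np.array([1.0, 0.0])

black = np.zeros((84, 84, 3), dtype=np.uint8)
print(frame_to_one_hot(black))                    # -> [1. 0.]
```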

  • multiprocessing: I implemented multiprocessing using Python's multiprocessing library. However, it turned out that TensorFlow doesn't allow using multiprocessing once tensorflow has been imported, so the multiprocessing branch came to a dead end (see the import-order sketch below).
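
A minimal sketch of the workaround listed in the Todo section below (importing tensorflow only after the multiprocessing calls); `run_worker` is a hypothetical function, not part of the repository:

```python
import multiprocessing as mp

def run_worker(worker_id):
    import tensorflow as tf               # imported inside the child process only
    with tf.Session():
        print("worker", worker_id, "running TensorFlow", tf.__version__)

if __name__ == "__main__":
    mp.set_start_method("spawn")          # each child gets a fresh interpreter, no inherited TF state
    procs = [mp.Process(target=run_worker, args=(i,)) for i in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```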

  • ray: we also tried multiprocessing with ray, another multiprocessing library. However, it didn't work out because the DeepMind Lab environment was not picklable, i.e. it couldn't be serialized using pickle (see the sketch below for one possible workaround).
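
A common way around this is to build the environment inside a ray actor, so that only constructor arguments (not the environment object itself) ever cross process boundaries. A minimal sketch, using a dummy stand-in for the DeepMind Lab environment:

```python
import ray

class DummyLabEnv:
    """Stand-in for the real DeepMind Lab environment (which isn't picklable)."""
    def __init__(self, level):
        self.level = level
    def reset(self):
        return [0.0, 0.0]

@ray.remote
class EnvWorker:
    """Each actor constructs its own environment, so the env object is never pickled."""
    def __init__(self, level):
        self.env = DummyLabEnv(level)     # in the real code, the DeepMind Lab env would be built here
    def first_obs(self):
        return self.env.reset()

if __name__ == "__main__":
    ray.init()
    workers = [EnvWorker.remote("contributed/psychlab/harlow") for _ in range(2)]
    print(ray.get([w.first_obs.remote() for w in workers]))
```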

Todo

On branch master:

  • [ ] train with more episodes (for instance 20-50k) to see if some seeds keep learning.
  • [ ] train with different seeds, to see if some seeds can reach > 40% performance.
  • [ ] train with more units in the LSTM (for instance > 100 instead of 48), to see if it can keep learning after 10k episodes.
  • [ ] train with more images (for instance 1000).

For multi-threading (e.g. in dev):

  • [ ] support for distributed tensorflow on multiple GPUs.
  • [ ] get rid of CPython's global interpreter lock by connecting TensorFlow's C API with DeepMind Lab's C API.

For multiprocessing:

  • [ ] in the multiprocessing branch, try to import tensorflow after the multiprocessing calls.
  • [ ] in ray, try to make the DeepMind Lab environment picklable (for instance by looking at how OpenAI made their physics engine mujoco-py picklable).

Support

  • We support Python 3.6.

  • The branch master was tested on FloydHub's instances (using TensorFlow 1.12 and CPU). To switch to GPU, replace tf.device("/cpu:0") with tf.device("/device:GPU:0") in harlow.py (see the snippet below).
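
For reference, the change looks roughly like this (illustrative snippet, not a line-for-line copy of harlow.py):

```python
import tensorflow as tf

# As in the master branch (CPU):
with tf.device("/cpu:0"):
    a = tf.constant([1.0, 2.0])

# For GPU, replace the device string:
with tf.device("/device:GPU:0"):
    b = tf.constant([1.0, 2.0])
```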

Pip

All the pip packages should be either installed on FloydHub or installed with install.sh.

However, if you want to run this repository on your machine, here are the requirements:

numpy==1.16.2
tensorflow==1.12.0
six==1.12.0
scipy==1.2.1
skimage==0.0
setuptools==40.8.0
Pillow==5.4.1

Additionally, for the branch ray you might need to run pip install ray, and for the branch multiprocessing you would need to install multiprocessing with pip install multiprocessing.

Credits

This work uses awjuliani's Meta-RL implementation.

I couldn't have done it without my dear friend Kevin Costa, and the additional details kindly provided by Jane Wang.
