
mmcenta / Left Shift

Licence: MIT
Using deep reinforcement learning to tackle the game 2048.



left-shift

A DQN agent reaching the 2048 tile.

This repository contains the code for our project in the course INF581: Advanced Topics in A.I. at École Polytechnique.

In this project, we train a game-playing agent for the game 2048. We implement an OpenAI Gym environment to model the game and use the Deep Q-Learning (DQN) algorithm from the Stable Baselines library to train multiple agents, varying the state encoding, reward function, network type and network structure. Our results show that one-hot encoding of the states is crucial for good performance. We also conclude that Convolutional Neural Networks (CNNs) are more efficient than Multilayer Perceptrons (MLPs) for this game.
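
To make the one-hot state encoding concrete, here is a minimal sketch in plain Python. It is an illustration only, not the code used by the environment: the function name one_hot_board, the 16-channel depth, and the convention that channel 0 marks an empty cell are all assumptions.

```python
def one_hot_board(board, n_channels=16):
    """Encode a 2048 board as nested lists of shape (rows, cols, n_channels).

    Assumed convention: channel 0 marks an empty cell (value 0) and
    channel k marks the tile 2**k.
    """
    encoded = []
    for row in board:
        encoded_row = []
        for tile in row:
            if tile == 0:
                index = 0
            else:
                # For a power of two 2**k, bit_length() - 1 equals k.
                index = tile.bit_length() - 1
                if index < 1 or tile != 1 << index:
                    raise ValueError(f"not a valid 2048 tile: {tile}")
            if index >= n_channels:
                raise ValueError(f"tile {tile} needs more than {n_channels} channels")
            cell = [0] * n_channels
            cell[index] = 1
            encoded_row.append(cell)
        encoded.append(encoded_row)
    return encoded


board = [
    [0, 2, 4, 8],
    [0, 0, 0, 0],
    [2048, 0, 0, 0],
    [0, 0, 0, 2],
]
encoded = one_hot_board(board)
```

In this sketch a cell holding 2048 = 2^11 becomes a vector with a single 1 at index 11, so the network sees each tile as a category rather than as a raw magnitude.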

For a more in-depth discussion, give our report a read.

Project Structure

Below we detail the function of each directory:

  • agents: contains scripts to train and evaluate agents (more details in the Running subsection), as well as the code implementing custom callbacks and policies;
  • docs: contains the GIF you saw above and the final report of the project;
  • hyperparams: contains YAML files detailing hyperparameters for agents (more details in the Hyperparameters subsection);
  • models: contains pretrained agents;
  • utils: contains auxiliary scripts for plotting results.

Installation

We recommend using a separate Python 3.7 environment for this project (there is an incompatibility issue when trying to load models created using Python 3.7 on other versions). Our dependencies are:

A quick way to install them is to run the following command: pip install -r [requirements.txt|requirements-gpu.txt], choosing the appropriate file depending on whether you wish to run the models on a CPU or a GPU.

To install the environment, execute the following commands:

git clone https://github.com/mmcenta/gym-text2048
pip install -e gym-text2048

Running

Interactive player for humans:

python agents/play.py

Random agent:

python agents/random_agent.py

DQN:

To show an agent reaching 2048:

python agents/dqn.py --r2048

To demo pretrained models:

python agents/dqn.py -mn MODEL_NAME --demo
Example:
python agents/dqn.py -mn cnn_5l_4_v2 --demo

The models are in the models/ folder. If the model name contains _nohot, you must launch it with the --no-one-hot flag.

Example:
python agents/dqn.py -mn cnn_5l_4_v2_nohot --no-one-hot --demo
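
The naming rule above can be captured in a small helper that assembles the demo command. This is a hypothetical convenience sketch: build_demo_command is not part of the repository, and checking for the substring _nohot anywhere in the model name is an assumption.

```python
def build_demo_command(model_name, script="agents/dqn.py"):
    """Return the argument list for demoing a pretrained model.

    Hypothetical helper: only flags documented in this README are used.
    """
    command = ["python", script, "-mn", model_name]
    # Assumption: any model whose name contains "_nohot" was trained
    # without one-hot encoding and therefore needs --no-one-hot.
    if "_nohot" in model_name:
        command.append("--no-one-hot")
    command.append("--demo")
    return command


one_hot_cmd = build_demo_command("cnn_5l_4_v2")
no_hot_cmd = build_demo_command("cnn_5l_4_v2_nohot")
```

The returned list can be passed to subprocess.run(..., check=True) from the repository root.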

To train:

python agents/dqn.py -mn MODEL_NAME --train

To evaluate:

python agents/dqn.py -mn MODEL_NAME --eval

Complete usage:

python agents/dqn.py -h
usage: python agents/dqn.py [-h] [--env ENV] [--tensorboard-log TENSORBOARD_LOG]
              [--hyperparams-file HYPERPARAMS_FILE] [--model-name MODEL_NAME]
              [--n-timesteps N_TIMESTEPS] [--log-interval LOG_INTERVAL]
              [--hist-freq HIST_FREQ] [--eval-episodes EVAL_EPISODES]
              [--save-freq SAVE_FREQ] [--save-directory SAVE_DIRECTORY]
              [--log-directory LOG_DIRECTORY] [--seed SEED]
              [--verbose VERBOSE] [--no-one-hot] [--train] [--eval]
              [--extractor EXTRACTOR] [--demo] [--r2048]

optional arguments:
  -h, --help            show this help message and exit
  --env ENV             Environment id.
  --tensorboard-log TENSORBOARD_LOG, -tb TENSORBOARD_LOG
                        Tensorboard log directory.
  --hyperparams-file HYPERPARAMS_FILE, -hf HYPERPARAMS_FILE
                        Hyperparameter YAML file location.
  --model-name MODEL_NAME, -mn MODEL_NAME
                        Model name (if it already exists, training will be
                        resumed).
  --n-timesteps N_TIMESTEPS, -n N_TIMESTEPS
                        Number of timesteps.
  --log-interval LOG_INTERVAL
                        Log interval.
  --hist-freq HIST_FREQ
                        Dumps histogram each n steps.
  --eval-episodes EVAL_EPISODES
                        Number of episodes to use for evaluation.
  --save-freq SAVE_FREQ
                        Save the model every n steps (if negative, no
                        checkpoint).
  --save-directory SAVE_DIRECTORY, -sd SAVE_DIRECTORY
                        Save directory.
  --log-directory LOG_DIRECTORY, -ld LOG_DIRECTORY
                        Log directory.
  --seed SEED           Random generator seed.
  --verbose VERBOSE     Verbose mode (0: no output, 1: INFO).
  --no-one-hot          Disable one-hot encoding
  --train               Enable training
  --eval                Enable evaluation
  --extractor EXTRACTOR
                        Change extractor
  --demo                Enable rendering and runs for 1 episode
  --r2048               Show an agent reaching 2048
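
For readers unfamiliar with how such a command line is typically declared, the sketch below rebuilds a handful of the flags listed above with argparse. It is not the actual contents of agents/dqn.py: the build_parser name, the int types, and the absence of default values are assumptions, and most flags are omitted.

```python
import argparse


def build_parser():
    """Illustrative parser covering a subset of the documented flags."""
    parser = argparse.ArgumentParser(prog="python agents/dqn.py")
    parser.add_argument("--model-name", "-mn",
                        help="Model name (if it already exists, training will be resumed).")
    # Assumption: numeric options are parsed as integers.
    parser.add_argument("--save-freq", type=int,
                        help="Save the model every n steps (if negative, no checkpoint).")
    parser.add_argument("--seed", type=int, help="Random generator seed.")
    # Boolean switches default to False and become True when present.
    parser.add_argument("--no-one-hot", action="store_true",
                        help="Disable one-hot encoding")
    parser.add_argument("--train", action="store_true", help="Enable training")
    parser.add_argument("--eval", action="store_true", help="Enable evaluation")
    parser.add_argument("--demo", action="store_true",
                        help="Enable rendering and runs for 1 episode")
    return parser


# Parse the same arguments as the _nohot demo example in this README.
args = build_parser().parse_args(
    ["-mn", "cnn_5l_4_v2_nohot", "--no-one-hot", "--demo"]
)
```

Here args.model_name holds the model name, while args.no_one_hot and args.demo are True and args.train stays False.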

Plotting training logs:

To plot all the training logs:

python utils/plot_log_multi.py

To plot a specific training log:

python utils/plot_log.py PATH_TO_LOG

Example:

python utils/plot_log.py logs/cnn_5l4_fc.npz

Authors
