devendrachaplot / Deeprl Grounding

Licence: MIT

Train an RL agent to execute natural language instructions in a 3D Environment (PyTorch)

Gated-Attention Architectures for Task-Oriented Language Grounding

This is a PyTorch implementation of the AAAI-18 paper:

Gated-Attention Architectures for Task-Oriented Language Grounding
Devendra Singh Chaplot, Kanthashree Mysore Sathyendra, Rama Kumar Pasumarthi, Dheeraj Rajagopal, Ruslan Salakhutdinov
Carnegie Mellon University

Project Website: https://sites.google.com/view/gated-attention

This repository contains:

Code for training an A3C-LSTM agent using Gated-Attention
Code for Doom-based language grounding environment
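The gating idea named in the paper title can be illustrated with a minimal sketch. It assumes the gate is a sigmoid over a linear projection of the instruction embedding, with one gate value per image feature map, applied by element-wise multiplication. The function name, argument names, and the plain-list data layout are hypothetical and chosen for illustration; this is not the repository's PyTorch code.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_attention(feature_maps, instruction_vec, weights, biases):
    """Scale each image feature map by a gate computed from the instruction.

    feature_maps:    list of C maps, each an H x W nested list of floats
    instruction_vec: list of D floats (hypothetical instruction embedding)
    weights, biases: C x D nested list and list of C floats, standing in
                     for a linear layer (hypothetical parameters)
    """
    gated = []
    for fmap, w_row, b in zip(feature_maps, weights, biases):
        # One scalar gate per feature map, derived from the instruction.
        gate = sigmoid(sum(w * x for w, x in zip(w_row, instruction_vec)) + b)
        # Broadcast the gate over every spatial position of this map.
        gated.append([[gate * v for v in row] for row in fmap])
    return gated


if __name__ == "__main__":
    maps = [[[1.0, 2.0]], [[3.0, 4.0]]]        # C=2, H=1, W=2
    instr = [0.0, 0.0]                         # D=2
    w = [[0.0, 0.0], [0.0, 0.0]]
    b = [0.0, 0.0]
    print(gated_attention(maps, instr, w, b))  # every gate is sigmoid(0) = 0.5
```

With zero weights and biases every gate is 0.5, so each feature map is simply halved. Trained weights would let the instruction suppress some maps and pass others through.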

Dependencies

ViZDoom
PyTorch
OpenCV

(We recommend using Anaconda)

Usage

Using the Environment

To run a random agent:

python env_test.py

To play in the environment:

python env_test.py --interactive 1

To change the difficulty of the environment (easy/medium/hard):

python env_test.py -d easy

Training Gated-Attention A3C-LSTM agent

To train an A3C-LSTM agent with 32 training processes:

python a3c_main.py --num-processes 32 --evaluate 0

The code will save the best model at ./saved/model_best.

To test the pre-trained model for Multitask Generalization:

python a3c_main.py --evaluate 1 --load saved/pretrained_model

To test the pre-trained model for Zero-shot Task Generalization:

python a3c_main.py --evaluate 2 --load saved/pretrained_model

To visualize the model while testing, add '--visualize 1':

python a3c_main.py --evaluate 2 --load saved/pretrained_model --visualize 1

To test the trained model, use --load saved/model_best in the above commands.
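When switching between these modes often, the commands above can be assembled programmatically. The following is a small sketch: the helper `build_command` and the mode names in `EVAL_MODES` are hypothetical, while the script name and flags are the ones shown in this README.

```python
# Mapping from a readable mode name (hypothetical) to the --evaluate value.
EVAL_MODES = {"train": 0, "multitask": 1, "zero_shot": 2}


def build_command(mode, load=None, visualize=False, num_processes=None):
    """Return an argv list for a3c_main.py using flags documented in this README."""
    cmd = ["python", "a3c_main.py", "--evaluate", str(EVAL_MODES[mode])]
    if load is not None:
        cmd += ["--load", load]
    if visualize:
        cmd += ["--visualize", "1"]
    if num_processes is not None:
        cmd += ["--num-processes", str(num_processes)]
    return cmd


if __name__ == "__main__":
    # Zero-shot evaluation of the model saved during training, with visualization.
    cmd = build_command("zero_shot", load="saved/model_best", visualize=True)
    print(" ".join(cmd))
    # To actually launch it from the repository root:
    #   import subprocess; subprocess.run(cmd, check=True)
```

Returning a list rather than a single string lets the result be passed directly to `subprocess.run` without shell quoting.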

All arguments for a3c_main.py:

  -h, --help            show this help message and exit
  -l MAX_EPISODE_LENGTH, --max-episode-length MAX_EPISODE_LENGTH
                        maximum length of an episode (default: 30)
  -d DIFFICULTY, --difficulty DIFFICULTY
                        Difficulty of the environment, "easy", "medium" or
                        "hard" (default: hard)
  --living-reward LIVING_REWARD
                        Default reward at each time step (default: 0, change
                        to -0.005 to encourage shorter paths)
  --frame-width FRAME_WIDTH
                        Frame width (default: 300)
  --frame-height FRAME_HEIGHT
                        Frame height (default: 168)
  -v VISUALIZE, --visualize VISUALIZE
                        Visualize the envrionment (default: 0, use 0 for
                        faster training)
  --sleep SLEEP         Sleep between frames for better visualization
                        (default: 0)
  --scenario-path SCENARIO_PATH
                        Doom scenario file to load (default: maps/room.wad)
  --interactive INTERACTIVE
                        Interactive mode enables human to play (default: 0)
  --all-instr-file ALL_INSTR_FILE
                        All instructions file (default:
                        data/instructions_all.json)
  --train-instr-file TRAIN_INSTR_FILE
                        Train instructions file (default:
                        data/instructions_train.json)
  --test-instr-file TEST_INSTR_FILE
                        Test instructions file (default:
                        data/instructions_test.json)
  --object-size-file OBJECT_SIZE_FILE
                        Object size file (default: data/object_sizes.txt)
  --lr LR               learning rate (default: 0.001)
  --gamma G             discount factor for rewards (default: 0.99)
  --tau T               parameter for GAE (default: 1.00)
  --seed S              random seed (default: 1)
  -n N, --num-processes N
                        how many training processes to use (default: 4)
  --num-steps NS        number of forward steps in A3C (default: 20)
  --load LOAD           model path to load, 0 to not reload (default: 0)
  -e EVALUATE, --evaluate EVALUATE
                        0:Train, 1:Evaluate MultiTask Generalization
                        2:Evaluate Zero-shot Generalization (default: 0)
  --dump-location DUMP_LOCATION
                        path to dump models and log (default: ./saved/)
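To make the roles of --gamma, --tau and --living-reward more concrete, here is a sketch of discounted returns and generalized advantage estimation (GAE) in plain Python. It uses the standard textbook formulation and assumes the training code follows it; the function names are hypothetical and this is not code from a3c_main.py.

```python
def discounted_returns(rewards, gamma=0.99, bootstrap=0.0):
    """Return R_t = r_t + gamma * R_{t+1} for each step, working backwards."""
    returns = []
    running = bootstrap  # value estimate after the last step (0 if terminal)
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def gae_advantages(rewards, values, gamma=0.99, tau=1.0):
    """Generalized advantage estimates.

    values must have len(rewards) + 1 entries; the last one is the
    bootstrap value for the state reached after the final step.
    """
    advantages = []
    gae = 0.0
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of TD errors, decay gamma * tau.
        gae = delta + gamma * tau * gae
        advantages.append(gae)
    advantages.reverse()
    return advantages


if __name__ == "__main__":
    # Reaching the goal after 3 steps vs. 6 steps with a living reward of -0.005.
    short_path = [-0.005] * 2 + [1.0]
    long_path = [-0.005] * 5 + [1.0]
    print(discounted_returns(short_path)[0])
    print(discounted_returns(long_path)[0])
```

With tau = 1 and an all-zero value baseline, the advantages reduce to the discounted returns. A negative living reward lowers the return of the longer path further, which is why the help text suggests -0.005 to encourage shorter paths.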

Demonstration videos:

Multitask Generalization video: https://www.youtube.com/watch?v=YJG8fwkv7gA

Zero-shot Task Generalization video: https://www.youtube.com/watch?v=JziCKsLrudE

Different stages of training: https://www.youtube.com/watch?v=o_G6was03N0

Cite as

Chaplot, D.S., Sathyendra, K.M., Pasumarthi, R.K., Rajagopal, D. and Salakhutdinov, R., 2017. Gated-Attention Architectures for Task-Oriented Language Grounding. arXiv preprint arXiv:1706.07230.

Bibtex:

@article{chaplot2017gated,
  title={Gated-Attention Architectures for Task-Oriented Language Grounding},
  author={Chaplot, Devendra Singh and Sathyendra, Kanthashree Mysore and Pasumarthi, Rama Kumar and Rajagopal, Dheeraj and Salakhutdinov, Ruslan},
  journal={arXiv preprint arXiv:1706.07230},
  year={2017}
}

Acknowledgements

This repository uses the ViZDoom API (https://github.com/mwydmuch/ViZDoom) and parts of the code from the API. The implementation of A3C is borrowed from https://github.com/ikostrikov/pytorch-a3c. The poisson-disc code is borrowed from https://github.com/IHautaI/poisson-disc.
