Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments

This repository is the official implementation of Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments. [project website]

Vision-and-Language Navigation in Continuous Environments (VLN-CE) is an instruction-guided navigation task with crowdsourced instructions, realistic environments, and unconstrained agent navigation. This repo is a launching point for interacting with the VLN-CE task and provides a wide array of baseline agents, including a Seq2Seq model and a Cross-Modal Attention model. Models can be trained via two imitation learning methods: teacher forcing (behavior cloning) and DAgger. VLN-CE is implemented on top of the Habitat platform.

VLN-CE comparison to VLN

Setup

This project is developed with Python 3.6. If you are using miniconda or anaconda, you can create an environment:

conda create -n vlnce python=3.6
conda activate vlnce

Habitat and Other Dependencies

VLN-CE makes extensive use of the Habitat Simulator and Habitat-Lab developed by FAIR. You will first need to install both Habitat-Sim and Habitat-Lab. If you are using conda, Habitat-Sim can easily be installed with:

conda install -c aihabitat -c conda-forge habitat-sim headless

Otherwise, follow the Habitat-Sim installation instructions. Then install Habitat-Lab version 0.1.5:

git clone --branch v0.1.5 git@github.com:facebookresearch/habitat-lab.git
cd habitat-lab
# installs both habitat and habitat_baselines
python -m pip install -r requirements.txt
python -m pip install -r habitat_baselines/rl/requirements.txt
python -m pip install -r habitat_baselines/rl/ddppo/requirements.txt
python setup.py develop --all

We recommend downloading the test scenes and running the example script as described here to ensure the installation of Habitat-Sim and Habitat-Lab was successful. Now you can clone this repository and install the rest of the dependencies:

git clone git@github.com:jacobkrantz/VLN-CE.git
cd VLN-CE
python -m pip install -r requirements.txt
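
As a quick sanity check after installation, you can confirm that both Habitat packages import from the new environment. This is a minimal sketch, not part of the repo:

# check_install.py -- minimal import check (illustrative, not part of this repo)
import habitat
import habitat_sim

# Print the installed versions if the packages expose them.
print("habitat-lab:", getattr(habitat, "__version__", "unknown"))
print("habitat-sim:", getattr(habitat_sim, "__version__", "unknown"))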

Data

Like Habitat-Lab, we expect a data folder (or symlink) with a particular structure in the top-level directory of this project.

Matterport3D

We train and evaluate our agents on Matterport3D (MP3D) scene reconstructions. The official Matterport3D download script (download_mp.py) can be accessed by following the "Dataset Download" instructions on their project webpage. The scene data needed to run VLN-CE can then be downloaded this way:

# requires running with python 2.7
python download_mp.py --task habitat -o data/scene_datasets/mp3d/

Extract this data to data/scene_datasets/mp3d such that it has the form data/scene_datasets/mp3d/{scene}/{scene}.glb. There should be 90 total scenes.
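To verify the extraction, you can count the scene files from the top-level directory of this project; there should be 90. A minimal sketch using only the standard library:

# verify_mp3d.py -- sanity-check the MP3D extraction (illustrative, not part of this repo)
from pathlib import Path

scene_dir = Path("data/scene_datasets/mp3d")
# Expect data/scene_datasets/mp3d/{scene}/{scene}.glb for each of the 90 scenes.
glbs = sorted(scene_dir.glob("*/*.glb"))
print(f"found {len(glbs)} scene .glb files (expected 90)")
for glb in glbs[:3]:
    print(" ", glb)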

Dataset

The R2R_VLNCE dataset is a port of the Room-to-Room (R2R) dataset created by Anderson et al. for use with the Matterport3DSimulator (MP3D-Sim). For details on the porting process from MP3D-Sim to the continuous reconstructions used in Habitat, please see our paper. We provide two versions of the dataset: R2R_VLNCE_v1-2 and R2R_VLNCE_v1-2_preprocessed. R2R_VLNCE_v1-2 contains the train, val_seen, val_unseen, and test splits. R2R_VLNCE_v1-2_preprocessed runs with our models out of the box; it additionally includes instruction tokens mapped to GloVe embeddings, ground-truth trajectories, and a data augmentation split (envdrop) ported from R2R-EnvDrop. The test split does not contain episode goals or ground-truth paths. See here for using the test split. For more details on the dataset contents and format, see our project page.

Dataset                            Extract path                                   Size
R2R_VLNCE_v1-2.zip                 data/datasets/R2R_VLNCE_v1-2                   3 MB
R2R_VLNCE_v1-2_preprocessed.zip    data/datasets/R2R_VLNCE_v1-2_preprocessed      345 MB

Downloading the dataset:

python -m pip install gdown
cd data/datasets

# R2R_VLNCE_v1-2
gdown https://drive.google.com/uc?id=1YDNWsauKel0ht7cx15_d9QnM6rS4dKUV
unzip R2R_VLNCE_v1-2.zip
rm R2R_VLNCE_v1-2.zip

# R2R_VLNCE_v1-2_preprocessed
gdown https://drive.google.com/uc?id=18sS9c2aRu2EAL4c7FyG29LDAm2pHzeqQ
unzip R2R_VLNCE_v1-2_preprocessed.zip
rm R2R_VLNCE_v1-2_preprocessed.zip
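
After extraction, you can peek at an episode file from the top-level directory to confirm the download. The sketch below assumes the standard Habitat layout of {split}/{split}.json.gz with an "episodes" list; check your extracted paths if they differ:

# inspect_dataset.py -- peek at the extracted episodes (assumes {split}/{split}.json.gz layout)
import gzip
import json

split = "val_seen"
path = f"data/datasets/R2R_VLNCE_v1-2/{split}/{split}.json.gz"

with gzip.open(path, "rt") as f:
    data = json.load(f)

episodes = data.get("episodes", [])
print(f"{split}: {len(episodes)} episodes")
if episodes:
    # Print the keys of one episode to see what fields are available.
    print("episode fields:", sorted(episodes[0].keys()))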

Encoder Weights

The learning-based models receive a depth observation at each time step. The depth encoder we use is a ResNet pretrained on a PointGoal navigation task using DDPPO. In this work, we extract features from the ResNet50 trained on Gibson 2+ from the original paper, whose weights can be downloaded here (672M). Extract the contents of ddppo-models.zip to data/ddppo-models/{model}.pth.
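
If you want to confirm the weights downloaded and extracted correctly, you can load the checkpoint with PyTorch and list its contents. The filename below is a placeholder; substitute whichever .pth file came out of ddppo-models.zip:

# inspect_ddppo_weights.py -- list the tensors in a downloaded checkpoint (illustrative)
import torch

# Placeholder name: substitute the actual .pth file extracted from ddppo-models.zip.
ckpt_path = "data/ddppo-models/your_downloaded_model.pth"

# map_location="cpu" so this works on machines without a GPU.
ckpt = torch.load(ckpt_path, map_location="cpu")
# The checkpoint may store tensors directly or nest them under a "state_dict" key.
state_dict = ckpt["state_dict"] if isinstance(ckpt, dict) and "state_dict" in ckpt else ckpt
print(f"{len(state_dict)} entries")
for name in list(state_dict)[:5]:
    print(" ", name)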

Usage

The run.py script handles both training and evaluation for all model configurations. Specify a configuration file and a run type as follows:

python run.py \
  --exp-config path/to/experiment_config.yaml \
  --run-type {train | eval | inference}

For example, a random agent can be evaluated on 10 val-seen episodes using this command:

python run.py --exp-config vlnce_baselines/config/nonlearning.yaml --run-type eval

For lists of modifiable configuration options, see the default task config and experiment config files.
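
If you want to see which options a given experiment file actually sets before running it, the experiment configs are plain YAML and can be read with PyYAML; a minimal sketch:

# show_config.py -- print the top-level options set by an experiment config (illustrative)
import yaml

# Path taken from the example command above.
config_path = "vlnce_baselines/config/nonlearning.yaml"

with open(config_path) as f:
    cfg = yaml.safe_load(f)

for key, value in cfg.items():
    print(f"{key}: {value}")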

Imitation Learning

For both teacher forcing and DAgger training, experience is collected in simulation and saved to disk for future network updates. At each time step along a trajectory, this includes RGB and depth encodings, ground-truth actions, and instruction tokens. The DAGGER config entry specifies which training type is used. A teacher forcing example:

DAGGER:
  LR: 2.5e-4  # learning rate
  ITERATIONS: 1  # set to 1 for teacher forcing
  EPOCHS: 15
  UPDATE_SIZE: 10819  # total number of training episodes
  BATCH_SIZE: 5  # number of complete episodes in a batch
  P: 1.0  # Must be 1.0 for teacher forcing
  USE_IW: True  # Inflection weighting
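
The USE_IW flag enables inflection weighting, which upweights timesteps where the ground-truth action changes (the "inflections") so that rare action switches are not drowned out by long runs of repeated actions. As a rough illustration of the idea (one common formulation, not necessarily the exact weighting used in this repo):

# inflection_weights.py -- illustrative inflection weighting (not the repo's exact implementation)
from typing import List

def inflection_weights(actions: List[int]) -> List[float]:
    # A timestep is an inflection point if its action differs from the previous action.
    inflection_set = {i for i in range(1, len(actions)) if actions[i] != actions[i - 1]}
    if not inflection_set:
        return [1.0] * len(actions)
    # Upweight inflection steps by the inverse of their frequency.
    boost = len(actions) / len(inflection_set)
    return [boost if i in inflection_set else 1.0 for i in range(len(actions))]

# Example: a long run of FORWARD (1) with a single TURN (2) gets the switch points upweighted.
print(inflection_weights([1, 1, 1, 1, 2, 1, 1]))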

A DAgger example:

DAGGER:
  LR: 2.5e-4  # learning rate
  ITERATIONS: 15  # number of dataset aggregation rounds
  EPOCHS: 4  # number of network update rounds per iteration
  UPDATE_SIZE: 5000  # total number of training episodes
  BATCH_SIZE: 5  # number of complete episodes in a batch
  P: 0.75  # DAgger: 0.0 < P < 1.0
  USE_IW: True  # Inflection weighting
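
Under DAgger, each aggregation round mixes the oracle (ground-truth) policy with the current learned policy when collecting new trajectories. A common schedule decays the oracle probability geometrically with the iteration index, which is consistent with how P is described above (P = 1.0 recovers pure teacher forcing). A sketch of that schedule, not the repo's exact code:

# dagger_schedule.py -- illustrative oracle-mixing schedule (not the repo's exact code)
P = 0.75         # DAGGER.P from the config above
ITERATIONS = 15  # DAGGER.ITERATIONS

for it in range(ITERATIONS):
    # Probability of following the oracle action when collecting data in round `it`.
    beta = P ** it
    print(f"iteration {it:2d}: follow oracle with probability {beta:.3f}")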

Configuration options exist for loading an already-trained checkpoint for fine-tuning (LOAD_FROM_CKPT, CKPT_TO_LOAD) as well as for reusing a database of collected features (PRELOAD_LMDB_FEATURES, LMDB_FEATURES_DIR). Note that reusing collected features for training only makes sense for regular teacher forcing training.

Evaluating Models

Evaluation of models can be done by running python run.py --exp-config path/to/experiment_config.yaml --run-type eval. The relevant config entries for evaluation are:

EVAL_CKPT_PATH_DIR  # path to a checkpoint or a directory of checkpoints

EVAL.USE_CKPT_CONFIG  # if True, use the config saved in the checkpoint file
EVAL.SPLIT  # which dataset split to evaluate on (typically val_seen or val_unseen)
EVAL.EPISODE_COUNT  # how many episodes to evaluate

If EVAL.EPISODE_COUNT is equal to or greater than the number of episodes in the evaluation dataset, all episodes will be evaluated. If EVAL_CKPT_PATH_DIR is a directory, one checkpoint will be evaluated at a time. When there are no more checkpoints to evaluate, the script polls the directory every few seconds looking for a new one. Each config file listed in the next section can both train and evaluate the model it accompanies.
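
The polling behavior can be pictured as a simple loop over the checkpoint directory; a simplified sketch of the idea (the actual trainer handles this internally):

# poll_checkpoints.py -- simplified sketch of the checkpoint-polling behavior described above
import os
import time

ckpt_dir = "data/checkpoints"  # corresponds to EVAL_CKPT_PATH_DIR
evaluated = set()

while True:
    checkpoints = sorted(f for f in os.listdir(ckpt_dir) if f.endswith(".pth"))
    new = [f for f in checkpoints if f not in evaluated]
    if new:
        ckpt = new[0]
        print("evaluating", ckpt)  # the real script runs a full evaluation here
        evaluated.add(ckpt)
    else:
        time.sleep(5)  # wait a few seconds and look again for a new checkpoint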

CUDA

CUDA is used by default if it is available. If you have multiple GPUs, you can specify which card is used:

SIMULATOR_GPU_ID: 0
TORCH_GPU_ID: 0
NUM_PROCESSES: 1

Note that the simulator and torch code do not need to run on the same card. For faster training and evaluation, we recommend running with as many processes (parallel simulations) as will fit on a standard GPU.
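
In PyTorch terms, TORCH_GPU_ID simply selects which CUDA device the model tensors are placed on, falling back to the CPU when CUDA is unavailable; a minimal sketch:

# device_selection.py -- how a TORCH_GPU_ID-style setting maps to a torch device (illustrative)
import torch

TORCH_GPU_ID = 0  # mirrors the config entry above

device = (
    torch.device(f"cuda:{TORCH_GPU_ID}")
    if torch.cuda.is_available()
    else torch.device("cpu")
)
print("using device:", device)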

Models and Results From the Paper

The baseline model for the VLN-CE task is the cross-modal attention model trained with progress monitoring, DAgger, and augmented data (CMA_PM_DA_Aug). As evaluated on the leaderboard, this model achieves:

Split       TL    NE    OS    SR    SPL
Test        8.85  7.91  0.36  0.28  0.25
Val Unseen  8.27  7.60  0.36  0.29  0.27
Val Seen    9.06  7.21  0.44  0.34  0.32

(TL: trajectory length (m), NE: navigation error (m), OS: oracle success rate, SR: success rate, SPL: success weighted by path length)

Experiments from the paper can be run using the configs in the table below:

Model               val_seen SPL   val_unseen SPL   Config
Seq2Seq             0.24           0.18             seq2seq.yaml
Seq2Seq_PM          0.21           0.15             seq2seq_pm.yaml
Seq2Seq_DA          0.32           0.23             seq2seq_da.yaml
Seq2Seq_Aug         0.25           0.17             seq2seq_aug.yaml → seq2seq_aug_tune.yaml
Seq2Seq_PM_DA_Aug   0.31           0.22             seq2seq_pm_aug.yaml → seq2seq_pm_da_aug_tune.yaml
CMA                 0.25           0.22             cma.yaml
CMA_PM              0.26           0.19             cma_pm.yaml
CMA_DA              0.31           0.25             cma_da.yaml
CMA_Aug             0.24           0.19             cma_aug.yaml → cma_aug_tune.yaml
CMA_PM_DA_Aug       0.35           0.30             cma_pm_aug.yaml → cma_pm_da_aug_tune.yaml
CMA_PM_Aug          0.25           0.22             cma_pm_aug.yaml → cma_pm_aug_tune.yaml
CMA_DA_Aug          0.33           0.26             cma_aug.yaml → cma_da_aug_tune.yaml
Legend:
  Seq2Seq  Sequence-to-Sequence baseline model
  CMA      Cross-Modal Attention model
  PM       Progress monitor
  DA       DAgger training (otherwise teacher forcing)
  Aug      Uses the EnvDrop episodes to augment the training set

For rows with two configs, use the first config (before the arrow) to train the model. Evaluate each checkpoint on val_unseen; the best checkpoint (according to val_unseen SPL) is then fine-tuned using the second config (after the arrow). Make sure to update the field DAGGER.CKPT_TO_LOAD before fine-tuning.

Published Results vs Leaderboard

The CMA_PM_DA_Aug model was originally presented with a val_unseen performance of 0.30 SPL; however, the leaderboard evaluates this same model on val_unseen at 0.27 SPL. The model was originally trained and evaluated on a hardware + Habitat build that gave slightly different results, as is the case for the other paper experiments. Going forward, the leaderboard contains the performance metrics that should be used for official comparison. The official validation performance of CMA_PM_DA_Aug is given in the table at the top of this section. In our tests, the installation procedure in this repo gives evaluation results nearly identical to the leaderboard, but we recognize that compute hardware, along with the version and build of the Habitat Simulator, are factors in reproducibility.

Pretrained Models

We provide pretrained weights for our best Seq2Seq model (Seq2Seq_DA) and our best Cross-Modal Attention model (CMA_PM_DA_Aug). These models are hosted on Google Drive and can be downloaded as follows:

python -m pip install gdown

# CMA_PM_DA_Aug (141MB)
gdown https://drive.google.com/uc?id=199hhL9M0yiurB3Hb_-DrpMRxWP1lSGX3
# Seq2Seq_DA (135MB)
gdown https://drive.google.com/uc?id=1gds-t8LAxuh236gk-5AWU0LzDg9rJmQS

VLN-CE Leaderboard on EvalAI

The VLN-CE leaderboard is now live and taking submissions for public test set evaluation. For challenge guidelines, please visit the leaderboard webpage.

To submit to the leaderboard, you must run your agent locally and submit a JSON file containing the generated agent trajectories. Starter code for generating this JSON file is provided in the function DaggerTrainer.inference(). Here is an example of generating this file using the pretrained Cross-Modal Attention baseline:

python run.py \
  --exp-config vlnce_baselines/config/paper_configs/test_set_inference.yaml \
  --run-type inference

Relevant experiment configurations include:

INFERENCE:
  SPLIT: test  # dataset split to generate predictions for: {val_seen | val_unseen | test}
  CKPT_PATH: data/checkpoints/CMA_PM_DA_Aug.pth  # checkpoint of your trained model
  PREDICTIONS_FILE: predictions.json  # where to save your agent's generated trajectories
  USE_CKPT_CONFIG: False
  INFERENCE_NONLEARNING: False
  NONLEARNING:
    AGENT: RandomAgent  # if INFERENCE_NONLEARNING, specify an agent class

Your predictions file must be in the proper format; please see the challenge webpage for the specification.
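
Before submitting, a quick sanity check is to confirm the file parses as JSON and see how many entries it contains (the sketch below does not validate the full schema; defer to the challenge webpage for that):

# check_predictions.py -- confirm predictions.json parses and count entries (illustrative only)
import json

with open("predictions.json") as f:  # PREDICTIONS_FILE from the config above
    predictions = json.load(f)

# The file may be a list of per-episode entries or a mapping keyed by episode id;
# this only reports how many entries it contains, not whether the schema is correct.
print(type(predictions).__name__, "with", len(predictions), "entries")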

Contributing

This codebase is under the MIT license. If you find something wrong or have a question, feel free to open an issue. If you would like to contribute, please install pre-commit before making commits in a pull request:

python -m pip install pre-commit
pre-commit install

Citing

If you use VLN-CE in your research, please cite the following paper:

@inproceedings{krantz_vlnce_2020,
  title={Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments},
  author={Jacob Krantz and Erik Wijmans and Arjun Majumdar and Dhruv Batra and Stefan Lee},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2020}
}