
Embodied Vision-and-Language Navigation with Dynamic Convolutional Filters

This is the PyTorch implementation for our paper:

Embodied Vision-and-Language Navigation with Dynamic Convolutional Filters
Federico Landi, Lorenzo Baraldi, Massimiliano Corsini, Rita Cucchiara
British Machine Vision Conference (BMVC), 2019
Oral Presentation

Visit the main website for more details.

Reference

If you use our code for your research, please cite our paper (BMVC 2019 oral):

Bibtex:

@inproceedings{landi2019embodied,
  title={Embodied Vision-and-Language Navigation with Dynamic Convolutional Filters},
  author={Landi, Federico and Baraldi, Lorenzo and Corsini, Massimiliano and Cucchiara, Rita},
  booktitle={Proceedings of the British Machine Vision Conference},
  year={2019}
}

Installation

Clone Repo

Clone the repository:

# Make sure to clone with --recursive
git clone --recursive https://github.com/fdlandi/DynamicConv-agent.git
cd DynamicConv-agent

If you didn't clone with the --recursive flag, then you'll need to manually clone the pybind submodule from the top-level directory:

git submodule update --init --recursive

Python setup

Python 3.6 is required to run our code. You can install the required modules via:

cd speaksee
pip install -e .
cd ..
pip install -r requirements.txt

Building with Docker

Please follow the instructions in the Matterport3DSimulator repository to install the simulator via Docker.
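As a rough sketch of that workflow (the image tag, GPU flag, and mount path below are placeholders, not the official commands; use the exact invocations from the Matterport3DSimulator README):

# Build the simulator image from inside the cloned repository
docker build -t mattersim .

# Run the container with GPU access, mounting this repository inside it
docker run --gpus all -it --mount type=bind,source=$PWD,target=/root/mount/DynamicConv-agent mattersim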

Building without Docker

The simulator can be built outside of a Docker container using the cmake build commands described below. However, this is not the recommended approach, as all dependencies will need to be installed locally and may conflict with existing libraries. The main requirements are:

  • Ubuntu >= 14.04
  • Nvidia-driver with CUDA installed
  • C++ compiler with C++11 support
  • CMake >= 3.10
  • OpenCV >= 2.4 including 3.x
  • OpenGL
  • GLM
  • Numpy

Optional dependencies (depending on the cmake rendering options):

  • OSMesa for OSMesa backend support
  • epoxy for EGL backend support
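On Ubuntu, many of these dependencies can be installed through apt; the package names below are a reasonable guess for recent releases, not an official list:

sudo apt-get install cmake libopencv-dev libglm-dev libjsoncpp-dev libepoxy-dev libosmesa6-dev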

Build and Test

Build the simulator and run the unit tests:

cd DynamicConv-agent
mkdir build && cd build
cmake -DEGL_RENDERING=ON ..
make
cd ../
./build/tests ~Timing

If you use a conda environment for your experiments, you should specify the python path in the cmake options:

cmake -DEGL_RENDERING=ON -DPYTHON_EXECUTABLE:FILEPATH='path_to_your_python_bin' ..
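Once the build succeeds, a quick way to check that the Python bindings are importable is a minimal smoke test like the one below (episode and action signatures vary between simulator versions, so consult the Matterport3DSimulator docs for your checkout):

import sys
sys.path.append('build')  # folder containing the compiled MatterSim bindings

import MatterSim

sim = MatterSim.Simulator()        # constructing a Simulator checks the build
sim.setCameraResolution(640, 480)
print('MatterSim bindings loaded OK')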

Precomputing ResNet Image Features

You can skip the feature generation and just download and extract our tsv files into the img_features directory.
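If you prefer to compute the features yourself, here is a rough sketch under a few assumptions (mean-pooled ImageNet ResNet-152 features, one tsv row per viewpoint with base64-encoded arrays, following the usual R2R feature-file convention; iterate_panoramas is a hypothetical helper, and the repository's own precomputation may differ):

import base64
import csv
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T

# ImageNet-pretrained ResNet-152, truncated before the classifier,
# so the average-pool output is a 2048-d feature vector per image.
resnet = models.resnet152(pretrained=True)
extractor = torch.nn.Sequential(*list(resnet.children())[:-1]).eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def encode_views(images):
    # Extract one 2048-d feature per view and base64-encode the stack.
    batch = torch.stack([preprocess(img) for img in images])
    with torch.no_grad():
        feats = extractor(batch).squeeze(-1).squeeze(-1).numpy()
    return base64.b64encode(feats.astype(np.float32).tobytes()).decode('ascii')

with open('img_features/ResNet-152-imagenet.tsv', 'w', newline='') as f:
    writer = csv.DictWriter(f, delimiter='\t',
                            fieldnames=['scanId', 'viewpointId', 'image_w',
                                        'image_h', 'vfov', 'features'])
    # for scan_id, viewpoint_id, images in iterate_panoramas():  # hypothetical
    #     writer.writerow({'scanId': scan_id, 'viewpointId': viewpoint_id,
    #                      'image_w': 640, 'image_h': 480, 'vfov': 60,
    #                      'features': encode_views(images)})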

Training and Testing

You can train our agent by running:

python tasks/R2R/main.py

The number of dynamic filters can be set with the --num_heads parameter:

python tasks/R2R/main.py --num_heads=4
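As a rough illustration of what these heads compute (this is a hypothetical sketch, not the repository's actual module), each head's filter is generated from the instruction encoding and convolved with the image features, yielding one instruction-conditioned response map per head:

import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicFilters(nn.Module):
    # Illustrative only: generate one 1x1 conv filter per head from the
    # instruction encoding and apply it to the visual feature map.
    def __init__(self, text_dim, visual_channels, num_heads=4):
        super().__init__()
        self.num_heads = num_heads
        self.filter_gen = nn.Linear(text_dim, num_heads * visual_channels)

    def forward(self, text_feat, visual_feat):
        # text_feat: (batch, text_dim); visual_feat: (batch, C, H, W)
        b, c, h, w = visual_feat.shape
        filters = self.filter_gen(text_feat).view(b * self.num_heads, c, 1, 1)
        # Grouped conv applies each sample's filters to its own feature map.
        out = F.conv2d(visual_feat.reshape(1, b * c, h, w), filters, groups=b)
        return out.view(b, self.num_heads, h, w)  # one response map per head

heads = DynamicFilters(text_dim=512, visual_channels=2048, num_heads=4)
maps = heads(torch.randn(2, 512), torch.randn(2, 2048, 7, 7))
print(maps.shape)  # torch.Size([2, 4, 7, 7])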

Reproducibility Note

Results in our paper were obtained with version v0.1 of the Matterport3DSimulator. With newer versions of the simulator, results may differ from those reported in the paper. Using different GPUs for training, as well as different random seeds, may also affect results.

We provide the weights obtained from our training runs. To reproduce the results from the paper, run:

python tasks/R2R/main.py --name=normal_data --num_heads=4 --eval_only

or:

python tasks/R2R/main.py --name=data_augmentation --num_heads=4 --eval_only

License

The Matterport3D dataset, and data derived from it, is released under the Matterport3D Terms of Use. Our code is released under the MIT license.
