jayleicn / Recurrent Transformer

License: MIT
[ACL 2020] PyTorch code for MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning

Projects that are alternatives of or similar to Recurrent Transformer

Airbnb Amenity Detection
Repo for 42 days project to replicate/improve Airbnb's amenity (object) detection pipeline.
Stars: ✭ 101 (-1.94%)
Mutual labels:  jupyter-notebook
Nn From Scratch
Implementing a Neural Network from Scratch
Stars: ✭ 1,374 (+1233.98%)
Mutual labels:  jupyter-notebook
Object detection demo live
This is the code for the "How to do Object Detection with OpenCV" live session by Siraj Raval on YouTube.
Stars: ✭ 101 (-1.94%)
Mutual labels:  jupyter-notebook
Hass Deepstack Face
Home Assistant custom component for using Deepstack face recognition
Stars: ✭ 101 (-1.94%)
Mutual labels:  jupyter-notebook
Cvnd localization exercises
Notebooks for learning about object motion and localization methods in the last section of CVND.
Stars: ✭ 101 (-1.94%)
Mutual labels:  jupyter-notebook
Uncertaintynn
Implementation and evaluation of different approaches to get uncertainty in neural networks
Stars: ✭ 101 (-1.94%)
Mutual labels:  jupyter-notebook
Unet
Generic U-Net Tensorflow 2 implementation for semantic segmentation
Stars: ✭ 100 (-2.91%)
Mutual labels:  jupyter-notebook
Pycebox
⬛ Python Individual Conditional Expectation Plot Toolbox
Stars: ✭ 101 (-1.94%)
Mutual labels:  jupyter-notebook
Data Driven Discretization 1d
Code for "Learning data-driven discretizations for partial differential equations"
Stars: ✭ 101 (-1.94%)
Mutual labels:  jupyter-notebook
Style Tranfer
Implementation of the original style transfer paper (Gatys et al.)
Stars: ✭ 101 (-1.94%)
Mutual labels:  jupyter-notebook
Context aug
Context-driven data augmentation for Object Detection (ECCV'18)
Stars: ✭ 101 (-1.94%)
Mutual labels:  jupyter-notebook
Spectralnormalizationkeras
Spectral Normalization for Keras Dense and Convolution Layers
Stars: ✭ 100 (-2.91%)
Mutual labels:  jupyter-notebook
Canalsandeco
All the files from the videos of Canal Sandeco
Stars: ✭ 101 (-1.94%)
Mutual labels:  jupyter-notebook
Mish Cuda
Mish Activation Function for PyTorch
Stars: ✭ 101 (-1.94%)
Mutual labels:  jupyter-notebook
Codeinblogs
Stars: ✭ 100 (-2.91%)
Mutual labels:  jupyter-notebook
Codeinquarantine
Stars: ✭ 101 (-1.94%)
Mutual labels:  jupyter-notebook
Traffic sign recognition efficient cnns
A repository for the paper "Real-Time Traffic Sign Recognition Based on Efficient CNNs in the Wild"
Stars: ✭ 101 (-1.94%)
Mutual labels:  jupyter-notebook
Stylegan2 Tensorflow 2.x
Unofficial implementation of StyleGAN2 using TensorFlow 2.x.
Stars: ✭ 102 (-0.97%)
Mutual labels:  jupyter-notebook
Pythondataanalysiscookbook
Python Data Analysis Cookbook, published by Packt
Stars: ✭ 100 (-2.91%)
Mutual labels:  jupyter-notebook
Learning notebook
Financial analysis using Python
Stars: ✭ 101 (-1.94%)
Mutual labels:  jupyter-notebook

MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning

PyTorch code for our ACL 2020 paper "MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning" by Jie Lei, Liwei Wang, Yelong Shen, Dong Yu, Tamara L. Berg, and Mohit Bansal.

Generating multi-sentence descriptions for videos is one of the most challenging captioning tasks, as it demands not only visual relevance but also discourse-based coherence across the sentences in the paragraph. Towards this goal, we propose a new approach called Memory-Augmented Recurrent Transformer (MART), which uses a memory module to augment the transformer architecture. The memory module generates a highly summarized memory state from the video segments and the sentence history to help better predict the next sentence (with respect to coreference and repetition), thus encouraging coherent paragraph generation. Extensive experiments, human evaluations, and qualitative analyses on two popular datasets, ActivityNet Captions and YouCookII, show that MART generates more coherent and less repetitive paragraph captions than baseline methods, while maintaining relevance to the input video events.
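
For intuition, here is a minimal PyTorch sketch of a gated memory update in the spirit of MART: memory slots attend over the current segment's hidden states, and a learned gate blends the old memory with the new summary. All module names and sizes are invented for illustration; this is not the paper's exact formulation.

import torch
import torch.nn as nn

class SimpleMemoryUpdater(nn.Module):
    """Toy gated memory update in the spirit of MART (not the paper's exact equations)."""
    def __init__(self, d, num_heads=4):
        super(SimpleMemoryUpdater, self).__init__()
        self.attn = nn.MultiheadAttention(d, num_heads)
        self.cand = nn.Linear(2 * d, d)  # candidate memory content
        self.gate = nn.Linear(2 * d, d)  # how much old memory to keep

    def forward(self, memory, hidden):
        # memory: (batch, num_slots, d); hidden: (batch, seq_len, d).
        # nn.MultiheadAttention expects (seq_len, batch, d), hence the transposes.
        q = memory.transpose(0, 1)
        kv = hidden.transpose(0, 1)
        summary, _ = self.attn(q, kv, kv)           # each memory slot attends over the segment
        summary = summary.transpose(0, 1)
        mixed = torch.cat([memory, summary], dim=-1)
        candidate = torch.tanh(self.cand(mixed))    # candidate new memory content
        z = torch.sigmoid(self.gate(mixed))         # update gate
        return z * memory + (1.0 - z) * candidate   # gated blend of old memory and new summary

updater = SimpleMemoryUpdater(64)
memory = torch.zeros(2, 4, 64)              # batch of 2, 4 memory slots
for segment in torch.randn(3, 2, 10, 64):   # 3 consecutive video segments
    memory = updater(memory, segment)       # the memory is carried across segments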

Getting started

Prerequisites

  1. Clone this repository
# no need to add --recursive as all dependencies are copied into this repo.
git clone https://github.com/jayleicn/recurrent-transformer.git
cd recurrent-transformer
  2. Prepare feature files

Download features from Google Drive: rt_anet_feat.tar.gz (39GB) and rt_yc2_feat.tar.gz (12GB). These features are repacked from features provided by densecap.

mkdir video_feature && cd video_feature
tar -xf path/to/rt_anet_feat.tar.gz 
tar -xf path/to/rt_yc2_feat.tar.gz 
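
To verify the features unpacked correctly, a quick check such as the one below can help. It assumes the archives contain per-video .npy feature files somewhere under video_feature; adjust if the actual layout differs.

import os
import numpy as np

for dirpath, _, filenames in os.walk("video_feature"):
    npy = [f for f in filenames if f.endswith(".npy")]
    if npy:
        arr = np.load(os.path.join(dirpath, npy[0]))  # spot-check one file per directory
        print("%s: %d .npy files, e.g. %s with shape %s" % (dirpath, len(npy), npy[0], arr.shape))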
  3. Install dependencies (a quick import check follows the list)
  • Python 2.7
  • PyTorch 1.1.0
  • nltk
  • easydict
  • tqdm
  • tensorboardX
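
A minimal import-and-version check for the environment (nothing repo-specific):

import sys
import torch
import nltk, easydict, tqdm, tensorboardX  # import check only

print("Python %s, PyTorch %s" % (sys.version.split()[0], torch.__version__))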
  4. Add project root to PYTHONPATH
source setup.sh

Note that you need to do this each time you start a new session.
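
If you prefer not to source the script each session, an equivalent per-script workaround is to prepend the repo root to the module search path in Python. This assumes setup.sh simply puts the project root on PYTHONPATH:

import os
import sys
sys.path.insert(0, os.path.abspath("."))  # assumes the script is run from the repo root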

Training and Inference

We give examples of how to perform training and inference with MART.

  1. Build Vocabulary
bash scripts/build_vocab.sh DATASET_NAME

DATASET_NAME can be anet for ActivityNet Captions or yc2 for YouCookII.
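
For intuition, vocabulary building for captioning usually tokenizes the training captions and keeps words above a frequency threshold, plus a few special tokens. A generic sketch of that idea follows; the repo's script may differ in thresholds and token names.

from collections import Counter
import nltk  # requires the punkt tokenizer data: nltk.download("punkt")

def build_vocab(captions, min_count=3):
    """Keep words appearing at least min_count times, plus special tokens."""
    counter = Counter()
    for cap in captions:
        counter.update(nltk.tokenize.word_tokenize(cap.lower()))
    vocab = {"[PAD]": 0, "[BOS]": 1, "[EOS]": 2, "[UNK]": 3}
    for w in sorted(w for w, c in counter.items() if c >= min_count):
        vocab[w] = len(vocab)
    return vocab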

  2. MART training

The general training command is:

bash scripts/train.sh DATASET_NAME MODEL_TYPE

MODEL_TYPE can be one of [mart, xl, xlrg, mtrans, mart_no_recurrence]; see the table below.

MODEL_TYPE          Description
mart                Memory-Augmented Recurrent Transformer
xl                  Transformer-XL
xlrg                Transformer-XL with recurrent gradient
mtrans              Vanilla Transformer
mart_no_recurrence  MART with recurrence disabled
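
Our reading of the xl vs. xlrg distinction, shown as a toy contrast (not the repo's training loop): whether gradients are allowed to flow backward through the recurrent connection between segments.

import torch
import torch.nn as nn

layer = nn.Linear(16, 16)
mem = torch.zeros(1, 16)
for seg in torch.randn(3, 1, 16):
    # xl-style: .detach() stops gradients at the segment boundary;
    # xlrg-style would drop the .detach() so gradients flow through the recurrence.
    mem = torch.tanh(layer(seg) + mem.detach())
mem.sum().backward()  # with detach(), only the last segment's computation reaches `layer`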

To train our MART model on ActivityNet Captions:

bash scripts/train.sh anet mart

The training log and model checkpoints will be saved at results/anet_re_*.
Once you have a trained model, you can follow the instructions below to generate captions.

  3. Generate captions
bash scripts/translate_greedy.sh anet_re_* val

Replace anet_re_* with your own model directory name. The generated captions are saved at results/anet_re_*/greedy_pred_val.json.
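
As the script name suggests, captions are decoded greedily: at each step the single highest-probability token is appended until an end token (or a length limit) is reached. A generic, self-contained sketch of the idea, not the repo's implementation:

import torch

def greedy_decode(next_token_logits, bos_id, eos_id, max_len=20):
    """Append the argmax token at each step until EOS or max_len."""
    tokens = [bos_id]
    for _ in range(max_len):
        logits = next_token_logits(torch.tensor([tokens]))  # (vocab,)
        next_id = int(logits.argmax())
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens

# Toy "model": prefer token 5 for three steps, then EOS (id 2).
def toy_model(prefix):
    logits = torch.zeros(10)
    logits[5 if prefix.size(1) < 4 else 2] = 1.0
    return logits

print(greedy_decode(toy_model, bos_id=1, eos_id=2))  # [1, 5, 5, 5, 2]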

  4. Evaluate generated captions
bash scripts/eval.sh anet val results/anet_re_*/greedy_pred_val.json

The results should be comparable to those we report in Table 2 of the paper, e.g., B@4 10.33; R@4 5.18.
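
Here B@4 is BLEU@4 and R@4 measures 4-gram repetition within a generated paragraph (lower is better). A sketch of the basic repetition idea, which may differ in details from the official evaluation code:

def repetition_at_4(paragraph, n=4):
    """Fraction of n-grams that repeat an earlier n-gram in the paragraph."""
    toks = paragraph.lower().split()
    ngrams = [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]
    if not ngrams:
        return 0.0
    seen = set()
    repeats = 0
    for g in ngrams:
        if g in seen:
            repeats += 1
        seen.add(g)
    return float(repeats) / len(ngrams)

print(repetition_at_4("a man is cooking . a man is cooking pasta ."))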

Citations

If you find this code useful for your research, please cite our paper:

@inproceedings{lei2020mart,
  title={MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning},
  author={Lei, Jie and Wang, Liwei and Shen, Yelong and Yu, Dong and Berg, Tamara L and Bansal, Mohit},
  booktitle={ACL},
  year={2020}
}

Others

This code used resources from the following projects: transformers, transformer-xl, densecap, OpenNMT-py.

Contact

jielei [at] cs.unc.edu
