
tuyunbin / Video-Description-with-Spatial-Temporal-Attention

Licence: other
[ACM MM 2017 & IEEE TMM 2020] This is the Theano code for the paper "Video Description with Spatial Temporal Attention"

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives to or similar to Video-Description-with-Spatial-Temporal-Attention

Guided Attention Inference Network
Contains an implementation of the Guided Attention Inference Network (GAIN) presented in Tell Me Where to Look (CVPR 2018). This repository aims to apply GAIN to the fcn8 architecture used for segmentation.
Stars: ✭ 204 (+284.91%)
Mutual labels:  attention-mechanism
Self Attention Cv
Implementation of various self-attention mechanisms focused on computer vision. Ongoing repository.
Stars: ✭ 209 (+294.34%)
Mutual labels:  attention-mechanism
Transformers-RL
An easy PyTorch implementation of "Stabilizing Transformers for Reinforcement Learning"
Stars: ✭ 107 (+101.89%)
Mutual labels:  attention-mechanism
Keras Attention Mechanism
Attention mechanism Implementation for Keras.
Stars: ✭ 2,504 (+4624.53%)
Mutual labels:  attention-mechanism
X Transformers
A simple but complete full-attention transformer with a set of promising experimental features from various papers
Stars: ✭ 211 (+298.11%)
Mutual labels:  attention-mechanism
Aoanet
Code for paper "Attention on Attention for Image Captioning". ICCV 2019
Stars: ✭ 242 (+356.6%)
Mutual labels:  attention-mechanism
Point Transformer Pytorch
Implementation of the Point Transformer layer, in Pytorch
Stars: ✭ 199 (+275.47%)
Mutual labels:  attention-mechanism
Im2LaTeX
An implementation of the Show, Attend and Tell paper in Tensorflow, for the OpenAI Im2LaTeX suggested problem
Stars: ✭ 16 (-69.81%)
Mutual labels:  attention-mechanism
Triplet Attention
Official PyTorch Implementation for "Rotate to Attend: Convolutional Triplet Attention Module." [WACV 2021]
Stars: ✭ 222 (+318.87%)
Mutual labels:  attention-mechanism
DARNN
A Dual-Stage Attention-Based Recurrent Neural Network for Time Series Prediction
Stars: ✭ 90 (+69.81%)
Mutual labels:  attention-mechanism
Neat Vision
Neat (Neural Attention) Vision is a visualization tool for the attention mechanisms of deep-learning models for Natural Language Processing (NLP) tasks (framework-agnostic).
Stars: ✭ 213 (+301.89%)
Mutual labels:  attention-mechanism
Dalle Pytorch
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch
Stars: ✭ 3,661 (+6807.55%)
Mutual labels:  attention-mechanism
Attentionalpoolingaction
Code/Model release for NIPS 2017 paper "Attentional Pooling for Action Recognition"
Stars: ✭ 248 (+367.92%)
Mutual labels:  attention-mechanism
Linear Attention Transformer
Transformer based on a variant of attention that is linear complexity in respect to sequence length
Stars: ✭ 205 (+286.79%)
Mutual labels:  attention-mechanism
question-generation
Neural Models for Key Phrase Detection and Question Generation
Stars: ✭ 29 (-45.28%)
Mutual labels:  attention-mechanism
Attention Mechanisms
Implementations for a family of attention mechanisms, suitable for all kinds of natural language processing tasks and compatible with TensorFlow 2.0 and Keras.
Stars: ✭ 203 (+283.02%)
Mutual labels:  attention-mechanism
Linformer Pytorch
My take on a practical implementation of Linformer for Pytorch.
Stars: ✭ 239 (+350.94%)
Mutual labels:  attention-mechanism
SA-DL
Sentiment Analysis with Deep Learning models. Implemented with Tensorflow and Keras.
Stars: ✭ 35 (-33.96%)
Mutual labels:  attention-mechanism
TianChi AIEarth
TianChi AIEarth Contest Solution
Stars: ✭ 57 (+7.55%)
Mutual labels:  attention-mechanism
lstm-attention
Attention-based bidirectional LSTM for Classification Task (ICASSP)
Stars: ✭ 87 (+64.15%)
Mutual labels:  attention-mechanism

Video-Description-with-Spatial-Temporal-Attention

This package contains the accompanying code for the following paper:

Tu, Yunbin, et al. "Video Description with Spatial-Temporal Attention" (also mirrored on Baidu Cloud), which appeared as a full paper in the Proceedings of the ACM International Conference on Multimedia, 2017 (ACM MM'17).

The code is forked from yaoli/arctic-capgen-vid.

The training details are described below.

Usage

Installation

First, clone our repository:

$ git clone https://github.com/tuyunbin/Video-Description-with-Spatial-Temporal-Attention.git

Here, the msvd_data directory contains the 7 .pkl files needed to train and test the model.
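As a quick sanity check that the pickle files downloaded correctly, they can be loaded directly. This is only a minimal sketch, assuming the files sit under msvd_data/ (the exact file names do not matter here):

# Sketch: load every .pkl file in msvd_data/ to confirm it is intact.
import glob
import pickle

for path in sorted(glob.glob("msvd_data/*.pkl")):
    with open(path, "rb") as f:
        obj = pickle.load(f)  # under Python 3, Python 2 pickles may need encoding="latin1"
    print("%s -> %s" % (path, type(obj)))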

Dependencies

Theano can be installed by following the official instructions; Theano has its own dependencies as well. Alternatively, Theano can be installed through Anaconda. If you install Theano the first way, you may hit the error "no module named pygpu"; in that case, install it with Anaconda instead. You do not need to change your default Python environment; just add the following command whenever you use Theano:

$ export PATH="/home/tuyunbin/anaconda2/bin:$PATH"

(Change this to your own Anaconda installation path.)

coco-caption. Install it by simply adding it to your $PYTHONPATH.

Jobman. After cloning it, add it to your $PYTHONPATH as well.

Finally, you will also need to install h5py, since HDF5 files are used to store the preprocessed features.
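To make sure the environment is wired up before going further, a minimal import check such as the one below can help. The module names are assumptions (pycocoevalcap comes from the coco-caption checkout and jobman from the Jobman clone, both on $PYTHONPATH):

# Minimal dependency check; run it with the same Python you will use for Theano.
import importlib

for name in ["theano", "pygpu", "h5py", "jobman", "pycocoevalcap"]:
    try:
        importlib.import_module(name)
        print("found   %s" % name)
    except ImportError as err:
        print("missing %s (%s)" % (name, err))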

Video Data and Pre-extracted Features on the MSVD Dataset

The pre-processed datasets used in our paper are available at this link, and a Baidu Cloud link is also provided.

The pre-processed global, motion, and local features used in our paper can be downloaded at these links:

global features (a Baidu Cloud link is also available).

motion features (a Baidu Cloud link is also available).

local features (Baidu Cloud extraction code: h7nq); a Google Drive link is also available.

In our paper, we used local features extracted from the fc7 layer of the Faster R-CNN network, and the number of local features is 8. You can also extract a different number of local features with Faster R-CNN.
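Since the preprocessed features are stored as HDF5 files (hence the h5py dependency above), here is a hedged illustration of packing per-frame region features into an HDF5 file. The frame count, dataset key, and overall layout are assumptions, not the released format, so adapt them to the feature files you actually use:

# Illustration only: pack 8 fc7 region features (4096-d each) per frame with h5py.
import numpy as np
import h5py

n_frames, n_regions, feat_dim = 26, 8, 4096  # frame count is an assumption
local_feats = np.random.rand(n_frames, n_regions, feat_dim).astype("float32")

with h5py.File("local_feats_example.h5", "w") as f:
    f.create_dataset("vid1", data=local_feats)  # one dataset per video (assumed layout)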

Note: Since the MSR-VTT-10K dataset is too large, we do not provide the data we used. You can train your model on this dataset with the same code, but do not forget to shuffle the train_id when training the model.
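The repository layout of the MSR-VTT IDs is not shown here, so the following is only a sketch of the shuffling step, with train_ids standing in for your list of training video IDs:

# Hypothetical sketch: shuffle the MSR-VTT training IDs before training.
import random

train_ids = ["video%d" % i for i in range(6513)]  # 6513 is the common MSR-VTT train split size (assumed)
random.seed(1234)          # fix the seed if you want a reproducible shuffle
random.shuffle(train_ids)  # in-place shuffle, as recommended in the note above
print(train_ids[:5])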

Test the model trained by us

First, you need to download the pre-trained model at this link (a Google Drive link) and add the files to your $PYTHONPATH.

Second, go to common.py and change the following two lines

RAB_DATASET_BASE_PATH = '/home/tuyunbin/Video-Description-with-Spatial-Temporal-Attention/msvd_data/' 
RAB_EXP_PATH = '/home/sdc/tuyunbin/msvd_result/Video-Description-with-Spatial-Temporal-Attention/exp/' 

according to your specific setup. The first path is the parent directory containing the msvd_data folder. The second path specifies where all the experimental results will be saved. Before testing the model, we suggest checking data_engine.py by running python data_engine.py and making sure it finishes without errors. It is also useful to verify that the coco-caption evaluation pipeline works properly by running python metrics.py without errors.
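A minimal sketch of a path check, assuming common.py (edited as above) is importable from the repository root:

# Verify the two paths configured in common.py before running anything.
import os
import common  # the repository's common.py

for name in ("RAB_DATASET_BASE_PATH", "RAB_EXP_PATH"):
    path = getattr(common, name)
    print("%s = %s (exists: %s)" % (name, path, os.path.isdir(path)))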

Finally, you can use our trained model by setting the following option to True in config.py:

'reload_': True,

Train your own model

Here, you need to set reload_ to False in config.py.

Now you are ready to launch the training:

$ THEANO_FLAGS=mode=FAST_RUN,device=cuda0,floatX=float32 python train_model.py
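Before starting a long run, it may be worth confirming that Theano actually picked up the GPU and float32 settings. A small check, run under the same flags (the file name check_theano.py is just an example):

# check_theano.py -- run as:
#   THEANO_FLAGS=mode=FAST_RUN,device=cuda0,floatX=float32 python check_theano.py
import theano

print(theano.config.device)  # expect 'cuda0' (or whichever device you requested)
print(theano.config.floatX)  # expect 'float32'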

If you find this work helps your research, please consider citing:

@inproceedings{tu2017video,
  title={Video Description with Spatial-Temporal Attention},
  author={Tu, Yunbin and Zhang, Xishan and Liu, Bingtao and Yan, Chenggang},
  booktitle={Proceedings of the 2017 ACM on Multimedia Conference},
  pages={1014--1022},
  year={2017},
  organization={ACM}
}

@article{8744407,
  author={C. {Yan} and Y. {Tu} and X. {Wang} and Y. {Zhang} and X. {Hao} and Y. {Zhang} and Q. {Dai}},
  journal={IEEE Transactions on Multimedia},
  title={STAT: Spatial-Temporal Attention Mechanism for Video Captioning},
  year={2020},
  volume={22},
  number={1},
  pages={229-241}
}

Notes

Running train_model.py for the first time takes much longer, since Theano needs to compile many things and cache them on disk for future runs. You will probably see some warning messages on stdout; it is safe to ignore all of them. Both model parameters and configurations are saved (the saving path is printed on stdout and is easy to find). The most important thing to monitor is train_valid_test.txt in the exp output folder; it is a big table saving all metrics per validation.
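The exact column layout of train_valid_test.txt depends on the training script, so the snippet below only prints the most recent rows; the path is a placeholder under your RAB_EXP_PATH:

# Peek at the latest validation metrics written during training.
import os

exp_file = os.path.join("exp", "train_valid_test.txt")  # adjust to your RAB_EXP_PATH layout
with open(exp_file) as f:
    rows = f.readlines()

for row in rows[-3:]:  # last few validation rows
    print(row.rstrip())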

Contact

My email is [email protected]

Any discussions and suggestions are welcome!
