AdrianHsu / S2VT-seq2seq-video-captioning-attention

Licence: other
S2VT (seq2seq) video captioning with Bahdanau & Luong attention, implemented in TensorFlow


Projects that are alternatives of or similar to S2VT-seq2seq-video-captioning-attention

minimal-nmt
A minimal nmt example to serve as an seq2seq+attention reference.
Stars: ✭ 36 (+100%)
Mutual labels:  seq2seq, attention-mechanism
MoChA-pytorch
PyTorch Implementation of "Monotonic Chunkwise Attention" (ICLR 2018)
Stars: ✭ 65 (+261.11%)
Mutual labels:  seq2seq, attention-mechanism
A-Persona-Based-Neural-Conversation-Model
No description or website provided.
Stars: ✭ 22 (+22.22%)
Mutual labels:  seq2seq, attention-mechanism
AoA-pytorch
A Pytorch implementation of Attention on Attention module (both self and guided variants), for Visual Question Answering
Stars: ✭ 33 (+83.33%)
Mutual labels:  attention-mechanism, captioning
Sockeye
Sequence-to-sequence framework with a focus on Neural Machine Translation based on Apache MXNet
Stars: ✭ 990 (+5400%)
Mutual labels:  seq2seq, attention-mechanism
NLP-paper
🎨 NLP (natural language processing) tutorials 🎨 https://dataxujing.github.io/NLP-paper/
Stars: ✭ 23 (+27.78%)
Mutual labels:  seq2seq, attention-mechanism
ttslearn
ttslearn: library for the book "Text-to-Speech with Python" (Pythonで学ぶ音声合成)
Stars: ✭ 158 (+777.78%)
Mutual labels:  seq2seq, attention-mechanism
SequenceToSequence
A seq2seq with attention dialogue/MT model implemented by TensorFlow.
Stars: ✭ 11 (-38.89%)
Mutual labels:  seq2seq, attention-mechanism
Neural sp
End-to-end ASR/LM implementation with PyTorch
Stars: ✭ 408 (+2166.67%)
Mutual labels:  seq2seq, attention-mechanism
Seq2seq chatbot
A TensorFlow implementation of a simple seq2seq-based dialogue system, with embedding, attention, and beam_search features; the dataset is Cornell Movie Dialogs
Stars: ✭ 308 (+1611.11%)
Mutual labels:  seq2seq, attention-mechanism
Video-Cap
🎬 Video Captioning: ICCV '15 paper implementation
Stars: ✭ 44 (+144.44%)
Mutual labels:  seq2seq, attention-mechanism
Awesome Speech Recognition Speech Synthesis Papers
Automatic Speech Recognition (ASR), Speaker Verification, Speech Synthesis, Text-to-Speech (TTS), Language Modelling, Singing Voice Synthesis (SVS), Voice Conversion (VC)
Stars: ✭ 2,085 (+11483.33%)
Mutual labels:  seq2seq, attention-mechanism
Seq2seq Summarizer
Pointer-generator reinforced seq2seq summarization in PyTorch
Stars: ✭ 306 (+1600%)
Mutual labels:  seq2seq, attention-mechanism
Seq2seq chatbot new
A TensorFlow implementation of a simple seq2seq-based dialogue system, with embedding, attention, and beam_search features; the dataset is Cornell Movie Dialogs
Stars: ✭ 144 (+700%)
Mutual labels:  seq2seq, attention-mechanism
Poetry Seq2seq
Chinese Poetry Generation
Stars: ✭ 159 (+783.33%)
Mutual labels:  seq2seq, attention-mechanism
Optic-Disc-Unet
Attention Unet model with post process for retina optic disc segmention
Stars: ✭ 77 (+327.78%)
Mutual labels:  attention-mechanism
uniformer-pytorch
Implementation of Uniformer, a simple attention and 3d convolutional net that achieved SOTA in a number of video classification tasks, debuted in ICLR 2022
Stars: ✭ 90 (+400%)
Mutual labels:  attention-mechanism
NARRE
This is our implementation of NARRE: Neural Attentional Regression with Review-level Explanations
Stars: ✭ 100 (+455.56%)
Mutual labels:  attention-mechanism
Transformer Temporal Tagger
Code and data from the paper "BERT Got a Date: Introducing Transformers to Temporal Tagging"
Stars: ✭ 55 (+205.56%)
Mutual labels:  seq2seq
convolutional seq2seq
fairseq: Convolutional Sequence to Sequence Learning (Gehring et al. 2017) by Chainer
Stars: ✭ 63 (+250%)
Mutual labels:  seq2seq

S2VT

S2VT (seq2seq) video captioning with Bahdanau & Luong attention, implemented in TensorFlow

Based on the open-source project written by chenxinpeng. You can access the original version here: https://github.com/chenxinpeng/S2VT .

The original paper is [1] S. Venugopalan, M. Rohrbach, R. Mooney, T. Darrell, and K. Saenko. Sequence to Sequence - Video to Text. In Proc. ICCV, 2015: http://www.cs.utexas.edu/users/ml/papers/venugopalan.iccv15.pdf

Model Structure

Following the original paper, the encoder takes video frames as input; in the decoding stage, the previously generated words (A, man, is, …) are embedded and concatenated with the output of the red (first-layer) LSTM before being fed to the second layer.

The model structure of our code is shown below. In brief, we not only implement the original paper but also add extra features such as Bahdanau (and Luong) attention, scheduled sampling, and frame/word embeddings.
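
For readers unfamiliar with the TF 1.x contrib seq2seq API, here is a minimal sketch of how the two-layer stack and the attention wiring could be set up. It is an illustrative assumption, not the repository's exact code; placeholder names such as frames, lstm_red, and lstm_gre are invented for the example.

import tensorflow as tf

n_inputs, n_hidden, n_frames = 4096, 600, 80

# One (80, 4096) feature matrix per video (hypothetical placeholder name).
frames = tf.placeholder(tf.float32, [None, n_frames, n_inputs])

# Layer 1 (the "red" LSTM) reads the frame features.
lstm_red = tf.contrib.rnn.LSTMCell(n_hidden, forget_bias=1.0)
red_outputs, _ = tf.nn.dynamic_rnn(lstm_red, frames, dtype=tf.float32,
                                   scope='red_lstm')

# Bahdanau (additive) attention over the layer-1 outputs.
attention = tf.contrib.seq2seq.BahdanauAttention(num_units=n_hidden,
                                                 memory=red_outputs)

# Layer 2 (the "green" LSTM) attends to the encoder states at every
# decoding step; its input at each step is the previous word embedding
# concatenated with the layer-1 output, as described above.
lstm_gre = tf.contrib.rnn.LSTMCell(n_hidden, forget_bias=1.0)
decoder_cell = tf.contrib.seq2seq.AttentionWrapper(
    lstm_gre, attention, attention_layer_size=n_hidden)

In this sketch, swapping in tf.contrib.seq2seq.LuongAttention for BahdanauAttention is the only change needed to compare the two attention variants.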

Experiment Setup

Global Parameters:

n_inputs        = 4096
n_hidden        = 600
val_batch_size  = 100 # validation batch size
n_frames        = 80 # .npy (80, 4096) 
max_caption_len = 50
forget_bias_red = 1.0
forget_bias_gre = 1.0
dropout_prob    = 0.5

Changeable Parameters: (argparse)

learning_rate   = 1e-4
num_epochs      = 100
batch_size      = 250
load_saver      = False # set True to start from a pretrained model
with_attention  = True
data_dir        = '.'
test_dir        = './testing_data'
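
The changeable parameters above map naturally onto argparse flags. Below is a minimal sketch of that wiring, assuming flag names that mirror the variable names; the repo's actual command-line interface may differ.

import argparse

parser = argparse.ArgumentParser(description='S2VT video captioning')
parser.add_argument('--learning_rate', type=float, default=1e-4)
parser.add_argument('--num_epochs', type=int, default=100)
parser.add_argument('--batch_size', type=int, default=250)
parser.add_argument('--load_saver', action='store_true',
                    help='resume from a pretrained checkpoint')
parser.add_argument('--no_attention', dest='with_attention',
                    action='store_false', help='disable attention')
parser.add_argument('--data_dir', type=str, default='.')
parser.add_argument('--test_dir', type=str, default='./testing_data')
args = parser.parse_args()  # e.g. python3 train.py --batch_size 250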

Version

  • Python 3.6.0
  • TensorFlow 1.6.0

Required Packages

You need to pip3 install the packages imported below to run this code.

import tensorflow as tf # keras.preprocessing included
import numpy as np
import pandas as pd
import argparse
import pickle
from colors import *
from tqdm import *

Best BLEU Score

With Bahdanau attention, we achieved a BLEU score of 0.72434. The block shown below indicates the end-of-training status and the BLEU_eval.py output message. You can check the sample output in output.txt.

Epoch 99, step 95/96, (Training Loss: 2.0834, samp_prob: 0.1235) [4:07:06<00:00, 148.26s/it]
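
As a sanity check on scores in this range, the sketch below computes a plain BLEU@1 (clipped unigram precision with a brevity penalty). It is not the repo's BLEU_eval.py, which may apply different smoothing and averaging over multiple references.

from collections import Counter
import math

def bleu1(candidate, reference):
    cand, ref = candidate.split(), reference.split()
    # Clipped unigram precision: each reference word can match at most
    # as many times as it appears in the reference.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    precision = overlap / len(cand) if cand else 0.0
    # Brevity penalty discourages captions shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * precision

print(bleu1('a cat is playing the piano', 'a cat plays the piano'))  # ~0.667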

How to Play

  1. Download the saver .ckpt file and put it into saver_best/ (a restore sketch follows this list)
  2. Install all required python3 packages through pip3
  3. Set up the data path in demo.sh
  4. Run demo.sh
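
For step 1, here is a minimal sketch of restoring the downloaded checkpoint with the TF 1.x Saver API; building the model graph first is assumed, and the repo's actual entry point may differ.

import tensorflow as tf

# ... build the S2VT graph here first; Saver needs existing variables ...
saver = tf.train.Saver()
with tf.Session() as sess:
    ckpt = tf.train.latest_checkpoint('saver_best/')  # e.g. save_net.ckpt-9407
    saver.restore(sess, ckpt)
    # the restored session can now run inference on the test features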

To-do

  • "beam search" implementation
  • comparison of the Luong and Bahdanau Attention

Scheduled Sampling

I used an inverse-sigmoid decay for scheduled sampling; the resulting per-epoch probabilities of feeding the ground-truth word are listed below, and a short reproduction sketch follows the array.

probs = 
[0.88079708 0.87653295 0.87213843 0.86761113 0.86294871 0.85814894
 0.85320966 0.84812884 0.84290453 0.83753494 0.83201839 0.82635335
 0.82053848 0.81457258 0.80845465 0.80218389 0.7957597  0.78918171
 0.78244978 0.77556401 0.76852478 0.76133271 0.75398872 0.74649398
 0.73885001 0.73105858 0.72312181 0.71504211 0.70682222 0.69846522
 0.68997448 0.68135373 0.67260702 0.6637387  0.65475346 0.64565631
 0.63645254 0.62714777 0.61774787 0.60825903 0.59868766 0.58904043
 0.57932425 0.56954622 0.55971365 0.549834   0.53991488 0.52996405
 0.51998934 0.50999867 0.5        0.49000133 0.48001066 0.47003595
 0.46008512 0.450166   0.44028635 0.43045378 0.42067575 0.41095957
 0.40131234 0.39174097 0.38225213 0.37285223 0.36354746 0.35434369
 0.34524654 0.3362613  0.32739298 0.31864627 0.31002552 0.30153478
 0.29317778 0.28495789 0.27687819 0.26894142 0.26114999 0.25350602
 0.24601128 0.23866729 0.23147522 0.22443599 0.21755022 0.21081829
 0.2042403  0.19781611 0.19154535 0.18542742 0.17946152 0.17364665
 0.16798161 0.16246506 0.15709547 0.15187116 0.14679034 0.14185106
 0.13705129 0.13238887 0.12786157 0.12346705]
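
The printed schedule can be reproduced by evaluating a sigmoid along a linear ramp. The endpoint constants below (2.0 down to -1.96 over 100 epochs) are inferred from the array above, so treat them as an assumption rather than the repo's exact code.

import numpy as np

num_epochs = 100
# Linear ramp from 2.0 down to -1.96, one value per epoch (assumed constants).
ramp = 2.0 - 4.0 * np.arange(num_epochs) / num_epochs
probs = 1.0 / (1.0 + np.exp(-ramp))  # sigmoid: ~0.8808 -> 0.5 -> ~0.1235

print(probs[0], probs[50], probs[-1])  # 0.88079708 0.5 0.12346705

At each decoder step in epoch i, the ground-truth word is fed with probability probs[i]; otherwise the model's own previous prediction is fed back, as in scheduled sampling [2].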

Correct descriptions

TZ860P4iTaM_15_28.avi, "a cat is playing the piano"
qvg9eM4Hmzk_4_10.avi, "a man is lifting a truck"
UXs3eq68ZjE_250_255.avi, "someone is is adding rice a pot"
0lh_UWF9ZP4_62_69.avi, "a woman is mixing ingredients"

Relevant but incorrect descriptions

778mkceE0UQ_40_46.avi, "a car is driving a a car"
PeUHy0A1GF0_114_121.avi, "a woman is the shrimp"
ufFT2BWh3BQ_0_8.avi, "a panda panda is"
WTf5EgVY5uU_124_128.avi, "a woman is oil onions and"

saver (Currently Unavailable)

The model file save_net.ckpt-9407.data-00000-of-00001 is quite large (186 MB), so you are advised to download the .ckpt separately. You can download this model from here.

However, you can also reproduce this result directly by running ./run.sh.

Dataset Tree (Currently Unavailable)

.
├── bleu_eval.py
├── sample_output_testset.txt
├── testing_data/
│   ├── feat/   # 100 files, .npy
│   ├── video/  # .avi
│   └── id.txt
├── testing_label.json
├── training_data/
│   ├── feat/   # 1450 files, .npy
│   ├── video/  # .avi
│   └── id.txt
└── training_label.json

6 directories, 6 files
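
Given the tree above, the features can be read with plain numpy. The file-naming convention assumed here (one <video_id>.npy per line of id.txt, where ids already carry the .avi suffix) is inferred from the sample outputs earlier in this README, not confirmed by the repo.

import os
import numpy as np

data_dir = './training_data'
with open(os.path.join(data_dir, 'id.txt')) as f:
    video_ids = [line.strip() for line in f]

# Each feature file holds one (80, 4096) float array: 80 sampled frames,
# each encoded as a 4096-d CNN feature vector (n_frames x n_inputs).
feats = {vid: np.load(os.path.join(data_dir, 'feat', vid + '.npy'))
         for vid in video_ids}
print(next(iter(feats.values())).shape)  # expected: (80, 4096)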

Other branches: (Currently Unavailable)

1. Bidirectional RNN

https://github.com/AdrianHsu/MLDS2018SPRING/tree/241b127329e4dae85caaa0d294d81a1a1795cb5f

2. raw_rnn() combined with two dynamic_rnn()

https://github.com/AdrianHsu/MLDS2018SPRING/tree/66bde2627a0f36360dcffa5d76583ce49514ae8a

References

[1] S. Venugopalan, M. Rohrbach, R. Mooney, T. Darrell, and K. Saenko. Sequence to Sequence - Video to Text. In Proc. ICCV, 2015.

http://www.cs.utexas.edu/users/ml/papers/venugopalan.iccv15.pdf

[2] Bengio, S., Vinyals, O., Jaitly, N., and Shazeer, N. Scheduled sampling for sequence prediction with recurrent neural networks. In NIPS, 2015.

https://arxiv.org/abs/1506.03099

[3] Thang Luong, Hieu Pham, and Christopher D. Manning. Effective Approaches to Attention-based Neural Machine Translation. In EMNLP, 2015.

https://arxiv.org/abs/1508.04025

[4] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural Machine Translation by Jointly Learning To Align and Translate. In ICLR, 2015.

https://arxiv.org/abs/1409.0473
