WingsBrokenAngel / delving-deeper-into-the-decoder-for-video-captioning

Licence: MIT license
Source code for Delving Deeper into the Decoder for Video Captioning

Programming Languages: Jupyter Notebook, Python, Shell


Delving Deeper into the Decoder for Video Captioning

Table of Contents

  1. Description
  2. Requirement
  3. Manual
  4. Results
    1. Comparison on Youtube2Text
    2. Comparison on MSR-VTT
  5. Data
  6. Citation

Description

This repository contains the source code for the paper Delving Deeper into the Decoder for Video Captioning, which has been accepted by ECAI 2020.
The encoder-decoder framework is the most popular paradigm for the video captioning task, but the decoder of a video captioning model still has some non-negligible problems. We propose three methods to improve the performance of the model.

  1. A combination of variational dropout and layer normalization is embedded into the semantic compositional gated recurrent unit to alleviate overfitting.
  2. A unified, flexible method is proposed to evaluate model performance on a validation set so that the best checkpoint can be selected for testing.
  3. A new training strategy called professional learning is proposed, which develops the strong points of a captioning model and bypasses its weaknesses.
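To make the first method more concrete, here is a minimal NumPy sketch of how variational dropout (one dropout mask sampled per sequence and reused at every time step) and layer normalization can be combined in a recurrent step. It is an illustration under assumptions, not the repository's implementation: it uses a plain GRU rather than the semantic compositional GRU, the layer normalization has no learned gain or bias, and all names (gru_forward, make_dropout_mask, the params keys) are hypothetical.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each row to zero mean and unit variance (no learned gain/bias)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def make_dropout_mask(rng, shape, keep_prob):
    """Sample one inverted-dropout mask; entries are 0 or 1 / keep_prob."""
    return rng.binomial(1, keep_prob, size=shape) / keep_prob

def gru_forward(inputs, params, keep_prob=0.5, train=True, seed=0):
    """Run a plain GRU over inputs of shape (time, batch, input_dim)."""
    rng = np.random.default_rng(seed)
    steps, batch, input_dim = inputs.shape
    hidden_dim = params["U"].shape[0]
    h = np.zeros((batch, hidden_dim))
    if train:
        # Variational dropout: masks are sampled once, before the time loop.
        mask_x = make_dropout_mask(rng, (batch, input_dim), keep_prob)
        mask_h = make_dropout_mask(rng, (batch, hidden_dim), keep_prob)
    else:
        mask_x = np.ones((batch, input_dim))
        mask_h = np.ones((batch, hidden_dim))
    outputs = []
    for t in range(steps):
        x = inputs[t] * mask_x      # same input mask at every step
        h_drop = h * mask_h         # same recurrent mask at every step
        # Layer normalization is applied to each pre-activation term.
        gates = sigmoid(layer_norm(x @ params["W"]) + layer_norm(h_drop @ params["U"]))
        z, r = np.split(gates, 2, axis=-1)  # update gate, reset gate
        cand = np.tanh(layer_norm(x @ params["Wc"])
                       + layer_norm((r * h_drop) @ params["Uc"]))
        h = (1.0 - z) * h + z * cand
        outputs.append(h)
    return np.stack(outputs)

# Hypothetical sizes, chosen only for illustration.
_rng = np.random.default_rng(1)
params = {
    "W": _rng.normal(size=(4, 12)),   # input -> [update, reset] gates
    "U": _rng.normal(size=(6, 12)),   # hidden -> [update, reset] gates
    "Wc": _rng.normal(size=(4, 6)),   # input -> candidate state
    "Uc": _rng.normal(size=(6, 6)),   # hidden -> candidate state
}
states = gru_forward(_rng.normal(size=(5, 3, 4)), params)
```

The point of the sketch is the placement of the masks: a standard dropout layer would resample them inside the loop, whereas here the same units stay dropped for the whole sequence.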

Experiments on the MSVD and MSR-VTT datasets show that our model achieves the best results on the BLEU, CIDEr, METEOR and ROUGE-L metrics, with significant gains of up to 11.7% on MSVD and 5% on MSR-VTT over the previous state-of-the-art models.
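As an illustration of the second method, the sketch below shows one way several validation metrics could be folded into a single score for checkpoint selection. The normalization (dividing each metric by its best value across checkpoints) and the equal default weights are assumptions made for this example, not necessarily the formula used in the paper. The function name and the checkpoint scores are hypothetical.

```python
def select_best_checkpoint(scores, weights=None):
    """Pick the checkpoint with the highest combined validation score.

    scores:  {checkpoint_name: {metric_name: value}}
    weights: optional {metric_name: weight}; defaults to 1.0 for every metric.
    """
    metrics = sorted(next(iter(scores.values())))
    if weights is None:
        weights = {m: 1.0 for m in metrics}
    # Best value of each metric over all checkpoints, used to put metrics
    # with different ranges (e.g. CIDEr vs. METEOR) on a comparable scale.
    best = {m: max(s[m] for s in scores.values()) for m in metrics}

    def overall(s):
        return sum(weights[m] * s[m] / best[m] for m in metrics if best[m] > 0)

    totals = {name: overall(s) for name, s in scores.items()}
    return max(totals, key=totals.get), totals

# Made-up numbers for illustration only; these are not results from the paper.
validation_scores = {
    "ckpt-1": {"BLEU-4": 0.50, "CIDEr": 0.90, "METEOR": 0.35, "ROUGE-L": 0.70},
    "ckpt-2": {"BLEU-4": 0.52, "CIDEr": 1.00, "METEOR": 0.36, "ROUGE-L": 0.72},
    "ckpt-3": {"BLEU-4": 0.51, "CIDEr": 0.95, "METEOR": 0.34, "ROUGE-L": 0.71},
}
best_name, totals = select_best_checkpoint(validation_scores)
```

Passing a weights dictionary lets one metric count more than the others, which is one way such a scheme can be made flexible.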


For more information on how to generate the training, validation and test data for these datasets, please refer to Semantics-AssistedVideoCaptioning.


Professional Learning

Requirement

  1. Python 3.6
  2. TensorFlow-GPU 1.13
  3. pycocoevalcap (Python3)
  4. NumPy

Manual

  1. Make sure you have installed all the required packages.
  2. Download files in the Data section.
  3. cd path_to_directory_of_model; mkdir saves
  4. run_model.sh is used for both training and testing. Set CUDA_VISIBLE_DEVICES to the GPU you want to use. The name value is used in the file name of the model saved during training. Set corpus, ecores, tag and ref to the required data paths. Set test to the path of the saved model to be tested; leave test without a value if you want to train a model instead.
  5. Once the script is configured, run bash run_model.sh to start training or testing.

Results

Comparison on Youtube2Text

MSVD Results

Comparison on MSR-VTT

MSR-VTT Results


Data

MSVD

  • MSVD dataset and features: GoogleDrive
    • SHA-256 ca86eb2b90e302a4b7f3197065cad3b9be5285905952b95dbffb61cb0bf79e9c
  • Model Checkpoint: GoogleDrive
    • SHA-256 64089a49fe9de895c9805a85d50160404cb36ccb8c22a70a32fc7ef5a2abfff1

MSRVTT

  • MSRVTT dataset and features: GoogleDrive
    • SHA-256 611b297c4fbbdd58540373986453a991f285aed6cc18914ad930e1e7646f26fb
  • Model Checkpoint: GoogleDrive
    • SHA-256 fb04fd2d29900f7f8a712b6d2352e8227acd30173274b64a38fcea6a608e4a8e
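The SHA-256 values listed above can be used to check that a download is intact. The following is a small sketch using only the Python standard library. The helper names and the example file name are hypothetical, since the names of the downloaded archives are not given here.

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_download(path, expected_hex):
    """Return True if the file's SHA-256 digest matches the published value."""
    return sha256_of_file(path) == expected_hex.strip().lower()

# Example usage (the file name is a placeholder, replace it with your download):
# ok = verify_download(
#     "msvd_data_archive",
#     "ca86eb2b90e302a4b7f3197065cad3b9be5285905952b95dbffb61cb0bf79e9c",
# )
```

If verify_download returns False, the file was most likely truncated or corrupted during download and should be fetched again.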

Citation

@article{chen2020delving,
	title={Delving Deeper into the Decoder for Video Captioning},
	author={Haoran Chen and Jianmin Li and Xiaolin Hu},
	journal={CoRR},
	archivePrefix={arXiv},
	primaryClass={cs.CV},
	url={https://arxiv.org/abs/2001.05614},
	eprint={2001.05614},
	year={2020}
}