rohit-gupta / Video2Language

Licence: other
Generating video descriptions using deep learning in Keras

V2L-MSVD

Start with the AWS Ubuntu Deep Learning AMI on an EC2 p2.xlarge instance or better (p2.xlarge costs $0.9/hour on-demand and ~$0.3/hour as a spot instance).

source activate tensorflow_p27
conda install scikit-learn
conda install scikit-image

If you are not using AWS, make sure a recent version of Keras and TensorFlow is installed and working. To train tag prediction models, also install scikit-learn and scikit-image.
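As a quick sanity check of the requirements above, the following sketch reports which of the four packages cannot be found in the current Python environment. The helper name `missing_packages` is hypothetical and not part of this repository. The check only confirms that a module can be located; it does not verify versions or that TensorFlow actually runs.

```python
import importlib.util

# Package name -> import name. Note that scikit-learn is imported as
# `sklearn` and scikit-image as `skimage`.
REQUIRED = {
    "keras": "keras",
    "tensorflow": "tensorflow",
    "scikit-learn": "sklearn",
    "scikit-image": "skimage",
}


def missing_packages(required=REQUIRED):
    """Return the package names whose import module cannot be located."""
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

Anything it lists can then be installed with conda or pip before moving on.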

git clone https://github.com/rohit-gupta/V2L-MSVD.git
cd V2L-MSVD

Using a pre-trained video captioning model

Use a video from YouTube

bash fetch-pretrained-model.sh
sudo bash install-youtube-dl.sh
bash fetch-youtube-video.sh https://www.youtube.com/watch?v=cKWuNQAy2Sk
bash process-youtube-video.sh 

Use a video from your local disk

bash fetch-pretrained-model.sh
bash fetch-from-localpath.sh /home/ubuntu/vid1.mp4
bash process-youtube-video.sh 
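The two variants above differ only in how the video is fetched. A minimal sketch of that branching, assuming a simple URL-prefix test to tell a YouTube link from a local path; `caption_commands` is a hypothetical helper, not something shipped with this repository. It only builds the command strings, it does not run them.

```python
import shlex


def caption_commands(source):
    """Return the shell commands listed above for captioning one video.

    `source` is either a YouTube URL or a path on the local disk.
    """
    commands = ["bash fetch-pretrained-model.sh"]
    if source.startswith(("http://", "https://")):
        # Assumption: anything that looks like a URL goes through youtube-dl.
        commands.append("sudo bash install-youtube-dl.sh")
        commands.append("bash fetch-youtube-video.sh " + shlex.quote(source))
    else:
        commands.append("bash fetch-from-localpath.sh " + shlex.quote(source))
    # Both variants finish with the same processing script.
    commands.append("bash process-youtube-video.sh")
    return commands


if __name__ == "__main__":
    for command in caption_commands("/home/ubuntu/vid1.mp4"):
        print(command)
```

`shlex.quote` is used because YouTube URLs contain `?`, which the shell would otherwise try to expand as a glob character.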

Training your own video captioning model

Download data: should take about 2 minutes

bash fetch-data.sh

Preprocess text data: ETA ~5 minutes

If you only want to use verified descriptions:

bash preprocess-data.sh CleanOnly 

If you want to use both verified and unverified descriptions:

bash preprocess-data.sh

Extract frames from the Videos: ETA ~30 minutes

bash extract_frames.sh

Extract Video Features: ETA ~15 Minutes

bash run-feature-extractor.sh

Tag Model: ETA ~5 Minutes

bash train-simple-tag-prediction-model.sh

Train Language Model: ETA ~50 minutes (can be stopped after about 25 minutes, once 5 epochs have completed)

bash train-language-model.sh

Score Language Model: ETA ~5 minutes

bash score-language-model.sh
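For budgeting, the ETAs quoted for the steps above can be added up and multiplied by the p2.xlarge prices quoted earlier. This is a rough estimate only: it assumes the steps run back to back on one instance and ignores setup time, idle time and billing granularity.

```python
# ETAs in minutes, as quoted for each training step above.
STEP_MINUTES = {
    "fetch-data": 2,
    "preprocess-data": 5,
    "extract_frames": 30,
    "run-feature-extractor": 15,
    "train-simple-tag-prediction-model": 5,
    "train-language-model": 50,
    "score-language-model": 5,
}

# p2.xlarge prices quoted earlier, in USD per hour.
ON_DEMAND_PER_HOUR = 0.9
SPOT_PER_HOUR = 0.3


def estimate_cost(minutes, price_per_hour):
    """Cost in USD of running for `minutes`, rounded to cents."""
    return round(minutes / 60 * price_per_hour, 2)


total = sum(STEP_MINUTES.values())
print(total, "minutes")                          # 112 minutes
print(estimate_cost(total, ON_DEMAND_PER_HOUR))  # 1.68
print(estimate_cost(total, SPOT_PER_HOUR))       # 0.56
```

So a full training run is on the order of two hours and under two dollars on-demand, if the quoted ETAs hold.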

Known Issues

  • If at any stage you get an error that contains
/lib/libstdc++.so.6: version `CXXABI_1.3.x' not found

You can fix it with:

cd ~/anaconda3/envs/tensorflow_p27/lib && mkdir -p stdcpp_bkp && mv libstdc++.a stdcpp_bkp/ && mv libstdc++.so stdcpp_bkp/ && mv libstdc++.so.6 stdcpp_bkp/ && mv libstdc++.so.6.0.19 stdcpp_bkp/ && mv libstdc++.so.6.0.19-gdb.py stdcpp_bkp/ && mv libstdc++.so.6.0.21 stdcpp_bkp/ && mv libstdc++.so.6.0.24 stdcpp_bkp/ && cd -
  • Tensorflow 1.3 has a memory leak bug that might affect this code

You can fix it by upgrading Tensorflow.

Reference for this problem: #3
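To diagnose the first issue, it can help to see which `CXXABI_*` version tags a given `libstdc++.so.6` actually provides. The sketch below scans the binary for those tags, much like running `strings` on it and filtering the output. The function name `cxxabi_versions` and the example path in the usage block are illustrative assumptions, not part of this repository.

```python
import re


def _version_key(tag):
    """Turn 'CXXABI_1.3.10' into (1, 3, 10) so tags sort numerically."""
    return tuple(int(part) for part in tag.split("_", 1)[1].split("."))


def cxxabi_versions(library_path):
    """Return the numeric CXXABI version tags embedded in a libstdc++ file."""
    with open(library_path, "rb") as handle:
        data = handle.read()
    tags = set(re.findall(rb"CXXABI_[0-9]+(?:\.[0-9]+)*", data))
    return sorted((tag.decode("ascii") for tag in tags), key=_version_key)


if __name__ == "__main__":
    import os

    # Assumed location of the conda environment's copy; adjust as needed.
    path = os.path.expanduser(
        "~/anaconda3/envs/tensorflow_p27/lib/libstdc++.so.6"
    )
    if os.path.exists(path):
        print(cxxabi_versions(path))
```

If the version named in the error message is missing from the list, that copy of the library is too old for the code being loaded.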

Results

The video captioning model here uses Mean Pooled ResNet50 features of video frames along with Object, Action and Attribute tags predicted by a simple feedforward network.
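Mean pooling here means averaging the per-frame feature vectors into a single vector for the whole video. A minimal pure-Python sketch with toy 4-dimensional features follows; the real ResNet50 features are much larger, and the repository's own implementation may differ. How the pooled vector is combined with the predicted tags is not shown, since it is not described here.

```python
def mean_pool(frame_features):
    """Average per-frame feature vectors into one video-level vector."""
    if not frame_features:
        raise ValueError("need at least one frame")
    count = len(frame_features)
    # zip(*...) walks the vectors dimension by dimension.
    return [sum(values) / count for values in zip(*frame_features)]


# Toy example: 3 frames, each with a 4-dimensional feature vector.
frames = [
    [0.0, 1.0, 2.0, 3.0],
    [2.0, 1.0, 0.0, 3.0],
    [4.0, 1.0, 4.0, 3.0],
]
video_vector = mean_pool(frames)
print(video_vector)  # [2.0, 1.0, 2.0, 3.0]
```

With NumPy arrays of shape (frames, features), the same operation is `features.mean(axis=0)`. Note that pooling discards the order of the frames, which is why this family of models is compared separately in the results.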

The table below compares the performance of our model with other models that also rely on mean-pooled frame features. It is sourced from papers 1, 2 and 3.

Model                                                  METEOR score on MSVD
Mean Pooled (AlexNet Features)                         26.9
Mean Pooled (VGG Features)                             27.7
Mean Pooled (GoogleNet Features)                       28.7
Ours (Mean Pooled ResNet50 Features + Predicted Tags)  29.0
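To make the comparison concrete, the absolute METEOR gains of our model over each baseline can be computed directly from the scores in the table. This is plain arithmetic on the reported numbers and says nothing about statistical significance.

```python
# METEOR scores on MSVD, copied from the table above.
BASELINES = {
    "Mean Pooled (AlexNet Features)": 26.9,
    "Mean Pooled (VGG Features)": 27.7,
    "Mean Pooled (GoogleNet Features)": 28.7,
}
OURS = 29.0


def meteor_gains(baselines, ours):
    """Absolute METEOR improvement of `ours` over each baseline."""
    return {name: round(ours - score, 1) for name, score in baselines.items()}


for name, gain in meteor_gains(BASELINES, OURS).items():
    print(f"{name}: +{gain}")
# Mean Pooled (AlexNet Features): +2.1
# Mean Pooled (VGG Features): +1.3
# Mean Pooled (GoogleNet Features): +0.3
```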

Language Model
