
scopeInfinity / Video2description

Licence: apache-2.0
Video to Text: Generates a description in natural language for a given video (Video Captioning)

Programming Languages

python

Projects that are alternatives to or similar to Video2description

Auto Editor
Auto-Editor: Effort free video editing!
Stars: ✭ 382 (+257.01%)
Mutual labels:  video-processing, audio-processing
Deep Embedded Memory Networks
https://arxiv.org/abs/1707.00836
Stars: ✭ 19 (-82.24%)
Mutual labels:  deep-neural-networks, video-processing
Deep Learning Time Series
List of papers, code and experiments using deep learning for time series forecasting
Stars: ✭ 796 (+643.93%)
Mutual labels:  deep-neural-networks, lstm-neural-networks
Personality Detection
Implementation of a hierarchical CNN based model to detect Big Five personality traits
Stars: ✭ 338 (+215.89%)
Mutual labels:  lstm-neural-networks, cnn-keras
Sarcasm Detection
Detecting sarcasm on Twitter using both traditional machine learning and deep learning techniques.
Stars: ✭ 73 (-31.78%)
Mutual labels:  deep-neural-networks, lstm-neural-networks
Real Time Gesrec
Real-time Hand Gesture Recognition with PyTorch on EgoGesture, NvGesture, Jester, Kinetics and UCF101
Stars: ✭ 339 (+216.82%)
Mutual labels:  deep-neural-networks, video-processing
Arcan
Arcan - [Display Server, Multimedia Framework, Game Engine] -> "Desktop Engine"
Stars: ✭ 885 (+727.1%)
Mutual labels:  video-processing, audio-processing
video-audio-tools
To process/edit video and audio with Python+FFmpeg. [Simple and practical] Python+FFmpeg-based video and audio processing/editing.
Stars: ✭ 164 (+53.27%)
Mutual labels:  video-processing, audio-processing
Bitcoin Price Prediction Using Lstm
Bitcoin price Prediction ( Time Series ) using LSTM Recurrent neural network
Stars: ✭ 67 (-37.38%)
Mutual labels:  deep-neural-networks, lstm-neural-networks
Image Captioning
Image Captioning: Implementing the Neural Image Caption Generator with python
Stars: ✭ 52 (-51.4%)
Mutual labels:  lstm-neural-networks, image-captioning
Vectorhub
Vector Hub - Library for easy discovery, and consumption of State-of-the-art models to turn data into vectors. (text2vec, image2vec, video2vec, graph2vec, bert, inception, etc)
Stars: ✭ 317 (+196.26%)
Mutual labels:  video-processing, audio-processing
Pytorch Learners Tutorial
PyTorch tutorial for learners
Stars: ✭ 97 (-9.35%)
Mutual labels:  deep-neural-networks, lstm-neural-networks
eloquent-ffmpeg
High-level API for FFmpeg's Command Line Tools
Stars: ✭ 71 (-33.64%)
Mutual labels:  video-processing, audio-processing
Predictive Maintenance Using Lstm
Example of Multiple Multivariate Time Series Prediction with LSTM Recurrent Neural Networks in Python with Keras.
Stars: ✭ 352 (+228.97%)
Mutual labels:  deep-neural-networks, lstm-neural-networks
DuME
A fast, versatile, easy-to-use and cross-platform Media Encoder based on FFmpeg
Stars: ✭ 66 (-38.32%)
Mutual labels:  video-processing, audio-processing
Mlt
MLT Multimedia Framework
Stars: ✭ 836 (+681.31%)
Mutual labels:  video-processing, audio-processing
lecture-demos
Demonstrations for the interactive exploration of selected core concepts of audio, image and video processing as well as related topics
Stars: ✭ 12 (-88.79%)
Mutual labels:  video-processing, audio-processing
tennis action recognition
Using deep learning to perform action recognition in the sport of tennis.
Stars: ✭ 17 (-84.11%)
Mutual labels:  video-processing, cnn-keras
Skater
Python Library for Model Interpretation/Explanations
Stars: ✭ 973 (+809.35%)
Mutual labels:  deep-neural-networks, lstm-neural-networks
Automatic Image Captioning
Generating Captions for images using Deep Learning
Stars: ✭ 84 (-21.5%)
Mutual labels:  lstm-neural-networks, image-captioning

Video Captioning

Generates a natural-language caption for a given video clip

Branches: VideoCaption (1a2124d), VideoCaption_catt (647e73b4)

Model

The model generates a natural-language sentence word by word.

[Figure: word-by-word sentence generation]

[Figures: Audio SubModel (audio_model) · Video SubModel (video_model) · Sentence Generation SubModel (sentence_generation)]
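
As an illustration of the word-by-word scheme, here is a minimal greedy-decoding sketch. The names (`model`, `vocab`, the `<start>`/`<end>` tokens) and the input layout are assumptions for illustration, not the repository's actual interfaces:

import numpy as np

def greedy_caption(model, audio_feats, video_feats, vocab, max_len=20):
    # vocab maps word -> index; build the reverse map for decoding
    inv_vocab = {idx: word for word, idx in vocab.items()}
    words = [vocab["<start>"]]
    for _ in range(max_len):
        # assumed interface: the model returns next-word probabilities
        # for every position of the partial sentence
        probs = model.predict([audio_feats, video_feats, np.array([words])])
        next_id = int(np.argmax(probs[0, -1]))  # most likely next word
        if inv_vocab[next_id] == "<end>":
            break
        words.append(next_id)
    return " ".join(inv_vocab[i] for i in words[1:])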

Context extraction for the temporal attention model when generating the i-th word:

[Figure: temporal attention model]
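
A rough numpy sketch of one common formulation of temporal attention (a softmax-weighted sum of per-frame features, scored against the decoder state at step i); the weight matrices and shapes are illustrative assumptions, not the repository's code:

import numpy as np

def temporal_context(frame_feats, decoder_state, W_f, W_s):
    # frame_feats: (T, D) per-frame features; decoder_state: (H,) at word i
    # W_f: (D, K), W_s: (K, H) projection matrices
    scores = frame_feats @ W_f @ (W_s @ decoder_state)  # (T,) relevance scores
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                                # softmax attention weights
    return alpha @ frame_feats                          # (D,) context vector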

Results - f5c22f7

Test videos with good results

| Video | Generated caption |
|-------|-------------------|
| 12727 | two men are talking about a cooking show |
| 12501 | a woman is cooking |
| 10802 | a dog is running around a field |
| 12968 | a woman is talking about a makeup face |
| 12937 | a man is driving a car down the road |
| 12939 | a man is cooking in a kitchen |
| 12683 | a man is playing a video game |
| 12901 | two men are playing table tennis in a stadium |
| 12994 | a man is talking about a computer program |

Test videos with poor results

| Video | Generated caption |
|-------|-------------------|
| 12589 | a person is playing with a toy |
| 12966 | a man is walking on the field |
| 12908 | a man is standing in a gym |

Try it out!!!

$ docker-compose pull
$ docker-compose up
  • Browse to http://localhost:8080/
    • The backend might take a few minutes to reach a stable state.
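
If you script against the stack, a small poll loop (standard library only; the URL matches the docker-compose default above) can wait out the warm-up:

import time, urllib.request

while True:
    try:
        urllib.request.urlopen("http://localhost:8080/", timeout=5)
        break  # frontend answered, the stack is up
    except OSError:
        time.sleep(10)  # backend may still be loading the model
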
Execution without Docker
  • You can always go through backend.Dockerfile and frontend.Dockerfile to understand the setup better.
  • Update src/config.json as per your requirements and use those paths during the upcoming steps.
    • To learn more about any field, search for its references in the codebase.
  • Install miniconda
  • Get glove.6B.300d.txt from https://nlp.stanford.edu/projects/glove/ (a download sketch follows this list)
  • Install ffmpeg
    • Configure, build, and install FFmpeg from source with shared libraries:
$ git clone 'https://github.com/FFmpeg/FFmpeg.git'
$ cd FFmpeg
$ ./configure --enable-shared  # use --prefix to install into a custom directory
$ make
$ sudo make install  # or run `make install` as root
  • If required, use https://github.com/tylin/coco-caption/ for scoring the model.
  • Then create the conda environment from environment.yml
    • $ conda env create -f environment.yml
  • And activate it, using the environment name defined in environment.yml
$ conda activate <env-name>
  • Turn up the backend
    • src$ python -m backend.parser server --start --model /path/to/model
  • Turn up the web frontend
    • src$ python -m frontend.app
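
The GloVe step above can be scripted; a minimal sketch, assuming the standard glove.6B.zip bundle from the Stanford site (~822 MB) and the data-directory layout described below:

import urllib.request, zipfile

# fetch the bundle and extract only the 300-dimensional vectors
urllib.request.urlretrieve("http://nlp.stanford.edu/data/glove.6B.zip", "glove.6B.zip")
with zipfile.ZipFile("glove.6B.zip") as zf:
    zf.extract("glove.6B.300d.txt", "/path/to/data_dir/glove/")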

Info

The data directory and the working directory can both be the project root directory.

Data Directory

| File | Reference |
|------|-----------|
| /path/to/data_dir/VideoDataset/videodatainfo_2017.json | http://ms-multimedia-challenge.com/2017/dataset |
| /path/to/data_dir/VideoDataset/videos/[0-9]+.mp4 | Download the videos referenced in the dataset above |
| /path/to/data_dir/glove/glove.6B.300d.txt | https://nlp.stanford.edu/projects/glove/ |
| /path/to/data_dir/VideoDataset/cache_40_224x224/[0-9]+.npy | Video cache files, created on the fly |
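
The cache naming suggests 40 frames at 224x224 per video. A hypothetical sketch of how such .npy files could be produced (OpenCV-based; not the repository's actual code):

import cv2
import numpy as np

def cache_video(video_path, out_path, n_frames=40, size=(224, 224)):
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for idx in np.linspace(0, total - 1, n_frames, dtype=int):
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))  # seek to the sampled frame
        ok, frame = cap.read()
        if ok:
            frames.append(cv2.resize(frame, size))
    cap.release()
    np.save(out_path, np.stack(frames))  # shape: (n_frames, 224, 224, 3)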

Working Directory

| File | Content |
|------|---------|
| /path/to/working_dir/glove.dat | Pickle-dumped GloVe embedding |
| /path/to/working_dir/vocab.dat | Pickle-dumped vocabulary words |
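
A plausible recipe for glove.dat (the exact pickle layout is defined by the repository; this only sketches the idea of parsing the text vectors once and pickling them):

import pickle
import numpy as np

embeddings = {}
with open("/path/to/data_dir/glove/glove.6B.300d.txt", encoding="utf-8") as f:
    for line in f:
        word, *vec = line.rstrip().split(" ")
        embeddings[word] = np.asarray(vec, dtype=np.float32)

with open("/path/to/working_dir/glove.dat", "wb") as f:
    pickle.dump(embeddings, f)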

Download Dataset

  • Execute python videohandler.py from the VideoDataset directory

Execution

It currently supports train, predict, and server modes. Use the following command for a full explanation:

src$ python -m backend.parser -h

Training Methods

  • Try Iterative Learning
  • Try Random Learning

Evaluation

Prerequisite

cd /path/to/eval_dir/
git clone 'https://github.com/tylin/coco-caption.git' cococaption
ln /path/to/working_dir/cocoeval.py cococaption/

Evaluate

# Adjust parser.py to change the number of test examples considered in the evaluation
python parser.py predict save_all_test
python /path/to/eval_dir/cocoeval.py <results file>.txt
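
Under the hood, coco-caption scoring compares reference and hypothesis captions per video id. A minimal usage sketch with the pycocoevalcap modules from the repository cloned above (the toy captions here are made up):

from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.cider.cider import Cider

refs = {"12727": ["two men are talking about a cooking show"]}  # ground truth
hyps = {"12727": ["two men talk about a cooking show"]}         # model output
print(Cider().compute_score(refs, hyps))  # (corpus score, per-video scores)
print(Bleu(4).compute_score(refs, hyps))  # Bleu_1..Bleu_4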

Sample Evaluation while training

| Commit | Training | Total | CIDEr | Bleu_4 | ROUGE_L | METEOR | Model Filename |
|--------|----------|-------|-------|--------|---------|--------|----------------|
| 647e73b4 | 10 epochs | 1.1642 | 0.1580 | 0.3090 | 0.4917 | 0.2055 | CAttention_ResNet_D512L512_G128G64_D1024D0.20BN_BDGRU1024_D0.2L1024DVS_model.dat_4990_loss_2.484_Cider0.360_Blue0.369_Rouge0.580_Meteor0.256 |
| 1a2124d | 17 epochs | 1.1599 | 0.1654 | 0.3022 | 0.4849 | 0.2074 | ResNet_D512L512_G128G64_D1024D0.20BN_BDLSTM1024_D0.2L1024DVS_model.dat_4987_loss_2.203_Cider0.342_Blue0.353_Rouge0.572_Meteor0.256 |
| f5c22f7 | 17 epochs | 1.1559 | 0.1680 | 0.3000 | 0.4832 | 0.2047 | ResNet_D512L512_G128G64_D1024D0.20BN_BDGRU1024_D0.2L1024DVS_model.dat_4983_loss_2.350_Cider0.355_Blue0.353_Rouge0.571_Meteor0.247_TOTAL_1.558_BEST |
| bd072ac | 11 CPU-hrs with multiprocessing (16 epochs) | 1.0736 | 0.1528 | 0.2597 | 0.4674 | 0.1936 | ResNet_D512L512_D1024D0.20BN_BDGRU1024_D0.2L1024DVS_model.dat_4986_loss_2.306_Cider0.347_Blue0.328_Rouge0.560_Meteor0.246 |
| 3ccf5d5 | 15 CPU-hrs | 1.0307 | 0.1258 | 0.2535 | 0.4619 | 0.1895 | res_mcnn_rand_b100_s500_model.dat_model1_3ccf5d5 |

Check the Specifications section for a model comparison.

The temporal attention model is on the VideoCaption_catt branch.

Pre-trained Models : https://drive.google.com/open?id=1gexBRQfrjfcs7N5UI5NtlLiIR_xa69tK

Web Server

  • Start the server (-s) to compute predictions (within the conda environment)
python parser.py server -s -m <path/to/correct/model>
  • Check config.json for configuration options.
  • Execute python app.py from the webserver directory (no conda environment needed)
    • Make sure the process can create new files inside $UPLOAD_FOLDER
  • Open http://webserver:5000/ to reach the web server for testing (under the default configuration)

Specifications

Commit: 3ccf5d5
  • ResNet over LSTM for feature extraction
  • Word-by-word sentence generation with an LSTM, conditioned on the previous prediction
  • Random Dataset Learning of training data
  • Vocab Size 9448
  • Glove of 300 Dimension
Commit: bd072ac
  • ResNet over bidirectional GRU for feature extraction
  • Sequential learning of the training data
  • Batch normalization + a few more tweaks in the model
  • Bleu, CIDEr, Rouge, Meteor score generation for validation
  • Multiprocessing Keras
Commit: f5c22f7
  • Audio with bidirectional GRU
Commit: 1a2124d
  • Audio with bidirectional LSTM
Commit: 647e73b
  • Audio with bidirectional GRU using temporal attention for context

Image Captioning

Generates captions for given images

Branch : onehot_gen

Commit : 898f15778d40b67f333df0a0e744a4af0b04b16c

Trained Model : https://drive.google.com/open?id=1qzMCAbh_tW3SjMMVSPS4Ikt6hDnGfhEN

Categorical Crossentropy Loss : 0.58
