abhshkdz / Neural Vqa

❔ Visual Question Answering in Torch

Programming Languages: lua (6591 projects)

Projects that are alternatives to or similar to Neural Vqa

Visdial
[CVPR 2017] Torch code for Visual Dialog
Stars: ✭ 215 (-55.85%)
Mutual labels:  natural-language-processing, torch
Pytorch Beam Search Decoding
PyTorch implementation of beam search decoding for seq2seq models
Stars: ✭ 204 (-58.11%)
Mutual labels:  natural-language-processing, torch
Practical Pytorch
Go to https://github.com/pytorch/tutorials - this repo is deprecated and no longer maintained
Stars: ✭ 4,329 (+788.91%)
Mutual labels:  natural-language-processing
Textgan Pytorch
TextGAN is a PyTorch framework for Generative Adversarial Networks (GANs) based text generation models.
Stars: ✭ 479 (-1.64%)
Mutual labels:  natural-language-processing
Ml Visuals
🎨 ML Visuals contains figures and templates which you can reuse and customize to improve your scientific writing.
Stars: ✭ 5,676 (+1065.5%)
Mutual labels:  natural-language-processing
Waifu2x
Image Super-Resolution for Anime-Style Art
Stars: ✭ 22,741 (+4569.61%)
Mutual labels:  torch
Book Socialmediaminingpython
Companion code for the book "Mastering Social Media Mining with Python"
Stars: ✭ 462 (-5.13%)
Mutual labels:  natural-language-processing
Cs224n 2019 Solutions
Complete solutions for Stanford CS224n, winter, 2019
Stars: ✭ 436 (-10.47%)
Mutual labels:  natural-language-processing
Ml Mipt
Open Machine Learning course at MIPT
Stars: ✭ 480 (-1.44%)
Mutual labels:  natural-language-processing
Nlp.js
An NLP library for building bots, with entity extraction, sentiment analysis, automatic language identification, and more
Stars: ✭ 4,670 (+858.93%)
Mutual labels:  natural-language-processing
Tokenizers
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
Stars: ✭ 5,077 (+942.51%)
Mutual labels:  natural-language-processing
Courses
Quiz & Assignment of Coursera
Stars: ✭ 454 (-6.78%)
Mutual labels:  natural-language-processing
Jionlp
A preprocessing toolkit for Chinese NLP tasks: accurate, efficient, and with zero barrier to entry
Stars: ✭ 449 (-7.8%)
Mutual labels:  natural-language-processing
Word forms
Accurately generate all possible forms of an English word, e.g. "election" --> "elect", "electoral", "electorate", etc.
Stars: ✭ 463 (-4.93%)
Mutual labels:  natural-language-processing
Spacy
💫 Industrial-strength Natural Language Processing (NLP) in Python
Stars: ✭ 21,978 (+4412.94%)
Mutual labels:  natural-language-processing
Stealth
An open source Ruby framework for text and voice chatbots. 🤖
Stars: ✭ 481 (-1.23%)
Mutual labels:  natural-language-processing
Open Korean Text
Open Korean Text Processor - An Open-source Korean Text Processor
Stars: ✭ 438 (-10.06%)
Mutual labels:  natural-language-processing
Kaggle Homedepot
3rd Place Solution for HomeDepot Product Search Results Relevance Competition on Kaggle.
Stars: ✭ 452 (-7.19%)
Mutual labels:  natural-language-processing
Awesome Persian Nlp Ir
Curated List of Persian Natural Language Processing and Information Retrieval Tools and Resources
Stars: ✭ 460 (-5.54%)
Mutual labels:  natural-language-processing
Rnnlg
RNNLG is an open source benchmark toolkit for Natural Language Generation (NLG) in spoken dialogue system application domains. It is released by Tsung-Hsien (Shawn) Wen from Cambridge Dialogue Systems Group under Apache License 2.0.
Stars: ✭ 487 (+0%)
Mutual labels:  natural-language-processing

neural-vqa

Join the chat at https://gitter.im/abhshkdz/neural-vqa

This is an experimental Torch implementation of the VIS+LSTM visual question answering model from the paper Exploring Models and Data for Image Question Answering by Mengye Ren, Ryan Kiros & Richard Zemel.

Model architecture

Setup

Requirements:

Download the MSCOCO train+val images and VQA data using sh data/download_data.sh. Extract all the downloaded zip files inside the data folder.

unzip Annotations_Train_mscoco.zip
unzip Questions_Train_mscoco.zip
unzip train2014.zip

unzip Annotations_Val_mscoco.zip
unzip Questions_Val_mscoco.zip
unzip val2014.zip
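Equivalently, the archives can be extracted programmatically. A minimal sketch in Python using the standard library (the helper name extract_all is illustrative, not part of this repo):

```python
import zipfile

def extract_all(zip_paths, dest="data"):
    """Extract each downloaded archive into the given directory."""
    for path in zip_paths:
        with zipfile.ZipFile(path) as z:
            z.extractall(dest)
```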

If you have already downloaded them, copy the train2014 and val2014 image folders and the VQA JSON files into the data folder.

Download the VGG-19 Caffe model and prototxt using sh models/download_models.sh.

Known issues

  • To avoid memory issues with LuaJIT, install Torch with Lua 5.1 (TORCH_LUA_VERSION=LUA51 ./install.sh). More instructions here.
  • If working with plain Lua, luaffifb may be needed for loadcaffe, unless using pre-extracted fc7 features.

Usage

Extract image features

th extract_fc7.lua -split train
th extract_fc7.lua -split val

Options

  • batch_size: Batch size. Default is 10.
  • split: train/val. Default is train.
  • gpuid: 0-indexed ID of the GPU to use. Default is -1 (CPU).
  • proto_file: Path to the deploy.prototxt file for the VGG Caffe model. Default is models/VGG_ILSVRC_19_layers_deploy.prototxt.
  • model_file: Path to the .caffemodel file for the VGG Caffe model. Default is models/VGG_ILSVRC_19_layers.caffemodel.
  • data_dir: Data directory. Default is data.
  • feat_layer: Layer to extract features from. Default is fc7.
  • input_image_dir: Image directory. Default is data.

Training

th train.lua

Options

  • rnn_size: Size of LSTM internal state. Default is 512.
  • num_layers: Number of layers in the LSTM.
  • embedding_size: Size of word embeddings. Default is 512.
  • learning_rate: Learning rate. Default is 4e-4.
  • learning_rate_decay: Learning rate decay factor. Default is 0.95.
  • learning_rate_decay_after: Epoch after which the learning rate starts decaying. Default is 15.
  • alpha: Alpha (first-moment decay rate) for Adam. Default is 0.8.
  • beta: Beta (second-moment decay rate) for Adam. Default is 0.999.
  • epsilon: Smoothing term added to the denominator in Adam. Default is 1e-8.
  • batch_size: Batch size. Default is 64.
  • max_epochs: Number of full passes through the training data. Default is 15.
  • dropout: Dropout for regularization. Probability of dropping input. Default is 0.5.
  • init_from: Initialize network parameters from checkpoint at this path.
  • save_every: Number of iterations between checkpoints. Default is 1000.
  • train_fc7_file: Path to fc7 features of training set. Default is data/train_fc7.t7.
  • fc7_image_id_file: Path to fc7 image ids of training set. Default is data/train_fc7_image_id.t7.
  • val_fc7_file: Path to fc7 features of validation set. Default is data/val_fc7.t7.
  • val_fc7_image_id_file: Path to fc7 image ids of validation set. Default is data/val_fc7_image_id.t7.
  • data_dir: Data directory. Default is data.
  • checkpoint_dir: Checkpoint directory. Default is checkpoints.
  • savefile: Filename to save checkpoint to. Default is vqa.
  • gpuid: 0-indexed ID of the GPU to use. Default is -1 (CPU).
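The learning-rate options above define a simple decay schedule. A minimal sketch of the effective rate per epoch, written in Python for illustration (the exact point at which train.lua applies the decay is an assumption here):

```python
def learning_rate(epoch, base_lr=4e-4, decay=0.95, decay_after=15):
    # Assumed semantics: the rate is held at base_lr through epoch
    # `decay_after`, then multiplied by `decay` once per epoch after that.
    steps = max(0, epoch - decay_after)
    return base_lr * decay ** steps
```

With the defaults, the rate stays at 4e-4 for the first 15 epochs, then shrinks by 5% each epoch.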

Testing

th predict.lua -checkpoint_file checkpoints/vqa_epoch23.26_0.4610.t7 -input_image_path data/train2014/COCO_train2014_000000405541.jpg -question 'What is the cat on?'

Options

  • checkpoint_file: Path to the model checkpoint to initialize network parameters from.
  • input_image_path: Path to the input image.
  • question: Question string.

Sample predictions

Randomly sampled image-question pairs from the VQA test set, with answers predicted by the VIS+LSTM model.

Q: What animals are those? A: Sheep

Q: What color is the frisbee that's upside down? A: Red

Q: What is flying in the sky? A: Kite

Q: What color is court? A: Blue

Q: What is in the standing person's hands? A: Bat

Q: Are they riding horses both the same color? A: No

Q: What shape is the plate? A: Round

Q: Is the man wearing socks? A: Yes

Q: What is over the woman's left shoulder? A: Fork

Q: Where are the pink flowers? A: On wall

Implementation Details

  • Last hidden layer image features from VGG-19
  • Zero-padded question sequences for batched implementation
  • Training questions are filtered to keep only the top_n most frequent answers; top_n = 1000 by default (~87% coverage).
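The last two details can be sketched as follows. This is an illustration in Python rather than the repo's Lua/Torch code, and the helper names are hypothetical:

```python
from collections import Counter

def pad_batch(seqs, pad_id=0):
    """Zero-pad token-id sequences to the length of the longest one,
    so a batch can be packed into a single tensor."""
    max_len = max(len(s) for s in seqs)
    return [s + [pad_id] * (max_len - len(s)) for s in seqs]

def filter_top_answers(answers, top_n=1000):
    """Keep only the top_n most frequent answers; return the kept set
    and the fraction of training examples they cover."""
    counts = Counter(answers)
    kept = {a for a, _ in counts.most_common(top_n)}
    coverage = sum(counts[a] for a in kept) / len(answers)
    return kept, coverage
```

Questions whose ground-truth answer falls outside the kept set are dropped from training, which is where the ~87% coverage figure comes from.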

Pretrained model and data files

To reproduce results shown on this page or try your own image-question pairs, download the following and run predict.lua with the appropriate paths.

References

  • Mengye Ren, Ryan Kiros & Richard Zemel. Exploring Models and Data for Image Question Answering. NIPS 2015.

License

MIT

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].