Download the provided caffe folder and install Caffe by following the instructions at http://caffe.berkeleyvision.org/installation.html.
Download the MSCOCO images and the VQA annotations and questions:

cd example/data/

./get_image.sh
Generate the HDF5 data for training and testing:

cd example/

python ./data/generate_h5_data/generate_h5_data.py
Train the model:

cd example/

./train/train_mm.sh
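
The download, data generation, and training steps above can be chained so that a failure in one step stops the rest. The following is a minimal sketch, not part of the repository: `STEPS` and `run_steps` are hypothetical names, the commands are copied from the instructions above, and it assumes the two shell scripts are executable and that it is run from the repository root.

```python
import subprocess
from pathlib import Path

# Steps copied from the instructions above: (working directory, command).
# Assumption: get_image.sh and train_mm.sh have their executable bit set.
STEPS = [
    ("example/data", ["./get_image.sh"]),
    ("example", ["python", "./data/generate_h5_data/generate_h5_data.py"]),
    ("example", ["./train/train_mm.sh"]),
]


def run_steps(repo_root, steps=STEPS, dry_run=False):
    """Run each step in order from its working directory.

    Returns the list of (directory, command) pairs that were planned.
    With dry_run=True nothing is executed.
    """
    planned = []
    for workdir, command in steps:
        cwd = Path(repo_root) / workdir
        planned.append((str(cwd), command))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so later steps do not run after a failure.
            subprocess.run(command, cwd=cwd, check=True)
    return planned


if __name__ == "__main__":
    # Print the plan only; call run_steps(".") to actually execute it.
    for cwd, command in run_steps(".", dry_run=True):
        print(cwd, " ".join(command))
```

Calling `run_steps(".", dry_run=True)` first is a cheap way to confirm the directories and commands before starting the long-running download and training.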
Model trained on VQA dataset: SMem-VQA
Predict the answers for the images and questions in the VQA test-dev dataset:

cd example/

python ./prediction/predict_json.py
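
Before submitting predictions for evaluation, it can help to sanity-check the output file. The sketch below assumes that `predict_json.py` writes the standard VQA results format, a JSON list of objects with an integer `question_id` and a string `answer`; this README does not confirm that. The function name `check_vqa_results` and the file name in the usage note are hypothetical.

```python
import json


def check_vqa_results(results):
    """Return a list of problems found in a VQA-style results list.

    Assumed format: [{"question_id": int, "answer": str}, ...]
    """
    problems = []
    seen = set()
    for i, entry in enumerate(results):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: not a JSON object")
            continue
        qid = entry.get("question_id")
        if not isinstance(qid, int):
            problems.append(f"entry {i}: question_id missing or not an integer")
        elif qid in seen:
            problems.append(f"entry {i}: duplicate question_id {qid}")
        else:
            seen.add(qid)
        if not isinstance(entry.get("answer"), str):
            problems.append(f"entry {i}: answer missing or not a string")
    return problems


if __name__ == "__main__":
    # Small in-memory example; the second entry lacks an answer.
    sample = json.loads('[{"question_id": 1, "answer": "yes"}, {"question_id": 2}]')
    for problem in check_vqa_results(sample):
        print(problem)
```

To check a real file, load it with `json.load(open("results.json"))` (substituting the path the prediction script actually writes) and pass the result to `check_vqa_results`; an empty list means no problems were found.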

Citation

@inproceedings{xu2016ask,
    title = {Ask, attend and answer: Exploring question-guided spatial attention for visual question answering},
    author = {Xu, Huijuan and Saenko, Kate},
    booktitle = {European Conference on Computer Vision},
    year = {2016}
}
