
M²: Meshed-Memory Transformer

This repository contains the reference code for the paper Meshed-Memory Transformer for Image Captioning (CVPR 2020).

Please cite with the following BibTeX:

@inproceedings{cornia2020m2,
  title={{Meshed-Memory Transformer for Image Captioning}},
  author={Cornia, Marcella and Stefanini, Matteo and Baraldi, Lorenzo and Cucchiara, Rita},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2020}
}

(Figure: overview of the Meshed-Memory Transformer architecture)

Environment setup

Clone the repository and create the m2release conda environment using the environment.yml file:

conda env create -f environment.yml
conda activate m2release

Then download the spaCy language data by executing the following command:

python -m spacy download en

Note: Python 3.6 is required to run our code.

Data preparation

To run the code, annotations and detection features for the COCO dataset are needed. Please download the annotations file annotations.zip and extract it.

Detection features are computed with the code provided by [1]. To reproduce our results, please download the COCO features file coco_detections.hdf5 (~53.5 GB), in which the detections of each image are stored under the <image_id>_features key. <image_id> is the id of each COCO image, without leading zeros (e.g. the <image_id> for COCO_val2014_000000037209.jpg is 37209), and each value should be a (N, 2048) tensor, where N is the number of detections.
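
For reference, here is a minimal sketch of how such a features file can be read with h5py; the file path and image id below are placeholders for illustration, not values shipped with the code:

import h5py

features_path = "coco_detections.hdf5"  # placeholder path
image_id = 37209                        # from COCO_val2014_000000037209.jpg

with h5py.File(features_path, "r") as f:
    # Detections for an image are stored under the "<image_id>_features" key
    feats = f["%d_features" % image_id][()]
    print(feats.shape)  # (N, 2048), with N the number of detections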

Evaluation

To reproduce the results reported in our paper, download the pretrained model file meshed_memory_transformer.pth and place it in the code folder.

Run python test.py using the following arguments:

Argument             Possible values
--batch_size         Batch size (default: 10)
--workers            Number of workers (default: 0)
--features_path      Path to detection features file
--annotation_folder  Path to folder with COCO annotations
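
For example, an evaluation run with the default batch size might look like the following (both paths are placeholders for your local setup):

python test.py --features_path /path/to/coco_detections.hdf5 --annotation_folder /path/to/annotations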

Expected output

The expected output of the evaluation code can be found under output_logs/.

Training procedure

Run python train.py using the following arguments:

Argument             Possible values
--exp_name           Experiment name
--batch_size         Batch size (default: 10)
--workers            Number of workers (default: 0)
--m                  Number of memory vectors (default: 40)
--head               Number of heads (default: 8)
--warmup             Warmup value for learning rate scheduling (default: 10000)
--resume_last        If used, training is resumed from the last checkpoint.
--resume_best        If used, training is resumed from the best checkpoint.
--features_path      Path to detection features file
--annotation_folder  Path to folder with COCO annotations
--logs_folder        Path to folder for tensorboard logs (default: "tensorboard_logs")

For example, to train our model with the parameters used in our experiments, use:

python train.py --exp_name m2_transformer --batch_size 50 --m 40 --head 8 --warmup 10000 --features_path /path/to/features --annotation_folder /path/to/annotations
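
If a run is interrupted, the same command can be resumed from the last saved checkpoint by adding the resume flag (again with placeholder paths):

python train.py --exp_name m2_transformer --batch_size 50 --m 40 --head 8 --warmup 10000 --features_path /path/to/features --annotation_folder /path/to/annotations --resume_last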

Sample Results

(Figure: sample captioning results)

References

[1] P. Anderson, X. He, C. Buehler, D. Teney, M. Johnson, S. Gould, and L. Zhang. Bottom-up and top-down attention for image captioning and visual question answering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.
