
saahiluppal / catr

License: Apache-2.0
Image Captioning Using Transformer

Programming Languages

python
139335 projects - #7 most used programming language
Jupyter Notebook
11667 projects

Projects that are alternatives to or similar to catr

RSTNet
RSTNet: Captioning with Adaptive Attention on Visual and Non-Visual Words (CVPR 2021)
Stars: ✭ 71 (-65.53%)
Mutual labels:  transformer, image-captioning
Sightseq
Computer vision tools for fairseq, containing PyTorch implementation of text recognition and object detection
Stars: ✭ 116 (-43.69%)
Mutual labels:  transformer, image-captioning
Image-Caption
Using LSTM or Transformer to solve Image Captioning in Pytorch
Stars: ✭ 36 (-82.52%)
Mutual labels:  transformer, image-captioning
Fairseq Image Captioning
Transformer-based image captioning extension for pytorch/fairseq
Stars: ✭ 180 (-12.62%)
Mutual labels:  transformer, image-captioning
Omninet
Official Pytorch implementation of "OmniNet: A unified architecture for multi-modal multi-task learning" | Authors: Subhojeet Pramanik, Priyanka Agrawal, Aman Hussain
Stars: ✭ 448 (+117.48%)
Mutual labels:  transformer, image-captioning
Meshed Memory Transformer
Meshed-Memory Transformer for Image Captioning. CVPR 2020
Stars: ✭ 230 (+11.65%)
Mutual labels:  transformer, image-captioning
cape
Continuous Augmented Positional Embeddings (CAPE) implementation for PyTorch
Stars: ✭ 29 (-85.92%)
Mutual labels:  transformer
Transformer Temporal Tagger
Code and data from the paper "BERT Got a Date: Introducing Transformers to Temporal Tagging"
Stars: ✭ 55 (-73.3%)
Mutual labels:  transformer
Cross-lingual-Summarization
Zero-Shot Cross-Lingual Abstractive Sentence Summarization through Teaching Generation and Attention
Stars: ✭ 28 (-86.41%)
Mutual labels:  transformer
php-serializer
Serialize PHP variables, including objects, in any format. Support to unserialize it too.
Stars: ✭ 47 (-77.18%)
Mutual labels:  transformer
CS231n
CS231n Assignments Solutions - Spring 2020
Stars: ✭ 48 (-76.7%)
Mutual labels:  image-captioning
dingo-serializer-switch
A middleware to switch fractal serializers in dingo
Stars: ✭ 49 (-76.21%)
Mutual labels:  transformer
les-military-mrc-rank7
莱斯杯 (Les Cup): Rank 7 solution for the 2nd National "Military Intelligent Machine Reading" Challenge
Stars: ✭ 37 (-82.04%)
Mutual labels:  transformer
kaggle-champs
Code for the CHAMPS Predicting Molecular Properties Kaggle competition
Stars: ✭ 49 (-76.21%)
Mutual labels:  transformer
transformer-ls
Official PyTorch Implementation of Long-Short Transformer (NeurIPS 2021).
Stars: ✭ 201 (-2.43%)
Mutual labels:  transformer
libai
LiBai(李白): A Toolbox for Large-Scale Distributed Parallel Training
Stars: ✭ 284 (+37.86%)
Mutual labels:  transformer
text simplification
Text Simplification Model based on Encoder-Decoder (includes Transformer and Seq2Seq) model.
Stars: ✭ 66 (-67.96%)
Mutual labels:  transformer
project-code-py
Leetcode using AI
Stars: ✭ 100 (-51.46%)
Mutual labels:  transformer
TokenLabeling
Pytorch implementation of "All Tokens Matter: Token Labeling for Training Better Vision Transformers"
Stars: ✭ 385 (+86.89%)
Mutual labels:  transformer
Neural-Scam-Artist
Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.
Stars: ✭ 18 (-91.26%)
Mutual labels:  transformer

CA⫶TR: Image Captioning with Transformers

PyTorch training code and pretrained models for CATR (CAption TRansformer).

The models are also available via Torch Hub; to load a model with pretrained weights, simply do:

model = torch.hub.load('saahiluppal/catr', 'v3', pretrained=True)  # you can choose between v1, v2 and v3
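Once loaded, a captioning model of this kind generates a caption one token at a time. Below is a minimal, hypothetical sketch of greedy decoding; the `model(image, tokens)` call signature and the `start_token`/`end_token` values are assumptions for illustration, not CATR's exact API:

```python
import torch

@torch.no_grad()
def greedy_decode(model, image, start_token, end_token, max_len=20):
    """Greedily pick the most likely next token until the end token appears."""
    caption = [start_token]
    for _ in range(max_len - 1):
        # Assumed signature: model returns logits of shape (1, seq_len, vocab)
        logits = model(image, torch.tensor([caption]))
        next_token = int(logits[0, -1].argmax())
        caption.append(next_token)
        if next_token == end_token:
            break
    return caption
```

In practice the token ids would be decoded back to text with the tokenizer the model was trained with.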

Samples:

All of these images have been annotated by CATR.

Test with your own images:

$ python predict.py --path /path/to/image --v v2  # choose between v1, v2, and v3 (default: v3)

Or try it out in a Colab notebook.

Usage

There are no extra compiled components in CATR, and package dependencies are minimal, so the code is simple to use. We provide instructions on how to install dependencies. First, clone the repository locally:

$ git clone https://github.com/saahiluppal/catr.git

Then, install PyTorch 1.5+ and torchvision 0.6+ along with remaining dependencies:

$ pip install -r requirements.txt

That's it; you should be good to train and test caption models.

Data preparation

Download and extract COCO 2017 train and val images with annotations from http://cocodataset.org. We expect the directory structure to be the following:

path/to/coco/
  annotations/  # annotation json files
  train2017/    # train images
  val2017/      # val images
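Before training, it can save a failed run to verify the dataset actually matches this layout. The helper below is not part of CATR, just a small sketch that checks for the three expected directories:

```python
from pathlib import Path

def check_coco_layout(root):
    """Return the list of expected COCO 2017 subdirectories missing under root."""
    root = Path(root)
    expected = ["annotations", "train2017", "val2017"]
    return [d for d in expected if not (root / d).is_dir()]

# An empty return value means the layout looks right:
# check_coco_layout("path/to/coco")  -> []
```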

Training

Tweak the hyperparameters in the configuration file.

To train baseline CATR on a single GPU for 30 epochs, run:

$ python main.py

We train CATR with AdamW, setting the learning rate to 1e-4 in the transformer and 1e-5 in the backbone. Horizontal flips, scales, and crops are used for augmentation. Images are rescaled to have a maximum size of 299 pixels. The transformer is trained with a dropout of 0.1, and the whole model is trained with gradient clipping at 0.1.
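The two learning rates above are typically implemented with optimizer parameter groups. A hedged sketch (not the repo's exact code) of that setup, assuming backbone parameters are the ones whose names contain `"backbone"`:

```python
import torch

def build_optimizer(model):
    """AdamW with 1e-4 for the transformer and 1e-5 for the backbone."""
    backbone = [p for n, p in model.named_parameters() if "backbone" in n]
    other = [p for n, p in model.named_parameters() if "backbone" not in n]
    return torch.optim.AdamW([
        {"params": other, "lr": 1e-4},
        {"params": backbone, "lr": 1e-5},
    ])

# In the training loop, gradient clipping goes after loss.backward():
#   torch.nn.utils.clip_grad_norm_(model.parameters(), 0.1)
```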

Testing

To test CATR with your own images, run:

$ python predict.py --path /path/to/image --v v2  # choose between v1, v2, and v3 (default: v3)

License

CATR is released under the Apache 2.0 license. Please see the LICENSE file for more information.
