All Projects → bearcatt → LaBERT

bearcatt / LaBERT

Licence: other
A length-controllable and non-autoregressive image captioning model.

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to LaBERT

Image-Captioning-with-Beam-Search
Generating image captions using Xception Network and Beam Search in Keras
Stars: ✭ 18 (-64%)
Mutual labels:  image-captioning
Udacity
This repo includes all the projects I have finished in the Udacity Nanodegree programs
Stars: ✭ 57 (+14%)
Mutual labels:  image-captioning
tfvaegan
[ECCV 2020] Official Pytorch implementation for "Latent Embedding Feedback and Discriminative Features for Zero-Shot Classification". SOTA results for ZSL and GZSL
Stars: ✭ 107 (+114%)
Mutual labels:  eccv2020
deep-atrous-guided-filter
Deep Atrous Guided Filter for Image Restoration in Under Display Cameras (UDC Challenge, ECCV 2020).
Stars: ✭ 32 (-36%)
Mutual labels:  eccv2020
PiP-Planning-informed-Prediction
(ECCV 2020) PiP: Planning-informed Trajectory Prediction for Autonomous Driving
Stars: ✭ 101 (+102%)
Mutual labels:  eccv2020
DeepPS
[ECCV 2020] "Deep Plastic Surgery: Robust and Controllable Image Editing with Human-Drawn Sketches"
Stars: ✭ 63 (+26%)
Mutual labels:  eccv2020
SRResCycGAN
Code repo for "Deep Cyclic Generative Adversarial Residual Convolutional Networks for Real Image Super-Resolution" (ECCVW AIM2020).
Stars: ✭ 47 (-6%)
Mutual labels:  eccv2020
People-Flows
The code for our ECCV 2020 paper: Estimating People Flows to Better Count Them in Crowded Scenes
Stars: ✭ 44 (-12%)
Mutual labels:  eccv2020
SAN
[ECCV 2020] Scale Adaptive Network: Learning to Learn Parameterized Classification Networks for Scalable Input Images
Stars: ✭ 41 (-18%)
Mutual labels:  eccv2020
BUTD model
A pytorch implementation of "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering" for image captioning.
Stars: ✭ 28 (-44%)
Mutual labels:  image-captioning
udacity-cvnd-projects
My solutions to the projects assigned for the Udacity Computer Vision Nanodegree
Stars: ✭ 36 (-28%)
Mutual labels:  image-captioning
IAST-ECCV2020
IAST: Instance Adaptive Self-training for Unsupervised Domain Adaptation (ECCV 2020) https://teacher.bupt.edu.cn/zhuchuang/en/index.htm
Stars: ✭ 84 (+68%)
Mutual labels:  eccv2020
WS3D
Official version of 'Weakly Supervised 3D object detection from Lidar Point Cloud'(ECCV2020)
Stars: ✭ 104 (+108%)
Mutual labels:  eccv2020
catr
Image Captioning Using Transformer
Stars: ✭ 206 (+312%)
Mutual labels:  image-captioning
FFWM
Implementation of "Learning Flow-based Feature Warping for Face Frontalization with Illumination Inconsistent Supervision" (ECCV 2020).
Stars: ✭ 107 (+114%)
Mutual labels:  eccv2020
CS231n
CS231n Assignments Solutions - Spring 2020
Stars: ✭ 48 (-4%)
Mutual labels:  image-captioning
Expressive-FastSpeech2
PyTorch Implementation of Non-autoregressive Expressive (emotional, conversational) TTS based on FastSpeech2, supporting English, Korean, and your own languages.
Stars: ✭ 139 (+178%)
Mutual labels:  non-autoregressive
visdial
Visual Dialog: Light-weight Transformer for Many Inputs (ECCV 2020)
Stars: ✭ 27 (-46%)
Mutual labels:  eccv2020
VAENAR-TTS
PyTorch Implementation of VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis.
Stars: ✭ 66 (+32%)
Mutual labels:  non-autoregressive
Cross-Speaker-Emotion-Transfer
PyTorch Implementation of ByteDance's Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech
Stars: ✭ 107 (+114%)
Mutual labels:  non-autoregressive

Length-Controllable Image Captioning (ECCV2020)

This repo provides the implemetation of the paper Length-Controllable Image Captioning.

Install

conda create --name labert python=3.7
conda activate labert

conda install pytorch=1.3.1 torchvision cudatoolkit=10.1 -c pytorch
pip install h5py tqdm transformers==2.1.1
pip install git+https://github.com/salaniz/pycocoevalcap

Data & Pre-trained Models

  • Prepare MSCOCO data follow link.
  • Download pretrained Bert and Faster-RCNN from Baidu Cloud Disk [code: 0j9f] or Google Drive.
    • It's an unified checkpoint file, containing a pretrained Bert-base and the fc6 layer of the Faster-RCNN.
  • Download our pretrained LaBERT model from Baidu Cloud Disk [code: fpke] or Google Drive.

Scripts

Train

python -m torch.distributed.launch \
  --nproc_per_node=$NUM_GPUS \
  --master_port=4396 train.py \
  save_dir $PATH_TO_TRAIN_OUTPUT \
  samples_per_gpu $NUM_SAMPLES_PER_GPU

Continue train

python -m torch.distributed.launch \
  --nproc_per_node=$NUM_GPUS \
  --master_port=4396 train.py \
  save_dir $PATH_TO_TRAIN_OUTPUT \
  samples_per_gpu $NUM_SAMPLES_PER_GPU \
  model_path $PATH_TO_MODEL

Inference

python inference.py \
  model_path $PATH_TO_MODEL \
  save_dir $PATH_TO_TEST_OUTPUT \
  samples_per_gpu $NUM_SAMPLES_PER_GPU

Evaluate

python evaluate.py \
  --gt_caption data/id2captions_test.json \
  --pd_caption $PATH_TO_TEST_OUTPUT/caption_results.json \
  --save_dir $PATH_TO_TEST_OUTPUT

Cite

Please consider citing our paper in your publications if the project helps your research.

@article{deng2020length,
  title={Length-Controllable Image Captioning},
  author={Deng, Chaorui and Ding, Ning and Tan, Mingkui and Wu, Qi},
  journal={arXiv preprint arXiv:2007.09580},
  year={2020}
}
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].