1. E2e Glstm Sc: Code for paper "Image Caption Generation with Text-Conditional Semantic Attention"
2. Vlp: Vision-Language Pre-training for Image Captioning and Question Answering
3. detectron-vlp: Detectron for image/video region feature extraction, inspired by Xinlei's repo