Top 3 video-language open source projects

Multi-Modal-Transformer
This repository collects multi-modal transformer architectures, including image transformers, video transformers, image-language transformers, and video-language transformers, along with related datasets. It also collects useful tutorials and tools in these domains.
VidSitu
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
ReferFormer
[CVPR2022] Official Implementation of ReferFormer