mmf: A modular framework for vision & language multimodal research from Facebook AI Research (FAIR); see the MMBT sketch after this list.
clip-guided-diffusion: A CLI tool / Python module for generating images from text using guided diffusion and CLIP from OpenAI.
Modality-Transferable-MER: A multimodal emotion recognition model with zero-shot and few-shot capabilities.
slp: Utilities and modules for speech, language, and multimodal processing using PyTorch and PyTorch Lightning.
LAVT-pytorch: LAVT: Language-Aware Vision Transformer for Referring Image Segmentation.
MVGL: Graph learning for multi-view clustering (TCyb 2018).
NER-Multimodal-pytorch: PyTorch implementation of "Adaptive Co-attention Network for Named Entity Recognition in Tweets" (AAAI 2018).
docarray: The data structure for unstructured data (see the sketch after this list).
RSTNet: Captioning with Adaptive Attention on Visual and Non-Visual Words (CVPR 2021).
nemar: Unsupervised Multi-Modal Image Registration via Geometry Preserving Image-to-Image Translation (CVPR 2020).
iMIX: A framework for multimodal intelligence research from Inspur HSSLAB.
img2dataset: Easily turn large sets of image URLs into an image dataset; it can download, resize, and package 100M URLs in 20h on one machine (see the sketch after this list).
gakg: GAKG, a multimodal Geoscience Academic Knowledge Graph framework that fuses papers' illustrations, text, and bibliometric data.
mix-stage: Official repository for the paper "Style Transfer for Co-Speech Gesture Animation: A Multi-Speaker Conditional-Mixture Approach" (ECCV 2020, https://arxiv.org/abs/2007.12553).
Kaleido-BERT: Vision-Language Pre-training on Fashion Domain (CVPR 2021).
pykale: Knowledge-Aware machine LEarning (KALE), accessible machine learning from multiple sources for interdisciplinary research; part of the 🔥PyTorch ecosystem.
Fengshenbang-LM: Fengshenbang-LM (封神榜大模型) is an open-source large-model ecosystem led by the Cognitive Computing and Natural Language Research Center at IDEA Research, aiming to serve as infrastructure for Chinese AIGC and cognitive intelligence.
VideoNavQA: An alternative Embodied Question Answering (EQA) paradigm with an informative benchmark and models (BMVC 2019, ViGIL 2019 spotlight).
tsflex: Flexible time series feature extraction & processing (see the sketch after this list).
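
For mmf, a minimal sketch of classifying an image/text pair with a pretrained model: the zoo key `mmbt.hateful_memes.images` and the image URL are assumptions drawn from MMF's hateful-memes examples, not a guaranteed current API.

```python
# Hedged sketch: load a pretrained MMBT model from MMF's model zoo and
# classify an image/text pair (hateful-memes-style binary classification).
# The zoo key and image URL below are assumptions for illustration.
from mmf.models.mmbt import MMBT

model = MMBT.from_pretrained("mmbt.hateful_memes.images")
output = model.classify("https://example.com/meme.png", "look how happy")
print(output)  # e.g. {"label": 0, "confidence": 0.98}
```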
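For docarray, a minimal sketch of its core Document/DocumentArray types (the pre-0.30 API that matches the "data structure for unstructured data" tagline); the image URI is hypothetical.

```python
from docarray import Document, DocumentArray

# Wrap heterogeneous unstructured items (text, images, ...) in Documents.
da = DocumentArray(
    [
        Document(text="hello, world"),
        Document(uri="https://example.com/cat.jpg"),  # hypothetical image URI
    ]
)
da.summary()  # prints a structural overview of the array
```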
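For img2dataset, a minimal sketch of the Python entry point; `myimglist.txt` is a hypothetical input file with one image URL per line.

```python
from img2dataset import download

# Download, resize, and shard every URL listed in the input file.
download(
    url_list="myimglist.txt",  # hypothetical input: one image URL per line
    output_folder="output",    # image shards and metadata are written here
    thread_count=64,           # parallel download threads
    image_size=256,            # target size for resizing
)
```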
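For tsflex, a minimal sketch of strided-window feature extraction over a time-indexed series; the signal is synthetic and the window/stride values are arbitrary choices.

```python
import numpy as np
import pandas as pd
from tsflex.features import FeatureCollection, FeatureDescriptor

# Synthetic 1 Hz signal spanning one hour (hypothetical data).
sig = pd.Series(
    np.random.randn(3600),
    index=pd.date_range("2022-01-01", periods=3600, freq="1s"),
    name="sig",
)

# Compute the mean of "sig" over 5-minute windows, strided every minute.
fc = FeatureCollection(
    FeatureDescriptor(function=np.mean, series_name="sig", window="5min", stride="1min")
)
features = fc.calculate(sig, return_df=True)
print(features.head())
```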