1. Openvqa - A lightweight, scalable, and general framework for visual question answering research
3. Mcan Vqa - Deep Modular Co-Attention Networks for Visual Question Answering
4. rosita - ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration