
yashkant / sam-textvqa

Licence: other
Official code for the paper "Spatially Aware Multimodal Transformers for TextVQA", published at ECCV 2020.

Programming Languages

python
139335 projects - #7 most used programming language
c
50402 projects - #5 most used programming language
shell
77523 projects

Projects that are alternatives of or similar to sam-textvqa

Openkai
OpenKAI: A modern framework for unmanned vehicle and robot control
Stars: ✭ 150 (+194.12%)
Mutual labels:  vision
Arc Robot Vision
MIT-Princeton Vision Toolbox for Robotic Pick-and-Place at the Amazon Robotics Challenge 2017 - Robotic Grasping and One-shot Recognition of Novel Objects with Deep Learning.
Stars: ✭ 224 (+339.22%)
Mutual labels:  vision
nested-transformer
Nested Hierarchical Transformer https://arxiv.org/pdf/2105.12723.pdf
Stars: ✭ 174 (+241.18%)
Mutual labels:  vision
Apriltag ros
A ROS wrapper of the AprilTag 3 visual fiducial detector
Stars: ✭ 160 (+213.73%)
Mutual labels:  vision
React Native Text Detector
Text detector from images for React Native, using Firebase ML Kit on Android and Tesseract on iOS
Stars: ✭ 194 (+280.39%)
Mutual labels:  vision
Amazing Arkit
A curated collection of ARKit-related resources. Group: 326705018
Stars: ✭ 239 (+368.63%)
Mutual labels:  vision
Robotcar Dataset Sdk
Software Development Kit for the Oxford Robotcar Dataset
Stars: ✭ 151 (+196.08%)
Mutual labels:  vision
autonomous-delivery-robot
Repository for Autonomous Delivery Robot project of IvLabs, VNIT
Stars: ✭ 65 (+27.45%)
Mutual labels:  vision
Simplecv
Stars: ✭ 2,522 (+4845.1%)
Mutual labels:  vision
Learnable-Image-Resizing
TF 2 implementation Learning to Resize Images for Computer Vision Tasks (https://arxiv.org/abs/2103.09950v1).
Stars: ✭ 48 (-5.88%)
Mutual labels:  vision
Attendance Using Face
Face-recognition using Siamese network
Stars: ✭ 174 (+241.18%)
Mutual labels:  vision
Opticalflow visualization
Python optical flow visualization following Baker et al. (ICCV 2007) as used by the MPI-Sintel challenge
Stars: ✭ 183 (+258.82%)
Mutual labels:  vision
Opencv
📷 Computer-Vision Demos
Stars: ✭ 244 (+378.43%)
Mutual labels:  vision
Arucogen
Online ArUco markers generator
Stars: ✭ 155 (+203.92%)
Mutual labels:  vision
Grocery-Product-Detection
This repository builds a product detection model to recognize products from grocery shelf images.
Stars: ✭ 73 (+43.14%)
Mutual labels:  vision
Nextlevel
NextLevel was initially a weekend project that has now grown into an open community of camera platform enthusiasts. The software provides foundational components for managing media recording, camera interface customization, gestural interaction customization, and image streaming on iOS. The same capabilities can also be found in apps such as Snapchat, Instagram, and Vine.
Stars: ✭ 1,940 (+3703.92%)
Mutual labels:  vision
Cs231a Notes
The course notes for Stanford's CS231A course on computer vision
Stars: ✭ 230 (+350.98%)
Mutual labels:  vision
pybv
A lightweight I/O utility for the BrainVision data format, written in Python.
Stars: ✭ 18 (-64.71%)
Mutual labels:  vision
frc-score-detection
A program to detect FRC match scores from their livestream.
Stars: ✭ 15 (-70.59%)
Mutual labels:  vision
Expression-manipulator
Code for the ECCV'20 paper 'Toward Fine-grained Facial Expression Manipulation'
Stars: ✭ 71 (+39.22%)
Mutual labels:  eccv

Spatially Aware Multimodal Transformers for TextVQA

Yash Kant, Dhruv Batra, Peter Anderson, Alex Schwing, Devi Parikh, Jiasen Lu, Harsh Agrawal
Published at ECCV, 2020


Paper: arxiv.org/abs/2007.12146

Project Page: yashkant.github.io/projects/sam-textvqa

We propose a novel spatially aware self-attention layer in which each visual entity attends only to neighboring entities defined by a spatial graph, and we use it to solve TextVQA.
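
A minimal sketch of the idea (illustrative only, not the authors' implementation): standard scaled dot-product attention whose scores are masked by a spatial adjacency matrix, so each entity attends only to its graph neighbors. All names, shapes, and the toy adjacency below are assumptions.

import torch
import torch.nn.functional as F

def spatially_masked_attention(q, k, v, adjacency):
    # q, k, v: (batch, entities, dim); adjacency: (batch, entities, entities),
    # where adjacency[b, i, j] = 1 iff entity j is a spatial neighbor of entity i.
    scores = torch.matmul(q, k.transpose(-2, -1)) / q.size(-1) ** 0.5
    # Non-neighbors are masked out before the softmax, so attention weight
    # is distributed only over entities connected in the spatial graph.
    scores = scores.masked_fill(adjacency == 0, float("-inf"))
    return torch.matmul(F.softmax(scores, dim=-1), v)

# Toy usage: 2 samples, 5 entities, 64-dim features, self-loops only.
q = k = v = torch.randn(2, 5, 64)
adj = torch.eye(5).expand(2, 5, 5)
print(spatially_masked_attention(q, k, v, adj).shape)  # torch.Size([2, 5, 64])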

Repository Setup

Create a fresh conda environment, and install all dependencies.

conda create -n sam python=3.6
conda activate sam
cd sam-textvqa
pip install -r requirements.txt

Install pytorch

conda install pytorch torchvision cudatoolkit=10.0 -c pytorch

Finally, install apex from: https://github.com/NVIDIA/apex
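
Optionally, a quick sanity check (our suggestion, not part of the original instructions) that the core dependencies are importable:

import torch

print(torch.__version__)          # expect a build matching cudatoolkit 10.0
print(torch.cuda.is_available())  # should print True on a GPU machine

try:
    from apex import amp  # noqa: F401 -- apex supplies mixed-precision utilities
    print("apex OK")
except ImportError:
    print("apex missing - install it from https://github.com/NVIDIA/apex")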

Data Setup

Download the files from the Dropbox link and place them in the data/ folder. Ensure that the data paths match the directory structure provided in data/README.md.
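
Before training, it can help to compare the unpacked layout against data/README.md; below is a small throwaway script (ours, not part of the repo) that prints the top two levels of data/:

from pathlib import Path

root = Path("data")
if not root.is_dir():
    raise SystemExit("data/ not found - download the Dropbox files first")
# Print the top two levels so they can be checked against data/README.md.
for entry in sorted(root.iterdir()):
    print(entry)
    if entry.is_dir():
        for child in sorted(entry.iterdir()):
            print("  ", child)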

Run Experiments

Pick a suitable configuration file from the table below:

Method    Context (c)    Train Splits       Evaluation Splits    Config File
SA-M4C    3              TextVQA            TextVQA              train-tvqa-eval-tvqa-c3.yml
SA-M4C    3              TextVQA + STVQA    TextVQA              train-tvqa_stvqa-eval-tvqa-c3.yml
SA-M4C    3              STVQA              STVQA                train-stvqa-eval-stvqa-c3.yml
SA-M4C    5              TextVQA            TextVQA              train-tvqa-eval-tvqa-c5.yml
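
To see what a run will actually do, you can inspect the chosen YAML file first; a minimal sketch assuming the configs live under configs/ (as in the evaluation command below) and that PyYAML is available:

import yaml

# Load one of the configs from the table and list its top-level options.
with open("configs/train-tvqa-eval-tvqa-c3.yml") as f:
    cfg = yaml.safe_load(f)
print(sorted(cfg))  # exact keys depend on the repo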

To run the experiments, use:

python train.py \
--config config.yml \
--tag experiment-name

To evaluate the provided pretrained checkpoint, use:

python train.py \
--config configs/train-tvqa_stvqa-eval-tvqa-c3.yml \
--pretrained_eval data/pretrained-models/best_model.tar

Note: The beam-search evaluation is undergoing changes and will be updated.

Resources Used: We ran all the experiments on 2 Titan Xp GPUs.

Citation

@inproceedings{kant2020spatially,
  title={Spatially Aware Multimodal Transformers for TextVQA},
  author={Kant, Yash and Batra, Dhruv and Anderson, Peter 
          and Schwing, Alexander and Parikh, Devi and Lu, Jiasen
          and Agrawal, Harsh},
  booktitle={ECCV},
  year={2020}}

Acknowledgements

Parts of this codebase were borrowed from other open-source repositories.

We thank Abhishek Das and Abhinav Moudgil for their feedback, and Ronghang Hu for sharing an early version of his work. The Georgia Tech effort was supported in part by NSF, AFRL, DARPA, ONR YIPs, ARO PECASE, and Amazon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the U.S. Government or any sponsor.

License

MIT
