rowanz / R2c

License: MIT
Recognition to Cognition Networks (code for the model in "From Recognition to Cognition: Visual Commonsense Reasoning", CVPR 2019)

Programming Languages

Python
139,335 projects - #7 most used programming language

Projects that are alternatives to or similar to R2c

Keylogger
Keylogger is a 100% invisible keylogger, hidden not only from users but also undetectable by antivirus software. The Blackcat keylogger monitors all keystrokes and mouse clicks, and runs a separate process that continuously captures screenshots and sends them to an FTP server at a given interval.
Stars: ✭ 271 (-30.69%)
Mutual labels:  visual
Enso Archive
Looking for Enso, the visual programming language? ➡️ https://github.com/enso-org/enso
Stars: ✭ 305 (-21.99%)
Mutual labels:  visual
Pythonfromspace
Python Examples for Remote Sensing
Stars: ✭ 344 (-12.02%)
Mutual labels:  vision
Dirt
DIRT: a fast differentiable renderer for TensorFlow
Stars: ✭ 273 (-30.18%)
Mutual labels:  vision
Imagedetect
✂️ Detect and crop faces, barcodes, and text in images with the iOS 11 Vision API.
Stars: ✭ 286 (-26.85%)
Mutual labels:  vision
Grip
Program for rapidly developing computer vision applications
Stars: ✭ 314 (-19.69%)
Mutual labels:  vision
Serverlessbydesign
A visual approach to serverless development. Think. Build. Repeat.
Stars: ✭ 254 (-35.04%)
Mutual labels:  visual
Home Platform
HoME: a Household Multimodal Environment is a platform for artificial agents to learn from vision, audio, semantics, physics, and interaction with objects and other agents, all within a realistic context.
Stars: ✭ 370 (-5.37%)
Mutual labels:  vision
Awesome Deep Vision Web Demo
A curated list of awesome deep vision web demo
Stars: ✭ 298 (-23.79%)
Mutual labels:  vision
Mahapps.metro.simplechildwindow
A simple child window for MahApps.Metro
Stars: ✭ 339 (-13.3%)
Mutual labels:  visual
Dest
🐼 One Millisecond Deformable Shape Tracking Library (DEST)
Stars: ✭ 276 (-29.41%)
Mutual labels:  vision
Apc Vision Toolbox
MIT-Princeton Vision Toolbox for the Amazon Picking Challenge 2016 - RGB-D ConvNet-based object segmentation and 6D object pose estimation.
Stars: ✭ 277 (-29.16%)
Mutual labels:  vision
Ios 11 By Examples
👨🏻‍💻 Examples of new iOS 11 APIs
Stars: ✭ 3,327 (+750.9%)
Mutual labels:  vision
Facesvisiondemo
👀 iOS11 demo application for age and gender classification of facial images.
Stars: ✭ 273 (-30.18%)
Mutual labels:  vision
Monogatari
Monogatari is a simple web visual novel engine, created to bring Visual Novels to the web.
Stars: ✭ 357 (-8.7%)
Mutual labels:  visual
Modules.tf Lambda
Infrastructure as code generator - from visual diagrams created with Cloudcraft.co to Terraform
Stars: ✭ 267 (-31.71%)
Mutual labels:  visual
Salgan
SalGAN: Visual Saliency Prediction with Generative Adversarial Networks
Stars: ✭ 314 (-19.69%)
Mutual labels:  visual
Hedron
Perform live shows with your three.js creations
Stars: ✭ 372 (-4.86%)
Mutual labels:  visual
Multi sensor fusion
Multi-Sensor Fusion (GNSS, IMU, Camera): multi-source, multi-sensor fusion localization; GPS/INS integrated navigation; PPP/INS tightly coupled integration
Stars: ✭ 357 (-8.7%)
Mutual labels:  vision
Eyeloop
EyeLoop is a Python 3-based eye-tracker tailored specifically to dynamic, closed-loop experiments on consumer-grade hardware.
Stars: ✭ 336 (-14.07%)
Mutual labels:  visual

From Recognition to Cognition: Visual Commonsense Reasoning (CVPR 2019 Oral)

This repository contains data and PyTorch code for the paper From Recognition to Cognition: Visual Commonsense Reasoning (arXiv). For more info, check out the project page at visualcommonsense.com. For updates, or to ask for help, check out and join our Google group!

[figure: visualization]

This repo should be ready to replicate my results from the paper. If you have any issues getting it set up, please file a GitHub issue. Still, the paper is just an arXiv version, so there might be more updates in the future. I'm super excited about VCR, but it should be viewed as knowledge that's still in the making :)

Background on the Recognition to Cognition model

This repository is for the new task of Visual Commonsense Reasoning. A model is given an image, objects, a question, and four answer choices. The model has to decide which answer choice is correct. Then, it's given four rationale choices, and it has to decide which of those is the best rationale that explains why its answer is right.

In particular, I have code and checkpoints for the Recognition to Cognition (R2C) model, as discussed in the VCR paper. Here's a diagram that explains what's going on:

[figure: R2C model overview]

We'll treat going from Q->A and QA->R as two separate tasks: in each, the model is given a 'query' (question, or question+answer) and 'response choices' (answer, or rationale). Essentially, we'll use BERT and detection regions to ground the words in the query, then contextualize the query with the response. We'll perform several steps of reasoning on top of a representation consisting of the response choice in question, the attended query, and the attended detection regions. See the paper for more details.
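
To make that flow concrete, here is a minimal, self-contained sketch of the attention pattern described above. This is an illustration only, not the actual R2C code: the hidden size, the BiLSTM reasoning step, and the max-pooled scoring head are all my assumptions (see the models/ directory in this repo for the real implementation).

import torch
import torch.nn as nn
import torch.nn.functional as F

class R2CSketch(nn.Module):
    # Schematic grounding -> contextualization -> reasoning. NOT the real model.
    def __init__(self, dim=512):
        super().__init__()
        self.reason = nn.LSTM(3 * dim, dim, batch_first=True, bidirectional=True)
        self.score = nn.Linear(2 * dim, 1)

    @staticmethod
    def attend(x, y):
        # For each position of x, take a softmax-weighted average over y.
        weights = F.softmax(x @ y.transpose(1, 2), dim=-1)  # (batch, len_x, len_y)
        return weights @ y                                   # (batch, len_x, dim)

    def forward(self, query, response, regions):
        # query:    (batch, len_q, dim)  embeddings of the question (or question+answer)
        # response: (batch, len_r, dim)  embeddings of one response choice
        # regions:  (batch, n_det, dim)  features of the detected object regions
        attended_query = self.attend(response, query)      # contextualize query with response
        attended_regions = self.attend(response, regions)  # ground response words in regions
        fused = torch.cat([response, attended_query, attended_regions], dim=-1)
        hidden, _ = self.reason(fused)                 # "several steps of reasoning"
        return self.score(hidden.max(dim=1).values)    # one logit for this response choice

At training time, one such logit is produced for each of the four response choices, and the model is trained with a softmax cross-entropy over them.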

What this repo has / doesn't have

I have code and checkpoints for replicating my R2C results. You might find the dataloader useful (in dataloaders/vcr.py), as it handles loading the data in a nice way using the AllenNLP library. You can submit to the leaderboard using my script in models/eval_for_leaderboard.py.
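
As a rough sketch of how that dataloader might be used (the splits classmethod and its mode argument here are assumptions, so check dataloaders/vcr.py for the real signature):

from dataloaders.vcr import VCR

# mode='answer' for the Q->A task, mode='rationale' for QA->R (assumed argument names)
train, val = VCR.splits(mode='answer')
print(len(train), 'train instances,', len(val), 'val instances')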

You can train a model using models/train.py. This also has code to obtain model predictions. Use models/eval_q2ar.py to get validation results combining Q->A and QA->R components.
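
For example, training both components and then combining their validation predictions might look something like the commands below. The flag names and file paths are illustrative assumptions, so check each script's --help for the real arguments.

python models/train.py -params models/multiatt/default.json -folder models/saves/flagship_answer
python models/train.py -params models/multiatt/default.json -folder models/saves/flagship_rationale -rationale
python models/eval_q2ar.py -answer_preds models/saves/flagship_answer/valpreds.npy -rationale_preds models/saves/flagship_rationale/valpreds.npy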

Setting up and using the repo

  1. Get the dataset. Follow the steps in data/README.md. This includes the steps to get the pretrained BERT embeddings. Note (as of Jan 23rd) you'll need to re-download the test embeddings if you downloaded them before, as there was a bug in the version I had uploaded (essentially the 'anonymized' code didn't condition on the right context).

  2. Install CUDA 9.0 if it's not available already. You might want to follow this guide, but using CUDA 9.0. I use the following commands (my OS is Ubuntu 16.04):

# download and extract the CUDA 9.0 runfile installer
wget https://developer.nvidia.com/compute/cuda/9.0/Prod/local_installers/cuda_9.0.176_384.81_linux-run
chmod +x cuda_9.0.176_384.81_linux-run
./cuda_9.0.176_384.81_linux-run --extract=$HOME
# run the toolkit installer that was extracted above
sudo ./cuda-linux.9.0.176-22781540.run
# point /usr/local/cuda at the 9.0 install and put its libraries on the library path
sudo ln -s /usr/local/cuda-9.0/ /usr/local/cuda
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64
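
You can sanity-check the toolkit install afterwards:

/usr/local/cuda/bin/nvcc --version   # should report "release 9.0"
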
  3. Install Anaconda if it's not available already, and create a new environment. You need to install a few things, namely PyTorch 1.0, torchvision (from the layers branch, which has ROI pooling), and AllenNLP.
wget https://repo.anaconda.com/archive/Anaconda3-5.2.0-Linux-x86_64.sh
conda update -n base -c defaults conda
conda create --name r2c python=3.6
source activate r2c

conda install numpy pyyaml setuptools cmake cffi tqdm scipy ipython mkl mkl-include cython typing h5py pandas nltk spacy numpydoc scikit-learn jpeg

conda install pytorch cudatoolkit=9.0 -c pytorch
pip install git+git://github.com/pytorch/vision.git@layers  # the layers branch, which has ROI pooling

pip install -r allennlp-requirements.txt
pip install --no-deps allennlp==0.8.0
python -m spacy download en_core_web_sm


# this one is optional but it should help make things faster
pip uninstall pillow && CC="cc -mavx2" pip install -U --force-reinstall pillow-simd
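
Once everything above is installed, a quick sanity check along these lines should pass inside the r2c environment (a minimal sketch; everything it imports was installed in the steps above):

import torch, torchvision, allennlp, spacy

print(torch.__version__, torchvision.__version__, allennlp.__version__)
print('CUDA available:', torch.cuda.is_available())  # should print True on a working CUDA 9.0 setup
spacy.load('en_core_web_sm')  # verifies the spaCy model downloaded above
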
  4. If you don't want to train from scratch, then download my checkpoints.
wget https://s3-us-west-2.amazonaws.com/ai2-rowanz/r2c/flagship_answer/best.th -P models/saves/flagship_answer/
wget https://s3-us-west-2.amazonaws.com/ai2-rowanz/r2c/flagship_rationale/best.th -P models/saves/flagship_rationale/
  5. That's it! Now to set up the environment, run source activate r2c && export PYTHONPATH=/home/rowan/code/r2c (or wherever you have this directory).
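
To confirm the PYTHONPATH took effect, an import check from any directory should succeed:

python -c "from dataloaders.vcr import VCR; print('r2c is importable')"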

Help

Feel free to open an issue if you encounter trouble getting it to work! Or, post in the Google group.

Bibtex

@inproceedings{zellers2019vcr,
    author = {Zellers, Rowan and Bisk, Yonatan and Farhadi, Ali and Choi, Yejin},
    title = {From Recognition to Cognition: Visual Commonsense Reasoning},
    booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    month = {June},
    year = {2019}
}