yrcong / STTran

Licence: MIT license
Spatial-Temporal Transformer for Dynamic Scene Graph Generation, ICCV2021

Spatial-Temporal Transformer for Dynamic Scene Graph Generation

PyTorch implementation of our paper Spatial-Temporal Transformer for Dynamic Scene Graph Generation, accepted at ICCV 2021. We propose a Transformer-based model, STTran, that generates dynamic scene graphs from a given video. STTran detects the visual relationships in each frame.
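To make the idea of a dynamic scene graph concrete, here is a minimal sketch of one possible representation: a list of subject-predicate-object triplets per frame. The class names, field names, and labels are hypothetical and chosen for illustration; they are not the data structures used in this repository.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Relationship:
    """One subject-predicate-object triplet detected in a frame."""
    subject: str
    predicate: str
    obj: str


@dataclass
class DynamicSceneGraph:
    """Maps each frame index to the relationships detected in that frame."""
    frames: Dict[int, List[Relationship]] = field(default_factory=dict)

    def add(self, frame_idx: int, rel: Relationship) -> None:
        self.frames.setdefault(frame_idx, []).append(rel)

    def predicates_over_time(self, subject: str, obj: str) -> List[Tuple[int, str]]:
        """Return (frame_idx, predicate) pairs for one subject-object pair."""
        return [
            (idx, rel.predicate)
            for idx in sorted(self.frames)
            for rel in self.frames[idx]
            if rel.subject == subject and rel.obj == obj
        ]


# Illustrative labels only; not read from the repository or the dataset files.
graph = DynamicSceneGraph()
graph.add(0, Relationship("person", "looking_at", "cup"))
graph.add(1, Relationship("person", "holding", "cup"))
timeline = graph.predicates_over_time("person", "cup")
print(timeline)
```

The per-frame grouping mirrors the statement that relationships are detected in each frame; how the model itself stores its predictions may differ.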

The introduction video is available now: https://youtu.be/gKpnRU8btLg

About the code

We run the code on a single RTX 2080 Ti for both training and testing. We borrowed some code from Yang's repository and Zellers' repository.

Requirements

  • python=3.6
  • pytorch=1.1
  • scipy=1.1.0
  • cython
  • dill
  • easydict
  • h5py
  • opencv
  • pandas
  • tqdm
  • yaml
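Before building anything, it can help to confirm that the listed packages are importable. The sketch below is one way to do that; the mapping from requirement names to import names (for example `opencv` to `cv2`) reflects the usual module names for these packages and is an assumption about your environment, and the helper `find_missing` is hypothetical. It does not check versions.

```python
import importlib.util
from typing import Dict, List

# Requirement name from the list above -> module name used in `import`.
# The import names are the usual ones for these packages (an assumption).
REQUIRED_MODULES: Dict[str, str] = {
    "pytorch": "torch",
    "scipy": "scipy",
    "cython": "Cython",
    "dill": "dill",
    "easydict": "easydict",
    "h5py": "h5py",
    "opencv": "cv2",
    "pandas": "pandas",
    "tqdm": "tqdm",
    "yaml": "yaml",
}


def find_missing(modules: Dict[str, str]) -> List[str]:
    """Return the requirement names whose module cannot be located."""
    return [
        name
        for name, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


missing = find_missing(REQUIRED_MODULES)
print("Missing packages:", ", ".join(missing) if missing else "none")
```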

Usage

We use python=3.6, pytorch=1.1 and torchvision=0.3 in our code. First, clone the repository:

git clone https://github.com/yrcong/STTran.git

Some bounding-box operations rely on borrowed code that must be compiled first:

cd lib/draw_rectangles
python setup.py build_ext --inplace
cd ..
cd fpn/box_intersections_cpu
python setup.py build_ext --inplace
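The same build steps can be scripted without changing directories by hand. This is a sketch that assumes the shell commands above are run from the repository root, so the two extension directories resolve to `lib/draw_rectangles` and `lib/fpn/box_intersections_cpu`; the helper names `extension_dirs` and `build_extensions` are hypothetical.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def extension_dirs(repo_root: Path) -> List[Path]:
    """Directories visited by the shell commands, assuming they start at the repo root."""
    return [
        repo_root / "lib" / "draw_rectangles",
        repo_root / "lib" / "fpn" / "box_intersections_cpu",
    ]


def build_extensions(repo_root: Path) -> None:
    """Run `python setup.py build_ext --inplace` in each extension directory."""
    for ext_dir in extension_dirs(repo_root):
        subprocess.run(
            [sys.executable, "setup.py", "build_ext", "--inplace"],
            cwd=ext_dir,
            check=True,  # raise CalledProcessError on the first failed build
        )
```

Calling `build_extensions(Path("STTran"))` from the directory that contains the clone would then build both extensions with the current Python interpreter.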

For the object detector, please follow the compilation instructions at https://github.com/jwyang/faster-rcnn.pytorch. We provide a pretrained Faster R-CNN model for Action Genome. Please download it here and put it at:

fasterRCNN/models/faster_rcnn_ag.pth

Dataset

We use the dataset Action Genome to train/evaluate our method. Please process the downloaded dataset with the Toolkit. The directories of the dataset should look like:

|-- action_genome
    |-- annotations   #gt annotations
    |-- frames        #sampled frames
    |-- videos        #original videos
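A quick check of the layout above can catch path mistakes before training starts. The following is a minimal sketch; the helper `missing_subdirs` is hypothetical and only tests that the three subdirectories exist, not that their contents are complete.

```python
from pathlib import Path
from typing import List

# Subdirectories shown in the layout above.
EXPECTED_SUBDIRS = ("annotations", "frames", "videos")


def missing_subdirs(dataset_root: Path) -> List[str]:
    """Return the expected subdirectories that are absent under dataset_root."""
    return [
        name
        for name in EXPECTED_SUBDIRS
        if not (dataset_root / name).is_dir()
    ]
```

For example, `missing_subdirs(Path("/path/to/action_genome"))` returns an empty list when the layout matches.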

In the experiments for SGCLS/SGDET, we only keep bounding boxes whose short edges are larger than 16 pixels. Please download the file object_bbox_and_relationship_filtersmall.pkl and put it in the dataloader directory.
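The filtering rule can be illustrated with a short sketch. It assumes boxes are given as `(x1, y1, x2, y2)` pixel coordinates, which is an assumption and not necessarily the format stored in the provided .pkl file; the function name `keep_large_boxes` is hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) in pixels


def keep_large_boxes(boxes: List[Box], min_short_edge: float = 16.0) -> List[Box]:
    """Keep boxes whose shorter side is strictly larger than min_short_edge."""
    return [
        (x1, y1, x2, y2)
        for x1, y1, x2, y2 in boxes
        if min(x2 - x1, y2 - y1) > min_short_edge
    ]


boxes = [(0, 0, 100, 50), (10, 10, 20, 200), (5, 5, 21, 40)]
kept = keep_large_boxes(boxes)
print(kept)
```

The second box is dropped because its width is 10 pixels, and the third because its width is exactly 16 pixels, which is not larger than the threshold.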

Train

You can train STTran with train.py. We trained the model on an RTX 2080 Ti:

  • For PredCLS:
python train.py -mode predcls -datasize large -data_path $DATAPATH 
  • For SGCLS:
python train.py -mode sgcls -datasize large -data_path $DATAPATH 
  • For SGDET:
python train.py -mode sgdet -datasize large -data_path $DATAPATH 
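The three training commands differ only in the mode, so they can be generated from one template. This is a sketch that uses only the flags shown above (`-mode`, `-datasize`, `-data_path`); the helper `train_command` and the example data path are hypothetical.

```python
import shlex
from typing import List

MODES = ("predcls", "sgcls", "sgdet")


def train_command(mode: str, data_path: str, datasize: str = "large") -> List[str]:
    """Build the argument list for one training run."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return [
        "python", "train.py",
        "-mode", mode,
        "-datasize", datasize,
        "-data_path", data_path,
    ]


# Print the commands instead of running them; the path is a placeholder.
for mode in MODES:
    cmd = train_command(mode, "/path/to/action_genome")
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Passing the returned list to `subprocess.run(cmd, check=True)` from the repository root would launch the corresponding run.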

Evaluation

You can evaluate STTran with test.py:

python test.py -mode predcls -datasize large -data_path $DATAPATH -model_path $MODELPATH
python test.py -mode sgcls -datasize large -data_path $DATAPATH -model_path $MODELPATH
python test.py -mode sgdet -datasize large -data_path $DATAPATH -model_path $MODELPATH

Citation

If our work is helpful for your research, please cite our publication:

@inproceedings{cong2021spatial,
  title={Spatial-Temporal Transformer for Dynamic Scene Graph Generation},
  author={Cong, Yuren and Liao, Wentong and Ackermann, Hanno and Rosenhahn, Bodo and Yang, Michael Ying},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={16372--16382},
  year={2021}
}

Help

If you have any questions or ideas about the code or the paper, please comment on GitHub or send us an email. We will reply as soon as possible.
