xlliu7 / TadTR

Licence: other
End-to-end Temporal Action Detection with Transformer. [Under review for a journal publication]

TadTR: End-to-end Temporal Action Detection with Transformer

By Xiaolong Liu, Qimeng Wang, Yao Hu, Xu Tang, Shiwei Zhang, Song Bai, Xiang Bai.

This repository holds the code for TadTR, described in the technical report "End-to-end Temporal Action Detection with Transformer".

The tech report is outdated. We have significantly improved TadTR since we uploaded it to arXiv, and it now achieves much better performance. We will update the arXiv version soon.

We have also explored fully end-to-end training from RGB images with TadTR. See our CVPR 2022 work E2E-TAD.

Introduction

TadTR is an end-to-end Temporal Action Detection TRansformer. It has the following advantages over previous methods:

  • Simple. It adopts a set-prediction pipeline and achieves TAD with a single network. It does not require a separate proposal generation stage.
  • Flexible. It removes hand-crafted designs such as anchor settings and NMS.
  • Sparse. It produces very sparse detections (e.g. 10 on ActivityNet) and therefore requires less computation.
  • Strong. As a self-contained temporal action detector, TadTR achieves state-of-the-art performance on HACS and THUMOS14. It is also much stronger than concurrent Transformer-based methods such as RTD-Net and AGT.
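The set-prediction idea behind the Simple and Flexible points can be illustrated with a DETR-style bipartite matching step: during training, each ground-truth action is assigned to exactly one prediction and the remaining predictions are supervised as background, which discourages duplicate detections and is what lets such detectors drop NMS. The sketch below is a simplified assumption, not TadTR's actual matcher: its cost is boundary L1 distance minus temporal IoU (no classification term), and it brute-forces the assignment instead of running the Hungarian algorithm. The names temporal_iou, match_cost and bipartite_match are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Segment = Tuple[float, float]  # (start, end), e.g. normalized to [0, 1]


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection-over-union of two 1D segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def match_cost(pred: Segment, gt: Segment) -> float:
    """Hypothetical matching cost (lower is better): boundary L1 minus tIoU."""
    l1 = abs(pred[0] - gt[0]) + abs(pred[1] - gt[1])
    return l1 - temporal_iou(pred, gt)


def bipartite_match(preds: Sequence[Segment], gts: Sequence[Segment]) -> List[Tuple[int, int]]:
    """Give each ground truth a distinct prediction so the total cost is minimal.

    Brute force over permutations for clarity only; real detectors use the
    Hungarian algorithm. Assumes len(preds) >= len(gts). Returns
    (pred_index, gt_index) pairs; unmatched predictions count as background.
    """
    best_cost, best = float("inf"), []
    for chosen in permutations(range(len(preds)), len(gts)):
        cost = sum(match_cost(preds[p], gts[g]) for g, p in enumerate(chosen))
        if cost < best_cost:
            best_cost, best = cost, [(p, g) for g, p in enumerate(chosen)]
    return best


preds = [(0.10, 0.30), (0.50, 0.90), (0.05, 0.95)]
gts = [(0.52, 0.88), (0.12, 0.28)]
print(bipartite_match(preds, gts))  # [(1, 0), (0, 1)]; prediction 2 is left as background
```

Because each ground truth can claim only one prediction, the loose prediction (0.05, 0.95) is pushed towards the background class rather than duplicating an existing detection.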

Updates

[2022.3] Our new work E2E-TAD based on TadTR is accepted to CVPR 2022. It supports fully end-to-end training from RGB images.

[2021.9.15] Update the performance on THUMOS14.

[2021.9.1] Add demo code.

TODOs

  • add model code
  • add inference code
  • add training code
  • support training/inference with video input

Main Results

  • HACS Segments
Method  Feature      mAP@0.5  mAP@0.75  mAP@0.95  Avg. mAP  Model
TadTR   I3D RGB      47.14    32.11     10.94     32.09     [OneDrive]

  • THUMOS14
Method  Feature      mAP@0.3  mAP@0.4  mAP@0.5  mAP@0.6  mAP@0.7  Avg. mAP  Model
TadTR   I3D 2stream  74.8     69.1     60.1     46.6     32.8     56.7      [OneDrive]

  • ActivityNet-1.3
Method  Feature      mAP@0.5  mAP@0.75  mAP@0.95  Avg. mAP  Model
TadTR   TSN 2stream  51.29    34.99     9.49      34.64     [OneDrive]
TadTR   TSP          53.62    37.52     10.56     36.75     [OneDrive]
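The Avg. mAP column averages mAP over a set of temporal IoU (tIoU) thresholds. For THUMOS14, the five tabulated thresholds (0.3 to 0.7) make up the whole average, as the arithmetic below confirms. For HACS Segments and ActivityNet-1.3 the standard protocol averages over ten thresholds, 0.5:0.05:0.95; we assume the table follows it, in which case that average cannot be recomputed from the three columns shown. A minimal sketch with a hypothetical helper, average_map:

```python
from typing import Dict, Sequence


def average_map(map_at: Dict[float, float], thresholds: Sequence[float]) -> float:
    """Mean of per-threshold mAP values (in %) over the given tIoU thresholds."""
    missing = [t for t in thresholds if t not in map_at]
    if missing:
        raise KeyError(f"no mAP reported for tIoU thresholds {missing}")
    return sum(map_at[t] for t in thresholds) / len(thresholds)


THUMOS14_THRESHOLDS = [0.3, 0.4, 0.5, 0.6, 0.7]
# ActivityNet-style protocol (assumed for HACS and ActivityNet-1.3): 0.50, 0.55, ..., 0.95.
ANET_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]

# Per-threshold TadTR results on THUMOS14, copied from the table above.
tadtr_thumos14 = {0.3: 74.8, 0.4: 69.1, 0.5: 60.1, 0.6: 46.6, 0.7: 32.8}
print(round(average_map(tadtr_thumos14, THUMOS14_THRESHOLDS), 1))  # 56.7, matching the table

# Only 3 of the 10 HACS thresholds are tabulated, so the average cannot be rebuilt.
tadtr_hacs = {0.5: 47.14, 0.75: 32.11, 0.95: 10.94}
try:
    average_map(tadtr_hacs, ANET_THRESHOLDS)
except KeyError as err:
    print("cannot recompute HACS Avg. mAP:", err)
```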

Install

Requirements

  • Linux, CUDA>=9.2, GCC>=5.4

  • Python>=3.7

  • PyTorch>=1.5.1, torchvision>=0.6.1 (follow the official PyTorch installation instructions)

  • Other requirements

    pip install -r requirements.txt
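Before compiling the CUDA extensions, it may help to confirm that the Python-side minimum versions listed above are met. The script below is a hypothetical helper, not part of the repository; it checks only Python, PyTorch and torchvision (not CUDA or GCC) and compares versions as tuples of integers, since a plain string comparison would wrongly rank "1.10.0" below "1.5.1".

```python
import importlib
import sys
from typing import Dict, List, Optional, Tuple

# Minimum versions copied from the Requirements list; CUDA and GCC are not checked here.
MINIMUM = {"python": "3.7", "torch": "1.5.1", "torchvision": "0.6.1"}


def parse_version(text: str) -> Tuple[int, ...]:
    """Keep the leading numeric components: '1.10.0+cu113' -> (1, 10, 0)."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def find_problems(installed: Dict[str, Optional[str]]) -> List[str]:
    """Return a message for every missing or too-old dependency."""
    problems = []
    for name, minimum in MINIMUM.items():
        version = installed.get(name)
        if version is None:
            problems.append(f"{name}: not found (need >= {minimum})")
        elif parse_version(version) < parse_version(minimum):
            problems.append(f"{name}: {version} is older than {minimum}")
    return problems


def installed_versions() -> Dict[str, Optional[str]]:
    """Collect versions of the running interpreter and importable packages."""
    versions: Dict[str, Optional[str]] = {
        "python": ".".join(str(n) for n in sys.version_info[:3])
    }
    for pkg in ("torch", "torchvision"):
        try:
            versions[pkg] = str(importlib.import_module(pkg).__version__)
        except Exception:  # missing package, or a broken install (e.g. missing CUDA libs)
            versions[pkg] = None
    return versions


if __name__ == "__main__":
    issues = find_problems(installed_versions())
    print("\n".join(issues) if issues else "All version requirements satisfied.")
```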

Compiling CUDA extensions

cd model/ops;

# If you have multiple CUDA Toolkit installations, prefix the command below with
# CUDA_HOME=<your_cuda_toolkit_path> to select the correct version.
python setup.py build_ext --inplace

Run a quick test

python demo.py

Data Preparation

To be updated.

Training

Run the following command:

bash scripts/train.sh DATASET

Testing

bash scripts/test.sh DATASET WEIGHTS
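The internals of the test script are not described here. Because TadTR produces a sparse set of detections without NMS, post-processing at inference time can be very light. A minimal sketch, assuming (as in DETR-style detectors) that each query outputs a segment as a normalized (center, width) pair plus a confidence score; the function decode_detections, its top_k parameter and the clipping policy are hypothetical and not taken from the TadTR code:

```python
from typing import List, Sequence, Tuple

Detection = Tuple[float, float, float]  # (start_sec, end_sec, score)


def decode_detections(
    segments_cw: Sequence[Tuple[float, float]],
    scores: Sequence[float],
    duration: float,
    top_k: int = 10,
) -> List[Detection]:
    """Convert normalized (center, width) predictions into absolute segments.

    No NMS is applied: the query set is assumed to be (nearly) duplicate-free,
    so segments are only clipped to the video and ranked by score.
    """
    dets = []
    for (center, width), score in zip(segments_cw, scores):
        start = max(0.0, (center - width / 2) * duration)
        end = min(duration, (center + width / 2) * duration)
        if end > start:  # drop segments that collapse after clipping
            dets.append((start, end, score))
    dets.sort(key=lambda d: d[2], reverse=True)
    return dets[:top_k]


# Three hypothetical query outputs for a 100-second video.
queries = [(0.25, 0.1), (0.6, 0.4), (0.98, 0.1)]
confidences = [0.9, 0.7, 0.2]
for start, end, score in decode_detections(queries, confidences, duration=100.0, top_k=2):
    print(f"{start:.1f}s - {end:.1f}s  score={score:.2f}")
```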

Acknowledgement

The code is based on DETR and Deformable DETR. We also borrow the RoIAlign1D implementation from G-TAD. Thanks for their great work.

Citing

@article{liu2021end,
  title={End-to-end Temporal Action Detection with Transformer},
  author={Liu, Xiaolong and Wang, Qimeng and Hu, Yao and Tang, Xu and Bai, Song and Bai, Xiang},
  journal={arXiv preprint arXiv:2106.10271},
  year={2021}
}

Contact

For questions and suggestions, please contact Xiaolong Liu at "liuxl at hust dot edu dot cn".
