17Skye17 / VideoLT

Licence: other
Official Code for VideoLT: Large-scale Long-tailed Video Recognition (ICCV 2021)

PyTorch Code for VideoLT

[Website][Paper]

Updates

  • [01/14/2022] Raw videos uploaded to Google Drive. For access, please send us an e-mail: zxwu at fudan.edu.cn
  • [10/29/2021] Features uploaded to Google Drive. For access, please send us an e-mail: zxwu at fudan.edu.cn
  • [09/28/2021] Features uploaded to Aliyun Drive (deprecated). For access, please send us an e-mail: zxwu at fudan.edu.cn
  • [08/23/2021] Checkpoint links uploaded. Sorry, we are dealing with campus network bandwidth limitations; the dataset will be released this week.
  • [08/15/2021] Code released. Dataset download links and checkpoint links will be updated within a week.
  • [07/29/2021] Dataset released; visit https://videolt.github.io/ to download it.
  • [07/23/2021] VideoLT is accepted by ICCV 2021.

[Figure: concept]

Overview

VideoLT is a large-scale long-tailed video recognition dataset, intended as a step toward real-world video recognition. This repo provides the VideoLT dataset and long-tailed baselines, and covers data preparation, the model zoo, and FrameStack.

Data Preparation

Please note that VideoLT is for non-commercial use only. To obtain the download links, send us an e-mail (zxwu at fudan.edu.cn) and agree to our license; we will then send the links back to you. We provide raw videos (~1.7TB) and extracted features (~900GB in total, ~295GB for each feature set).

To decompress the split .tar.gz files, use the following commands:

cat TSM-R50-feature.tar.gz.part* | tar zx 
cat ResNet50-feature.tar.gz.part* | tar zx
cat ResNet101-feature.tar.gz.part* | tar zx
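Where `cat` and `tar` are not available, the same concatenate-then-extract step can be approximated in Python. The following is a minimal sketch and not part of the repo: `ChainedReader` and `extract_split_archive` are hypothetical helper names, and the sketch assumes that the part suffixes sort lexicographically into the correct order, as the shell glob above does.

```python
import glob
import os
import tarfile


class ChainedReader:
    """File-like object that reads several files back to back (hypothetical helper)."""

    def __init__(self, paths):
        self._paths = iter(paths)
        self._current = None

    def read(self, size=-1):
        chunks = []
        remaining = size
        while size < 0 or remaining > 0:
            if self._current is None:
                path = next(self._paths, None)
                if path is None:
                    break  # no more parts to read
                self._current = open(path, "rb")
            data = self._current.read(remaining if size >= 0 else -1)
            if not data:
                # Current part is exhausted: close it and move on to the next one.
                self._current.close()
                self._current = None
                continue
            chunks.append(data)
            if size >= 0:
                remaining -= len(data)
        return b"".join(chunks)

    def close(self):
        if self._current is not None:
            self._current.close()
            self._current = None


def extract_split_archive(pattern, dest="."):
    """Stream-extract a gzipped tar that was split into parts matching `pattern`."""
    parts = sorted(glob.glob(pattern))
    if not parts:
        raise FileNotFoundError(f"no files match {pattern!r}")
    os.makedirs(dest, exist_ok=True)
    reader = ChainedReader(parts)
    # Use the safer "data" extraction filter where this Python version supports it.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    try:
        # "r|gz" reads the archive as a forward-only stream, so no merged file is written.
        with tarfile.open(fileobj=reader, mode="r|gz") as tar:
            tar.extractall(dest, **extra)
    finally:
        reader.close()
    return parts
```

For example, `extract_split_archive("TSM-R50-feature.tar.gz.part*")` would mirror the first command above. Streaming avoids writing a merged archive to disk, which matters at these file sizes.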

To use the extracted features, modify dataset/dutils.py and set the correct path to the features.

Model Zoo

The baseline scripts and checkpoints are provided in MODELZOO.md.

FrameStack

FrameStack is a simple yet effective approach for long-tailed video recognition. It re-samples training data at the frame level and adopts a dynamic sampling strategy based on knowledge learned by the network. The rationale behind FrameStack is to dynamically sample more frames from videos in tail classes and fewer frames from videos in head classes.
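The frame-level sampling idea can be illustrated with a small sketch. This is not the repo's implementation: it assumes that the "knowledge learned by the network" is summarised as a per-class average precision estimate (`class_ap`, a hypothetical input), and that a video's sampling rate is one minus the mean precision of its positive classes. The helper names are illustrative, and the sketch covers only the frame-count decision, not how sampled frames are later combined.

```python
import numpy as np


def frames_to_sample(class_ap, label_vec, max_frames, min_frames=1):
    """Pick a frame budget for one video: poorly learned classes get more frames.

    class_ap  -- assumed per-class precision estimates in [0, 1] (hypothetical input)
    label_vec -- multi-hot label vector for the video
    """
    positives = np.flatnonzero(label_vec)
    if positives.size == 0:
        return max_frames  # no label information: fall back to the full budget
    rate = 1.0 - float(np.mean(class_ap[positives]))
    n = int(np.ceil(rate * max_frames))
    return int(np.clip(n, min_frames, max_frames))


def sample_frame_indices(total_frames, n, rng):
    """Draw n distinct frame indices in temporal order."""
    n = min(n, total_frames)
    return np.sort(rng.choice(total_frames, size=n, replace=False))


# Class 0 plays the role of a well-learned head class, class 1 a tail class.
class_ap = np.array([0.9, 0.2])
head_budget = frames_to_sample(class_ap, np.array([1, 0]), max_frames=60)
tail_budget = frames_to_sample(class_ap, np.array([0, 1]), max_frames=60)

rng = np.random.default_rng(0)
tail_indices = sample_frame_indices(150, tail_budget, rng)
```

With these toy values the tail-class video receives a larger frame budget than the head-class video, which is the behaviour described above.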

[Figure: framestack]

Usage

Requirement

pip install -r requirements.txt

Prepare Data Path

  1. Modify FEATURE_NAME, PATH_TO_FEATURE and FEATURE_DIM in dataset/dutils.py.

  2. Set ROOT in dataset/dutils.py to the labels folder. The expected directory structure is:

    labels
    |-- count-labels-train.lst
    |-- test.lst
    |-- test_videofolder.txt
    |-- train.lst
    |-- train_videofolder.txt
    |-- val_videofolder.txt
    `-- validate.lst
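Before training, it can help to confirm that the labels folder matches the layout above. The check below is a minimal sketch, not part of the repo: `missing_label_files` is a hypothetical helper, and the file names are copied from the directory listing.

```python
from pathlib import Path

# File names taken from the directory listing above.
EXPECTED_LABEL_FILES = [
    "count-labels-train.lst",
    "test.lst",
    "test_videofolder.txt",
    "train.lst",
    "train_videofolder.txt",
    "val_videofolder.txt",
    "validate.lst",
]


def missing_label_files(root):
    """Return the expected label files that are absent from `root`."""
    root = Path(root)
    return [name for name in EXPECTED_LABEL_FILES if not (root / name).is_file()]
```

Calling `missing_label_files("labels")` returns an empty list when the folder is complete, and otherwise names the files that still need to be put in place.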

Train

We provide scripts for training. Please refer to MODELZOO.md.

Example training script:

FEATURE_NAME='ResNet101'

export CUDA_VISIBLE_DEVICES='2'
python base_main.py  \
     --augment "mixup" \
     --feature_name $FEATURE_NAME \
     --lr 0.0001 \
     --gd 20 --lr_steps 30 60 --epochs 100 \
     --batch-size 128 -j 16 \
     --eval-freq 5 \
     --print-freq 20 \
     --root_log=$FEATURE_NAME-log \
     --root_model=$FEATURE_NAME'-checkpoints' \
     --store_name=$FEATURE_NAME'_bs128_lr0.0001_lateavg_mixup' \
     --num_class=1004 \
     --model_name=NonlinearClassifier \
     --train_num_frames=60 \
     --val_num_frames=150 \
     --loss_func=BCELoss

Note: Setting args.resample, args.augment and args.loss_func lets you combine multiple long-tailed strategies.

Options:

    args.resample:  ['None', 'CBS', 'SRS']
    args.augment:   ['None', 'mixup', 'FrameStack']
    args.loss_func: ['BCELoss', 'LDAM', 'EQL', 'CBLoss', 'FocalLoss']
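To show how the three options combine, here is a minimal `argparse` sketch that accepts only the values listed above. It is an illustration rather than the repo's own option parser: the flag spellings follow the example commands, while `build_parser` and the default values are assumptions.

```python
import argparse

# Allowed values copied from the options list above.
RESAMPLE_CHOICES = ["None", "CBS", "SRS"]
AUGMENT_CHOICES = ["None", "mixup", "FrameStack"]
LOSS_CHOICES = ["BCELoss", "LDAM", "EQL", "CBLoss", "FocalLoss"]


def build_parser():
    """Hypothetical parser; the defaults here are assumptions, not the repo's."""
    parser = argparse.ArgumentParser(description="Long-tailed strategy options (sketch)")
    parser.add_argument("--resample", default="None", choices=RESAMPLE_CHOICES)
    parser.add_argument("--augment", default="None", choices=AUGMENT_CHOICES)
    parser.add_argument("--loss_func", default="BCELoss", choices=LOSS_CHOICES)
    return parser


# Combine a re-sampling strategy, an augmentation, and a loss in one run.
args = build_parser().parse_args(
    ["--resample", "CBS", "--augment", "FrameStack", "--loss_func", "LDAM"]
)
```

Restricting each flag with `choices` rejects misspelled strategy names at start-up instead of part-way through a training run.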

Test

We provide testing scripts in the scripts directory. Set CKPT to the path of a saved checkpoint.

Example testing script:

FEATURE_NAME='ResNet101'
CKPT='VideoLT_checkpoints/ResNet-101/ResNet101_bs128_lr0.0001_lateavg_mixup/ckpt.best.pth.tar'

export CUDA_VISIBLE_DEVICES='1'
python base_test.py \
     --resume $CKPT \
     --feature_name $FEATURE_NAME \
     --batch-size 128 -j 16 \
     --print-freq 20 \
     --num_class=1004 \
     --model_name=NonlinearClassifier \
     --train_num_frames=60 \
     --val_num_frames=150 \
     --loss_func=BCELoss

Citing

If you find VideoLT helpful for your research, please consider citing:

@InProceedings{Zhang_2021_ICCV,
    author    = {Zhang, Xing and Wu, Zuxuan and Weng, Zejia and Fu, Huazhu and Chen, Jingjing and Jiang, Yu-Gang and Davis, Larry S.},
    title     = {VideoLT: Large-Scale Long-Tailed Video Recognition},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {7960-7969}
}