
xlliu7 / MUSES

Licence: other
[CVPR 2021] Multi-shot Temporal Event Localization: a Benchmark

Programming Languages

Jupyter Notebook
11667 projects
python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to MUSES

Mmaction
An open-source toolbox for action understanding based on PyTorch
Stars: ✭ 1,711 (+3254.9%)
Mutual labels:  action-recognition, temporal-action-detection, temporal-action-localization
TadTR
End-to-end Temporal Action Detection with Transformer. [Under review for a journal publication]
Stars: ✭ 55 (+7.84%)
Mutual labels:  action-recognition, temporal-action-detection, temporal-action-localization
Awesome-Weakly-Supervised-Temporal-Action-Localization
A curated publication list on weakly-supervised temporal action localization
Stars: ✭ 65 (+27.45%)
Mutual labels:  temporal-action-detection, temporal-action-localization
Ta3n
[ICCV 2019 (Oral)] Temporal Attentive Alignment for Large-Scale Video Domain Adaptation (PyTorch)
Stars: ✭ 217 (+325.49%)
Mutual labels:  action-recognition
Ican
[BMVC 2018] iCAN: Instance-Centric Attention Network for Human-Object Interaction Detection
Stars: ✭ 225 (+341.18%)
Mutual labels:  action-recognition
temporal-ssl
Video Representation Learning by Recognizing Temporal Transformations. In ECCV, 2020.
Stars: ✭ 46 (-9.8%)
Mutual labels:  action-recognition
bLVNet-TAM
The official code for the NeurIPS 2019 paper: Quanfu Fan, Richard Chen, Hilde Kuehne, Marco Pistoia, David Cox, "More Is Less: Learning Efficient Video Representations by Temporal Aggregation Modules"
Stars: ✭ 54 (+5.88%)
Mutual labels:  action-recognition
Ig65m Pytorch
PyTorch 3D video classification models pre-trained on 65 million Instagram videos
Stars: ✭ 217 (+325.49%)
Mutual labels:  action-recognition
weakly-action-localization
No description or website provided.
Stars: ✭ 30 (-41.18%)
Mutual labels:  action-recognition
TA3N
[ICCV 2019 Oral] TA3N: https://github.com/cmhungsteve/TA3N (Most updated repo)
Stars: ✭ 45 (-11.76%)
Mutual labels:  action-recognition
Attentionalpoolingaction
Code/Model release for NIPS 2017 paper "Attentional Pooling for Action Recognition"
Stars: ✭ 248 (+386.27%)
Mutual labels:  action-recognition
Action recognition zoo
Codes for popular action recognition models, verified on the something-something data set.
Stars: ✭ 227 (+345.1%)
Mutual labels:  action-recognition
temporal-binding-network
Implementation of "EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition, ICCV, 2019" in PyTorch
Stars: ✭ 95 (+86.27%)
Mutual labels:  action-recognition
Paddlevideo
Comprehensive, up-to-date, and deployable video deep learning algorithms, covering video recognition, action localization, and temporal action detection. A high-performance, lightweight codebase that provides practical models for video understanding research and applications
Stars: ✭ 218 (+327.45%)
Mutual labels:  action-recognition
MiCT-Net-PyTorch
Video Recognition using Mixed Convolutional Tube (MiCT) on PyTorch with a ResNet backbone
Stars: ✭ 48 (-5.88%)
Mutual labels:  action-recognition
Actionvlad
ActionVLAD for video action classification (CVPR 2017)
Stars: ✭ 217 (+325.49%)
Mutual labels:  action-recognition
two-stream-action-recognition-keras
Two-stream CNNs for video action recognition implemented in Keras
Stars: ✭ 116 (+127.45%)
Mutual labels:  action-recognition
Lintel
A Python module to decode video frames directly, using the FFmpeg C API.
Stars: ✭ 240 (+370.59%)
Mutual labels:  action-recognition
Alphaction
Spatio-Temporal Action Localization System
Stars: ✭ 221 (+333.33%)
Mutual labels:  action-recognition
sparseprop
Temporal action proposals
Stars: ✭ 46 (-9.8%)
Mutual labels:  action-recognition

MUSES


This repo holds the code and the models for MUSES, introduced in the paper:
Multi-shot Temporal Event Localization: a Benchmark
Xiaolong Liu, Yao Hu, Song Bai, Fei Ding, Xiang Bai, Philip H.S. Torr
CVPR 2021.

MUSES is a large-scale video dataset designed to spur research on a new task called multi-shot temporal event localization. We present a baseline approach (denoted MUSES-Net) that achieves state-of-the-art performance on MUSES. It also reaches an mAP of 56.9% on THUMOS14 at IoU=0.5.

The code largely borrows from SSN and P-GCN. Thanks for their great work!

Find more resources (e.g. the annotation file and source videos) on our project page.

Updates

[2022.3.19] Added support for the MUSES dataset. The proposals, models, and source videos of the MUSES dataset are released. Stay tuned for MUSES v2, which includes videos from more countries.
[2021.6.19] Code and the annotation file of MUSES are released. Please find the annotation file on our project page.

Contents

  • Usage Guide
    • Prerequisites
    • Data Preparation
    • Testing Trained Models
    • Training
  • Citation
  • Related Projects
  • Contact

Usage Guide

Prerequisites

[back to top]

The code is based on PyTorch. The following environment is required.

Other minor Python modules can be installed by running

pip install -r requirements.txt

The code relies on CUDA extensions. Build them with the following command:

python setup.py develop

After installing all dependencies, run python demo.py for a quick test.
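Putting these steps together, a typical setup sequence looks like this (assuming a CUDA toolchain compatible with your PyTorch installation):

# Install the remaining Python dependencies
pip install -r requirements.txt
# Build the CUDA extensions in-place
python setup.py develop
# Quick sanity check
python demo.py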

Data Preparation

[back to top]

We support experimenting with THUMOS14 and MUSES. The video features, the proposals and the reference models are provided on OneDrive.

Features and Proposals

  • THUMOS14: The features and the proposals are the same as those used by P-GCN. Extract the archive thumos_i3d_features.tar and put the features in the data/thumos14 folder (example extraction commands are sketched at the end of this subsection). The proposal files are already contained in the repository. We expect the following structure in this folder.

    - data
      - thumos14
        - I3D_RGB
        - I3D_Flow
    
  • MUSES: Extract the archives of features and proposal files.

    # The archive does not have a directory structure
    # We need to create one
    mkdir -p data/muses/muses_i3d_features
    tar -xf muses_i3d_features.tar -C data/muses/muses_i3d_features
    tar -xf muses_proposals.tar -C data/muses

    We expect the following structure in this folder.

    - data
      - muses
        - muses_i3d_features
        - muses_test_proposal_list.txt
        - ...
    

You can also specify the path to the features/proposals in the config files data/cfgs/*.yml.
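For THUMOS14, a minimal extraction sketch is shown below. It assumes the archive thumos_i3d_features.tar unpacks directly into the I3D_RGB and I3D_Flow folders; check the archive layout and adjust the paths if needed.

# Hypothetical THUMOS14 feature setup; verify the archive layout first
mkdir -p data/thumos14
tar -xf thumos_i3d_features.tar -C data/thumos14
# Expected result: data/thumos14/I3D_RGB and data/thumos14/I3D_Flow
ls data/thumos14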

Reference Models

Put the reference_models folder in the root directory of this code:

 - reference_models
   - muses.pth.tar
   - thumos14_flow.pth.tar
   - thumos14_rgb.pth.tar

Testing Trained Models

[back to top]

You can test the reference models by running a single script

bash scripts/test_reference_models.sh DATASET

Here DATASET should be thumos14 or muses.
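For example, to test the MUSES reference model:

bash scripts/test_reference_models.sh muses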

Using these models, you should get the following performance (mAP in %, at IoU thresholds from 0.3 to 0.7 and their average).

MUSES

IoU    0.3    0.4    0.5    0.6    0.7    Average
mAP    26.5   23.1   19.7   14.8   9.5    18.7

Note: We re-trained the network on MUSES, and the performance is higher than that reported in the paper.

THUMOS14

Modality   0.3     0.4     0.5     0.6     0.7     Average
RGB        60.14   54.93   46.38   34.96   21.69   43.62
Flow       64.64   60.29   53.93   42.84   29.70   50.28
R+F        68.93   63.99   56.85   46.25   30.97   53.40

The testing process consists of two steps, detailed below.

  1. Extract detection scores for all the proposals by running
python test_net.py DATASET CHECKPOINT_PATH RESULT_PICKLE --cfg CFG_PATH

Here, RESULT_PICKLE is the path where the detection scores are saved. CFG_PATH is the path of the config file, e.g. data/cfgs/thumos14_flow.yml.

  2. Evaluate the detection performance by running
python eval.py DATASET RESULT_PICKLE --cfg CFG_PATH

On THUMOS14, we need to fuse the detection scores of the RGB and Flow modalities. This can be done by running

python eval.py DATASET RESULT_PICKLE_RGB RESULT_PICKLE_FLOW --score_weights 1 1.2 --cfg CFG_PATH_RGB
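As an illustration, testing the THUMOS14 reference models with both modalities could look like the sketch below. The output pickle paths are arbitrary, and the RGB config name data/cfgs/thumos14_rgb.yml is assumed to mirror the Flow config named above.

# Step 1: score the proposals with each modality (output paths are illustrative)
python test_net.py thumos14 reference_models/thumos14_rgb.pth.tar outputs/thumos14_rgb.pkl --cfg data/cfgs/thumos14_rgb.yml
python test_net.py thumos14 reference_models/thumos14_flow.pth.tar outputs/thumos14_flow.pkl --cfg data/cfgs/thumos14_flow.yml

# Step 2: evaluate with two-stream fusion (Flow weighted 1.2x)
python eval.py thumos14 outputs/thumos14_rgb.pkl outputs/thumos14_flow.pkl --score_weights 1 1.2 --cfg data/cfgs/thumos14_rgb.yml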

Training

[back to top]

Train your own models with the following command

python train_net.py DATASET --cfg CFG_PATH --snapshot_pref SNAPSHOT_PREF --epochs MAX_EPOCHS

SNAPSHOT_PREF: the path for saving trained models and logs, e.g. outputs/snapshots/thumos14_rgb/.
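As an example, a hypothetical training run on MUSES could look like the following; the config name data/cfgs/muses.yml and the epoch count are illustrative, not values prescribed by the repository.

# Hypothetical invocation; adjust the config path and epochs to your setup
python train_net.py muses --cfg data/cfgs/muses.yml --snapshot_pref outputs/snapshots/muses/ --epochs 20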

We provide a script that runs all the steps, including training, testing, and two-stream fusion. Run it with the following command

bash scripts/do_all.sh DATASET
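For example, to run the full pipeline on THUMOS14:

bash scripts/do_all.sh thumos14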

Note: The results may vary across runs and may differ from those of the reference models. We encourage using the average mAP as the primary metric, as it is more stable than mAP@0.5.

Citation

Please cite the following paper if you find MUSES useful in your research

@InProceedings{Liu_2021_CVPR,
    author    = {Liu, Xiaolong and Hu, Yao and Bai, Song and Ding, Fei and Bai, Xiang and Torr, Philip H. S.},
    title     = {Multi-Shot Temporal Event Localization: A Benchmark},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
    pages     = {12596-12606}
}

Related Projects

  • TadTR: Efficient temporal action detection (localization) with Transformer.

Contact

[back to top]

For questions and suggestions, file an issue or contact Xiaolong Liu at "liuxl at hust dot edu dot cn".
