ekazakos / auditory-slow-fast

Licence: other
Implementation of "Slow-Fast Auditory Streams for Audio Recognition, ICASSP, 2021" in PyTorch

Projects that are alternatives to or similar to auditory-slow-fast

temporal-binding-network
Implementation of "EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition, ICCV, 2019" in PyTorch
Stars: ✭ 95 (+106.52%)
Mutual labels:  convolutional-networks, action-recognition
DLCV2018SPRING
Deep Learning for Computer Vision (CommE 5052) at NTU
Stars: ✭ 38 (-17.39%)
Mutual labels:  action-recognition
TCFPN-ISBA
Temporal Convolutional Feature Pyramid Network (TCFPN) & Iterative Soft Boundary Assignment (ISBA), CVPR '18
Stars: ✭ 40 (-13.04%)
Mutual labels:  action-recognition
Robust-Deep-Learning-Pipeline
Deep Convolutional Bidirectional LSTM for Complex Activity Recognition with Missing Data. Human Activity Recognition Challenge. Springer SIST (2020)
Stars: ✭ 20 (-56.52%)
Mutual labels:  action-recognition
theWorldInSafety
Surveillance System Against Violence
Stars: ✭ 31 (-32.61%)
Mutual labels:  action-recognition
TadTR
End-to-end Temporal Action Detection with Transformer. [Under review for a journal publication]
Stars: ✭ 55 (+19.57%)
Mutual labels:  action-recognition
adascan-public
Code for AdaScan: Adaptive Scan Pooling (CVPR 2017)
Stars: ✭ 43 (-6.52%)
Mutual labels:  action-recognition
vlog action recognition
Identifying Visible Actions in Lifestyle Vlogs
Stars: ✭ 13 (-71.74%)
Mutual labels:  action-recognition
ntu-x
NTU-X, an extended version of the popular NTU dataset
Stars: ✭ 55 (+19.57%)
Mutual labels:  action-recognition
Tensorflow-For-Beginners
Introduction to deep learning with Tensorflow.
Stars: ✭ 55 (+19.57%)
Mutual labels:  convolutional-networks
two-stream-fusion-for-action-recognition-in-videos
No description or website provided.
Stars: ✭ 80 (+73.91%)
Mutual labels:  action-recognition
san
The official PyTorch implementation of "Context Matters: Self-Attention for Sign Language Recognition"
Stars: ✭ 17 (-63.04%)
Mutual labels:  action-recognition
cpnet
Learning Video Representations from Correspondence Proposals (CVPR 2019 Oral)
Stars: ✭ 93 (+102.17%)
Mutual labels:  action-recognition
darkflow
Translate darknet to tensorflow. Load trained weights, retrain/fine-tune using tensorflow, export constant graph def to mobile devices
Stars: ✭ 5,986 (+12913.04%)
Mutual labels:  convolutional-networks
GST-video
ICCV 19 Grouped Spatial-Temporal Aggregation for Efficient Action Recognition
Stars: ✭ 40 (-13.04%)
Mutual labels:  action-recognition
synse-zsl
Official PyTorch code for the ICIP 2021 paper 'Syntactically Guided Generative Embeddings For Zero Shot Skeleton Action Recognition'
Stars: ✭ 14 (-69.57%)
Mutual labels:  action-recognition
Dataset-REPAIR
REPresentAtion bIas Removal (REPAIR) of datasets
Stars: ✭ 49 (+6.52%)
Mutual labels:  action-recognition
Paper Note
📚 Notes on papers I have read
Stars: ✭ 22 (-52.17%)
Mutual labels:  convolutional-networks
LSUV-keras
Simple implementation of the LSUV initialization in keras
Stars: ✭ 65 (+41.3%)
Mutual labels:  convolutional-networks
kinect-gesture
Kinect-based human action recognition
Stars: ✭ 129 (+180.43%)
Mutual labels:  action-recognition

Auditory Slow-Fast

This repository implements the model proposed in the paper:

Evangelos Kazakos, Arsha Nagrani, Andrew Zisserman, Dima Damen, Slow-Fast Auditory Streams for Audio Recognition, ICASSP, 2021

Project's webpage

[arXiv paper] [IEEE Xplore paper]

Citing

When using this code, kindly reference:

@ARTICLE{Kazakos2021SlowFastAuditory,
  title   = {Slow-Fast Auditory Streams For Audio Recognition},
  author  = {Kazakos, Evangelos and Nagrani, Arsha and Zisserman, Andrew and Damen, Dima},
  journal = {CoRR},
  volume  = {abs/2103.03516},
  year    = {2021},
  ee      = {https://arxiv.org/abs/2103.03516}
}

Pretrained models

You can download our pretrained models on VGG-Sound and EPIC-KITCHENS-100:

  • Slow-Fast (EPIC-KITCHENS-100) link
  • Slow (EPIC-KITCHENS-100) link
  • Fast (EPIC-KITCHENS-100) link
  • Slow-Fast (VGG-Sound) link
  • Slow (VGG-Sound) link
  • Fast (VGG-Sound) link
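
The checkpoints are plain PyTorch pickles. A minimal sketch for inspecting one on CPU, assuming the weights sit under a "model_state" key as in PySlowFast-style checkpoints (verify the keys on your copy):

import torch

# Hypothetical filename; substitute whichever checkpoint you downloaded above.
checkpoint = torch.load("SLOWFAST_EPIC.pyth", map_location="cpu")

# Assumption: weights are stored under "model_state", as in PySlowFast;
# fall back to the raw object if that key is absent.
state_dict = checkpoint.get("model_state", checkpoint)
print(len(state_dict), "parameter tensors")
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))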

Preparation

  • Requirements:
    • PyTorch 1.7.1
    • librosa: conda install -c conda-forge librosa
    • h5py: conda install h5py
    • wandb: pip install wandb
    • fvcore: pip install 'git+https://github.com/facebookresearch/fvcore'
    • simplejson: pip install simplejson
    • psutil: pip install psutil
    • tensorboard: pip install tensorboard
  • Add this repository to $PYTHONPATH.
export PYTHONPATH=/path/to/auditory-slow-fast/slowfast:$PYTHONPATH
  • VGG-Sound:
    1. Download the audio. For instructions, see here.
    2. Download train.pkl (link) and test.pkl (link). I converted the original train.csv and test.csv (found here) to pickle files with column names for easier use. A quick way to inspect them is shown below.
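
    A minimal sketch for inspecting the converted annotations; the pickles are pandas DataFrames, but the exact column names are not listed here, so check them on your copy:

    import pandas as pd

    # train.pkl / test.pkl are pickled pandas DataFrames with named columns.
    df = pd.read_pickle("train.pkl")

    print(df.columns.tolist())   # verify the exact column names on your copy
    print(len(df), "annotated clips")
    print(df.head())
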
  • EPIC-KITCHENS:
    1. From the annotation repository of EPIC-KITCHENS-100 (link), download: EPIC_100_train.pkl, EPIC_100_validation.pkl, and EPIC_100_test_timestamps.pkl. EPIC_100_train.pkl and EPIC_100_validation.pkl will be used for training/validation, while EPIC_100_test_timestamps.pkl can be used to obtain the scores to submit in the AR challenge.
    2. Download all the videos of EPIC-KITCHENS-100 using the download scripts found here, where you can also find detailed instructions on using the scripts.
    3. Extract audio from the videos by running:
    python audio_extraction/extract_audio.py /path/to/videos /output/path 
    
    4. Save audio in HDF5 format by running:
    python audio_extraction/wav_to_hdf5.py /path/to/audio /output/hdf5/EPIC-KITCHENS-100_audio.hdf5
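
    For reference, a minimal sketch of what this conversion amounts to: read each WAV with librosa and store it as a dataset in one HDF5 file, keyed by video id. The layout, key naming, and sampling rate below are assumptions for illustration; the repository's wav_to_hdf5.py is authoritative.

    import glob
    import os

    import h5py
    import librosa

    SAMPLE_RATE = 24000  # assumption for illustration; match your extraction settings

    with h5py.File("/output/hdf5/EPIC-KITCHENS-100_audio.hdf5", "w") as f:
        for wav_path in sorted(glob.glob("/path/to/audio/*.wav")):
            # Load each file as mono float32 at a fixed sampling rate.
            samples, _ = librosa.load(wav_path, sr=SAMPLE_RATE, mono=True)
            video_id = os.path.splitext(os.path.basename(wav_path))[0]
            f.create_dataset(video_id, data=samples)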
    

Training/validation on EPIC-KITCHENS-100

To train the model (fine-tuning from the VGG-Sound pretrained model), run the command below. Key-value pairs listed after the config file override the corresponding entries in the YAML config:

python tools/run_net.py --cfg configs/EPIC-KITCHENS/SLOWFAST_R50.yaml NUM_GPUS num_gpus \
  OUTPUT_DIR /path/to/output_dir EPICKITCHENS.AUDIO_DATA_FILE /path/to/EPIC-KITCHENS-100_audio.hdf5 \
  EPICKITCHENS.ANNOTATIONS_DIR /path/to/annotations TRAIN.CHECKPOINT_FILE_PATH /path/to/VGG-Sound/pretrained/model

To train from scratch, omit TRAIN.CHECKPOINT_FILE_PATH /path/to/VGG-Sound/pretrained/model from the command.

You can also train the individual streams. For example, to train the Slow stream, run:

python tools/run_net.py --cfg configs/EPIC-KITCHENS/SLOW_R50.yaml NUM_GPUS num_gpus \
  OUTPUT_DIR /path/to/output_dir EPICKITCHENS.AUDIO_DATA_FILE /path/to/EPIC-KITCHENS-100_audio.hdf5 \
  EPICKITCHENS.ANNOTATIONS_DIR /path/to/annotations TRAIN.CHECKPOINT_FILE_PATH /path/to/VGG-Sound/pretrained/model

To validate the model, run:

python tools/run_net.py --cfg configs/EPIC-KITCHENS/SLOWFAST_R50.yaml NUM_GPUS num_gpus \
  OUTPUT_DIR /path/to/experiment_dir EPICKITCHENS.AUDIO_DATA_FILE /path/to/EPIC-KITCHENS-100_audio.hdf5 \
  EPICKITCHENS.ANNOTATIONS_DIR /path/to/annotations TRAIN.ENABLE False TEST.ENABLE True \
  TEST.CHECKPOINT_FILE_PATH /path/to/experiment_dir/checkpoints/checkpoint_best.pyth

To obtain scores on the test set, run:

python tools/run_net.py --cfg configs/EPIC-KITCHENS/SLOWFAST_R50.yaml NUM_GPUS num_gpus \
  OUTPUT_DIR /path/to/experiment_dir EPICKITCHENS.AUDIO_DATA_FILE /path/to/EPIC-KITCHENS-100_audio.hdf5 \
  EPICKITCHENS.ANNOTATIONS_DIR /path/to/annotations TRAIN.ENABLE False TEST.ENABLE True \
  TEST.CHECKPOINT_FILE_PATH /path/to/experiment_dir/checkpoints/checkpoint_best.pyth \
  EPICKITCHENS.TEST_LIST EPIC_100_test_timestamps.pkl EPICKITCHENS.TEST_SPLIT test

Training/validation on VGG-Sound

To train the model, run:

python tools/run_net.py --cfg configs/VGG-Sound/SLOWFAST_R50.yaml NUM_GPUS num_gpus \
  OUTPUT_DIR /path/to/output_dir VGGSOUND.AUDIO_DATA_DIR /path/to/dataset \
  VGGSOUND.ANNOTATIONS_DIR /path/to/annotations

To validate the model, run:

python tools/run_net.py --cfg configs/VGG-Sound/SLOWFAST_R50.yaml NUM_GPUS num_gpus \
  OUTPUT_DIR /path/to/experiment_dir VGGSOUND.AUDIO_DATA_DIR /path/to/dataset \
  VGGSOUND.ANNOTATIONS_DIR /path/to/annotations TRAIN.ENABLE False TEST.ENABLE True \
  TEST.CHECKPOINT_FILE_PATH /path/to/experiment_dir/checkpoints/checkpoint_best.pyth

License

The code is published under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, found here.
