EPIC-KITCHENS-55 action recognition models

This is a set of models trained for the EPIC-KITCHENS-55 baselines. We support:

  • TSN (Temporal Segment Networks)
  • TRN (Temporal Relational Networks), including the multi-scale variant M-TRN
  • TSM (Temporal Shift Module)

Many thanks to the authors of the codebases these models are built on.

You can use the code provided here in one of two ways:

  1. PyTorch hub (recommended)
  2. Local installation

PyTorch Hub

PyTorch Hub is a way to easily share models with others. Using our models via hub is as simple as

import torch.hub
repo = 'epic-kitchens/action-models'

class_counts = (125, 352)
segment_count = 8
base_model = 'resnet50'
tsn = torch.hub.load(repo, 'TSN', class_counts, segment_count, 'RGB',
                     base_model=base_model, 
                     pretrained='epic-kitchens', force_reload=True)
trn = torch.hub.load(repo, 'TRN', class_counts, segment_count, 'RGB',
                     base_model=base_model, 
                     pretrained='epic-kitchens')
mtrn = torch.hub.load(repo, 'MTRN', class_counts, segment_count, 'RGB',
                      base_model=base_model,
                      pretrained='epic-kitchens')
tsm = torch.hub.load(repo, 'TSM', class_counts, segment_count, 'RGB',
                     base_model=base_model, 
                     pretrained='epic-kitchens')

# Show all entrypoints and their help strings
for entrypoint in torch.hub.list(repo):
    print(entrypoint)
    print(torch.hub.help(repo, entrypoint))

batch_size = 1
segment_count = 8
snippet_length = 1  # Number of frames composing the snippet, 1 for RGB, 5 for optical flow
snippet_channels = 3  # Number of channels in a frame, 3 for RGB, 2 for optical flow
height, width = 224, 224

inputs = torch.randn(
    [batch_size, segment_count, snippet_length, snippet_channels, height, width]
)
# The segment and snippet length and channel dimensions are collapsed into the channel
# dimension
# Input shape: N x TC x H x W
inputs = inputs.reshape((batch_size, -1, height, width))
for model in [tsn, trn, mtrn, tsm]:
    # You can get features out of the models
    features = model.features(inputs)
    # and then classify those features
    verb_logits, noun_logits = model.logits(features)
    
    # or just call the object to classify inputs in a single forward pass
    verb_logits, noun_logits = model(inputs)
    print(verb_logits.shape, noun_logits.shape)

NOTE: We depend on a fork of Remi Cadene's pretrained models that brings DataParallel support to PyTorch 1+. Install it with:

$ pip install git+https://github.com/wpwei/[email protected]_bug_fix

Local Installation

Models are available to download from data.bris.ac.uk.

We provide an environment.yml file to create a conda environment. Sadly, not all of the setup can be encapsulated in this file, so you have to perform some steps yourself (in the interest of eking out extra performance!)

$ conda env create -n epic-models -f environment.yml
$ conda activate epic-models

# The following steps are taken from
# https://docs.fast.ai/performance.html#installation

$ conda uninstall -y --force pillow pil jpeg libtiff
$ pip uninstall -y pillow pil jpeg libtiff
$ conda install -y -c conda-forge libjpeg-turbo
$ CFLAGS="${CFLAGS} -mavx2" pip install --upgrade --no-cache-dir --force-reinstall --no-binary :all: --compile pillow-simd
$ conda install -y jpeg libtiff

NOTE: If the installation of pillow-simd fails, you can try installing GCC from conda-forge and then rerunning the install:

$ conda install -y gxx_linux-64
$ export CXX=x86_64-conda_cos6-linux-gnu-g++
$ export CC=x86_64-conda_cos6-linux-gnu-gcc
$ CFLAGS="${CFLAGS} -mavx2" pip install --upgrade --no-cache-dir --force-reinstall --no-binary :all: --compile pillow-simd
$ conda install -y jpeg libtiff

If you install any new packages, check that pillow-simd hasn't been overwritten by an alternative pillow install by running:

$ python -c "from PIL import Image; print(Image.PILLOW_VERSION)"

You should see something like

6.0.0.post1

Pillow releases don't use post suffixes, so if the version name contains post, it's likely that you have pillow-simd installed.

How to use the code

Check out demo.py for an example of how to construct the models and feed in data, or read on below for how to load checkpointed models.

Checkpoints

Checkpoints are saved as dictionaries with the following information:

  • model_type (str): Variant. Either 'tsm', 'tsm-nl', 'tsn', 'trn', or 'mtrn'
  • epoch (int): Last epoch completed in training
  • segment_count (int): Number of segments the network was trained with.
  • modality (str): Modality of the input. Either 'RGB' or 'Flow'
  • state_dict (dict): State dictionary of the network for use with model.load_state_dict
  • arch (str): Backbone architecture of the network. Either 'BNInception' or 'resnet50'.
  • args (namespace): All the arguments used in training the network.

Some keys are only present depending on model type:

  • TSN:
    • consensus_type (str, TSN only): Consensus module variant for TSN. Either 'avg' or 'max'.
  • TSM:
    • shift_place (str, TSM only): Identifier for where the shift module is located. Either block or blockres.
    • shift_div (int, TSM only): The reciprocal of the proportion of channels that are shifted.
    • temporal_pool (bool, TSM only): Whether gradual temporal pooling was used in this network.
    • non_local (bool, TSM only): Whether non-local blocks were added to this network.
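
If you just want to inspect this metadata before constructing a model, here is a minimal sketch (reusing the checkpoint file name from the example below) that loads a checkpoint and prints its metadata:

import torch

# Load onto the CPU regardless of which device the checkpoint was saved from
ckpt = torch.load('TSN_modality=RGB_segments=8_arch=resnet50.pth', map_location='cpu')

# Metadata stored alongside the weights
for key in ('model_type', 'epoch', 'segment_count', 'modality', 'arch'):
    print(key, '=', ckpt[key])

# The weights themselves live under 'state_dict'
print(len(ckpt['state_dict']), 'tensors in the state dict')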

To load checkpointed weights, first construct an instance of the network (using the architecture information stored in the checkpoint), then call model.load_state_dict. For example:

from tsn import TSN
import torch

verb_class_count, noun_class_count = 125, 352
class_count = (verb_class_count, noun_class_count)
ckpt = torch.load('TSN_modality=RGB_segments=8_arch=resnet50.pth')
model = TSN(
    num_class=class_count,
    num_segments=ckpt['segment_count'],
    modality=ckpt['modality'],
    base_model=ckpt['arch'],
    dropout=ckpt['args'].dropout
)
model.load_state_dict(ckpt['state_dict'])

We provide some helper functions for this purpose in model_loader.py so you can simply load checkpoints like:

from model_loader import load_checkpoint
model = load_checkpoint('path/to/checkpoint.pth.tar')

Data Loading

We make use of the transforms available in the original TSN codebase. Provided you load your frames as a list of PIL.Image objects, you can reuse the same data loading code that we use, shown below. Note that you will have to populate the net and backbone_arch variables with an instance of the network and a string naming the backbone architecture (e.g. 'resnet50' or 'BNInception').

from torchvision.transforms import Compose
from transforms import GroupScale, GroupCenterCrop, GroupOverSample, Stack, ToTorchFormatTensor, GroupNormalize

crop_count = 10
net = ...
backbone_arch = ...

if crop_count == 1:
    cropping = Compose([
        GroupScale(net.scale_size),
        GroupCenterCrop(net.input_size),
    ])
elif crop_count == 10:
    cropping = GroupOverSample(net.input_size, net.scale_size)
else:
    raise ValueError("Only 1 and 10 crop_count are supported while we got {}".format(crop_count))

transform = Compose([
    cropping,
    Stack(roll=backbone_arch == 'BNInception'),
    ToTorchFormatTensor(div=backbone_arch != 'BNInception'),
    GroupNormalize(net.input_mean, net.input_std),
])
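
The snippet below is a rough sketch of how the resulting transform can be applied and fed to a model. It assumes frames is a list of PIL.Image objects for a single clip; the reshape follows the same N x TC x H x W convention as the PyTorch Hub example, treating each crop as a batch element and averaging scores over crops.

import torch

frames = ...  # list of PIL.Image frames sampled from one clip
data = transform(frames)  # tensor of shape (num_crops * segment_count * channels, H, W)

# Split the crop dimension back out so each crop becomes a batch element: N x TC x H x W
inputs = data.view(crop_count, -1, *data.shape[-2:])
with torch.no_grad():
    verb_logits, noun_logits = net(inputs)

# Average the per-crop scores to obtain clip-level predictions
verb_scores = verb_logits.mean(dim=0)
noun_scores = noun_logits.mean(dim=0)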

Checkpoint performance

The checkpoints accompanying this repository achieve the following results on the seen and unseen test sets when using 10-crop evaluation.

| Variant | Arch | Modality | # Segments | Seen Verb@1 | Seen Noun@1 | Seen Action@1 | Unseen Verb@1 | Unseen Noun@1 | Unseen Action@1 |
|---------|------|----------|------------|-------------|-------------|---------------|---------------|---------------|-----------------|
| TSN | BN-Inception | RGB | 8 | 47.97 | 38.85 | 22.39 | 36.46 | 22.64 | 22.39 |
| TSN | BN-Inception | Flow | 8 | 51.68 | 26.82 | 16.76 | 47.35 | 21.20 | 13.49 |
| TRN | BN-Inception | RGB | 8 | 58.26 | 36.32 | 25.46 | 47.29 | 22.91 | 15.06 |
| TRN | BN-Inception | Flow | 8 | 55.20 | 23.95 | 16.03 | 50.32 | 19.02 | 12.77 |
| M-TRN | BN-Inception | RGB | 8 | 55.76 | 37.94 | 26.62 | 45.41 | 23.90 | 15.57 |
| M-TRN | BN-Inception | Flow | 8 | 55.92 | 24.88 | 16.78 | 51.38 | 20.69 | 14.00 |
| TSN | ResNet-50 | RGB | 8 | 49.71 | 39.85 | 23.97 | 36.70 | 23.11 | 12.77 |
| TSN | ResNet-50 | Flow | 8 | 53.14 | 27.76 | 20.28 | 47.56 | 20.28 | 13.11 |
| TRN | ResNet-50 | RGB | 8 | 58.82 | 37.27 | 26.62 | 47.32 | 23.69 | 15.71 |
| TRN | ResNet-50 | Flow | 8 | 55.16 | 23.19 | 15.77 | 50.39 | 18.50 | 12.02 |
| M-TRN | ResNet-50 | RGB | 8 | 60.16 | 38.36 | 28.23 | 46.94 | 24.41 | 16.32 |
| M-TRN | ResNet-50 | Flow | 8 | 56.79 | 25.00 | 17.24 | 50.36 | 20.28 | 13.42 |
| TSM | ResNet-50 | RGB | 8 | 57.88 | 40.84 | 28.22 | 43.50 | 23.32 | 14.99 |
| TSM | ResNet-50 | Flow | 8 | 58.08 | 27.49 | 19.14 | 52.68 | 20.83 | 14.27 |

Extracting features

Classes include features and logits methods, mimicking the pretrainedmodels API. Simply create a model instance model = TSN(...) and call model.features(input) to obtain base-model features. To transform these to logits, call model.logits(features) where features is the tensor obtained from the previous step.
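
For example, here is a minimal sketch that constructs a TSN instance (using the constructor arguments shown in the checkpoint-loading example, and assuming default values for the remaining arguments) and extracts features before classifying them; inputs follows the N x TC x H x W layout described in the PyTorch Hub section.

from tsn import TSN
import torch

model = TSN(num_class=(125, 352), num_segments=8, modality='RGB', base_model='resnet50')
inputs = torch.randn(1, 8 * 3, 224, 224)  # 1 clip of 8 RGB segments
features = model.features(inputs)  # base-model (backbone) features
verb_logits, noun_logits = model.logits(features)  # classify the features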

Utilities

You can have a look inside the checkpoints using python tools/print_checkpoint_details.py <path-to-checkpoint> to print checkpoint details including the model variant, number of segments, modality, architecture, and weight shapes.

Citation

If you find our code and trained models helpful, please cite our work and dataset, as well as the original works behind the models (citation information for these is in the following section).

@article{price2019_EvaluationActionRecognition,
    title={An Evaluation of Action Recognition Models on EPIC-Kitchens},
    author={Price, Will and Damen, Dima},
    journal={arXiv preprint arXiv:1908.00867},
    archivePrefix={arXiv},
    eprint={1908.00867},
    year={2019},
    month="Aug"
}
@inproceedings{damen2018_ScalingEgocentricVision,
   title={Scaling Egocentric Vision: The EPIC-KITCHENS Dataset},
   author={Damen, Dima and Doughty, Hazel and Farinella, Giovanni Maria  and Fidler, Sanja and
           Furnari, Antonino and Kazakos, Evangelos and Moltisanti, Davide and Munro, Jonathan
           and Perrett, Toby and Price, Will and Wray, Michael},
   booktitle={European Conference on Computer Vision (ECCV)},
   year={2018}
}

Acknowledgements

We'd like to thank the academics and authors responsible for the following codebases that enabled this work.

If you make use of this repository, please cite their work as well as ours:

TSN:

@InProceedings{wang2016_TemporalSegmentNetworks,
    title={Temporal Segment Networks: Towards Good Practices for Deep Action Recognition},
    author={Limin Wang and Yuanjun Xiong and Zhe Wang and Yu Qiao and Dahua Lin and
            Xiaoou Tang and Luc {Van Gool}},
    booktitle={The European Conference on Computer Vision (ECCV)},
    year={2016}
}

TRN:

@InProceedings{zhou2017_TemporalRelationalReasoning,
    title={Temporal Relational Reasoning in Videos},
    author={Zhou, Bolei and Andonian, Alex and Oliva, Aude and Torralba, Antonio},
    booktitle={The European Conference on Computer Vision (ECCV)},
    month={September},
    year={2018}
}

TSM:

@article{lin2018_TemporalShiftModule,
    title={Temporal Shift Module for Efficient Video Understanding},
    author={Lin, Ji and Gan, Chuang and Han, Song},
    journal={arXiv preprint arXiv:1811.08383},
    archivePrefix={arXiv},
    eprint={1811.08383},
    year={2018},
    month="Nov"
}