datamllab / autovideo

License: MIT
AutoVideo: An Automated Video Action Recognition System


Projects that are alternatives to or similar to autovideo

maggy
Distribution transparent Machine Learning experiments on Apache Spark
Stars: ✭ 83 (-67.06%)
Mutual labels:  automl
deep autoviml
Build tensorflow keras model pipelines in a single line of code. Now with mlflow tracking. Created by Ram Seshadri. Collaborators welcome. Permission granted upon request.
Stars: ✭ 98 (-61.11%)
Mutual labels:  automl
VideoRecognition-realtime-autotrainer-alerts
State-of-the-art object detection in real time using the YOLOv3 algorithm, augmented with a process that allows easy training of the classifier as a plug & play solution. Provides an alert if an item on an alert list is detected.
Stars: ✭ 36 (-85.71%)
Mutual labels:  video-recognition
automated-readability
Formula to detect ease of reading according to the Automated Readability Index (1967)
Stars: ✭ 46 (-81.75%)
Mutual labels:  automated
clara-train-examples
Example notebooks demonstrating how to use Clara Train to build Medical Imaging Deep Learning models
Stars: ✭ 80 (-68.25%)
Mutual labels:  automl
emacs
Nightly custom Emacs builds for macOS Nix environments
Stars: ✭ 25 (-90.08%)
Mutual labels:  automated
tsfuse
Python package for automatically constructing features from multiple time series
Stars: ✭ 33 (-86.9%)
Mutual labels:  automl
ultraopt
Distributed asynchronous hyperparameter optimization, better than HyperOpt.
Stars: ✭ 93 (-63.1%)
Mutual labels:  automl
BossNAS
(ICCV 2021) BossNAS: Exploring Hybrid CNN-transformers with Block-wisely Self-supervised Neural Architecture Search
Stars: ✭ 125 (-50.4%)
Mutual labels:  automl
OptGBM
Optuna + LightGBM = OptGBM
Stars: ✭ 27 (-89.29%)
Mutual labels:  automl
comfy-channel
A 24/7 live video broadcast with automatic content selection and overlays using FFMPEG and Python!
Stars: ✭ 37 (-85.32%)
Mutual labels:  automated
winup
Automate a Windows 10 VM setup for coding and testing
Stars: ✭ 21 (-91.67%)
Mutual labels:  automated
removd
Automatic AI cut-outs of people, products, and cars with the https://www.remove.bg service
Stars: ✭ 28 (-88.89%)
Mutual labels:  automated
pymfe
Python Meta-Feature Extractor package.
Stars: ✭ 89 (-64.68%)
Mutual labels:  automl
autogbt-alt
An experimental Python package that reimplements AutoGBT using LightGBM and Optuna.
Stars: ✭ 76 (-69.84%)
Mutual labels:  automl
Awesome-Tensorflow2
A collection of excellent extension packages and projects built on TensorFlow 2
Stars: ✭ 45 (-82.14%)
Mutual labels:  automl
Deep-learning-And-Paper
[For learning and exchange only] Machine intelligence: related books and classic papers, including experiment code for AutoML, sentiment classification, speech recognition, speaker recognition, speech synthesis, and more
Stars: ✭ 62 (-75.4%)
Mutual labels:  automl
aikit
Automated machine learning package
Stars: ✭ 24 (-90.48%)
Mutual labels:  automl
EasyRec
A framework for large scale recommendation algorithms.
Stars: ✭ 599 (+137.7%)
Mutual labels:  automl
human-in-the-loop-machine-learning-tool-tornado
Tornado is a human-in-the-loop machine learning framework that helps you exploit your unlabelled data to train models through a simple and easy to use web interface.
Stars: ✭ 37 (-85.32%)
Mutual labels:  automl

AutoVideo: An Automated Video Action Recognition System


AutoVideo is a system for automated video analysis. It is developed on top of the D3M infrastructure, which describes machine learning pipelines with a generic pipeline language. Currently, it focuses on video action recognition, supporting a complete training pipeline consisting of data processing, video processing, video transformation, and action recognition. It also supports automated tuners for pipeline search. AutoVideo is developed by DATA Lab at Rice University.

There are other video analysis libraries out there, but this one is designed to be highly modular. AutoVideo is highly extensible thanks to the pipeline language, in which each module is wrapped as a primitive with its own hyperparameters. This makes it easy to develop new modules and convenient to perform pipeline search. We welcome contributions to enrich AutoVideo with more primitives; you can find instructions in the Contributing Guide.

Demo

Overview

Cite this work

If you find this repo useful, you may cite:

Zha, Daochen, et al. "AutoVideo: An Automated Video Action Recognition System." arXiv preprint arXiv:2108.04212 (2021).

@article{zha2021autovideo,
  title={AutoVideo: An Automated Video Action Recognition System},
  author={Zha, Daochen and Bhat, Zaid and Chen, Yi-Wei and Wang, Yicheng and Ding, Sirui and Jain, Anmoll and Bhat, Mohammad and Lai, Kwei-Herng and Chen, Jiaben and Zou, Na and Hu, Xia},
  journal={arXiv preprint arXiv:2108.04212},
  year={2021}
}

Installation

Make sure that you have Python 3.6+ and pip installed. Currently, the code is only tested on Linux. First, install torch and torchvision with

pip3 install torch
pip3 install torchvision

To use automated searching, you need to install Ray Tune and hyperopt with

pip3 install 'ray[tune]' hyperopt

We recommend installing the stable version of autovideo with pip:

pip3 install autovideo

Alternatively, you can clone the latest version with

git clone https://github.com/datamllab/autovideo.git

Then install with

cd autovideo
pip3 install -e .

Quick Start

To try the examples, you may download the hmdb6 dataset, which is a subset of hmdb51 with only 6 classes. All the datasets can be downloaded from Google Drive. Then, you may unzip a dataset and put it in datasets. You may also try STGCN for skeleton-based action recognition on kinetics36, which is a subset of the Kinetics dataset with 36 classes.
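
For example, assuming the downloaded archive is named hmdb6.zip (the file name is illustrative), you can unzip it into datasets with

unzip hmdb6.zip -d datasets/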

Fitting and saving a pipeline

python3 examples/fit.py

Some important arguments are as follows.

  • --alg: the algorithm to use. Currently we support tsn, tsm, i3d, eco, eco_full, c3d, r2p1d, r3d, stgcn.
  • --pretrained: whether to load pre-trained weights and fine-tune.
  • --gpu: which GPU device to use. Empty string for CPU.
  • --data_dir: the directory of the dataset.
  • --log_dir: the path for saving the log.
  • --save_path: the path for saving the fitted pipeline.
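
For example, a run that fine-tunes a pre-trained TSN model on hmdb6 might look like the following (the paths are illustrative, and --pretrained is assumed to be a boolean flag):

python3 examples/fit.py --alg tsn --pretrained --gpu 0 --data_dir datasets/hmdb6/ --log_dir logs/ --save_path fitted_pipeline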

In AutoVideo, every pipeline can be described as a Python dictionary. In examples/fit.py, the default pipeline is defined below.

config = {
	"transformation":[
		("RandomCrop", {"size": (128,128)}),
		("Scale", {"size": (128,128)}),
	],
	"augmentation": [
		("meta_ChannelShuffle", {"p": 0.5} ),
		("blur_GaussianBlur",),
		("flip_Fliplr", ),
		("imgcorruptlike_GaussianNoise", ),
	],
	"multi_aug": "meta_Sometimes",
	"algorithm": "tsn",
	"load_pretrained": False,
	"epochs": 50,
}

This pipeline describes which transformation and augmentation primitives will be used, and how the multiple augmentation primitives are combined. It also specifies that TSN will be trained from scratch for 50 epochs. The hyperparameters can be flexibly configured based on those defined in each primitive.
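
For instance, to pass an explicit hyperparameter to the Gaussian blur primitive above, attach a hyperparameter dictionary to its tuple, just as RandomCrop and Scale do. Here sigma is the underlying imgaug parameter; we assume the primitive exposes it under the same name:

		("blur_GaussianBlur", {"sigma": (0.0, 3.0)}),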

Loading a fitted pipeline and producing predictions

After fitting a pipeline, you can load a pipeline and make predictions.

python3 examples/produce.py

Some important arguments are as follows.

  • --gpu: which GPU device to use. Empty string for CPU.
  • --data_dir: the directory of the dataset.
  • --log_dir: the path for saving the log.
  • --load_path: the path for loading the fitted pipeline.
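
For example, with illustrative paths:

python3 examples/produce.py --gpu 0 --data_dir datasets/hmdb6/ --log_dir logs/ --load_path fitted_pipeline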

Loading a fitted pipeline and recognizing actions

After fitting a pipeline, you can also make predictions on a single video. As a demo, you may download the fitted pipeline and the demo video from Google Drive. Then, you can use the following command to recognize the action in the video:

python3 examples/recognize.py

Some important arguments are as follows.

  • --gpu: which GPU device to use. Empty string for CPU.
  • --video_path: the path of the video file.
  • --log_dir: the path for saving the log.
  • --load_path: the path for loading the fitted pipeline.
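
For example, assuming the demo video was saved as demo.avi and the fitted pipeline as fitted_pipeline (both names are illustrative):

python3 examples/recognize.py --gpu 0 --video_path demo.avi --log_dir logs/ --load_path fitted_pipeline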

Fitting and producing a pipeline

Alternatively, you can fit and produce without saving the model with

python3 examples/fit_produce.py

Some important arguments are as follows.

  • --alg: the algorithm to use.
  • --pretrained: whether to load pre-trained weights and fine-tune.
  • --gpu: which GPU device to use. Empty string for CPU.
  • --data_dir: the directory of the dataset.
  • --log_dir: the path for saving the log.
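
For example, with illustrative paths:

python3 examples/fit_produce.py --alg tsn --pretrained --gpu 0 --data_dir datasets/hmdb6/ --log_dir logs/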

Automated searching

In addition to configuring pipelines yourself, we also support automated model selection and hyperparameter tuning:

python3 examples/search.py

Some important arguments are as follows.

  • --alg: the search algorithm. Currently, we support random and hyperopt.
  • --num_samples: the number of samples to be tried.
  • --gpu: which GPU device to use. Empty string for CPU.
  • --data_dir: the directory of the dataset.
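
For example, to try 20 configurations with hyperopt (the values are illustrative):

python3 examples/search.py --alg hyperopt --num_samples 20 --gpu 0 --data_dir datasets/hmdb6/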

The search space can also be specified as a Python dictionary. An example:

from ray import tune

search_space = {
	"augmentation": {
		"aug_0": tune.choice([
			("arithmetic_AdditiveGaussianNoise",),
			("arithmetic_AdditiveLaplaceNoise",),
		]),
		"aug_1": tune.choice([
			("geometric_Rotate",),
			("geometric_Jigsaw",),
		]),
	},
	"multi_aug": tune.choice([
		"meta_Sometimes",
		"meta_Sequential",
	]),
	"algorithm": tune.choice(["tsn"]),
	"learning_rate": tune.uniform(0.0001, 0.001),
	"momentum": tune.uniform(0.9, 0.99),
	"weight_decay": tune.uniform(5e-4, 1e-3),
	"num_segments": tune.choice([8, 16, 32]),
}
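
Here, tune.choice and tune.uniform are Ray Tune's standard search-space functions; other Ray Tune samplers, such as tune.loguniform for the learning rate, can be substituted where appropriate.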

Supported Action Recognition Algorithms

Algorithm | Primitive Path | Paper
TSN | autovideo/recognition/tsn_primitive.py | Temporal Segment Networks: Towards Good Practices for Deep Action Recognition
TSM | autovideo/recognition/tsm_primitive.py | TSM: Temporal Shift Module for Efficient Video Understanding
R2P1D | autovideo/recognition/r2p1d_primitive.py | A Closer Look at Spatiotemporal Convolutions for Action Recognition
R3D | autovideo/recognition/r3d_primitive.py | Learning Spatio-temporal Features with 3D Residual Networks for Action Recognition
C3D | autovideo/recognition/c3d_primitive.py | Learning Spatiotemporal Features with 3D Convolutional Networks
ECO-Lite | autovideo/recognition/eco_primitive.py | ECO: Efficient Convolutional Network for Online Video Understanding
ECO-Full | autovideo/recognition/eco_full_primitive.py | ECO: Efficient Convolutional Network for Online Video Understanding
I3D | autovideo/recognition/i3d_primitive.py | Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
STGCN | autovideo/recognition/stgcn_primitive.py | Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition

Supported Augmentation Primitives

We have adapted all the augmentation methods in imgaug to videos and wrapped them as primitives. Some examples are listed below.

Augmentation Method | Primitive Path
AddElementwise | autovideo/augmentation/arithmetic/AddElementwise_primitive.py
Cartoon | autovideo/augmentation/artistic/Cartoon_primitive.py
BlendAlphaBoundingBoxes | autovideo/augmentation/blend/BlendAlphaBoundingBoxes_primitive.py
AverageBlur | autovideo/augmentation/blur/AverageBlur_primitive.py
AddToBrightness | autovideo/augmentation/color/AddToBrightness_primitive.py
AllChannelsCLAHE | autovideo/augmentation/contrast/AllChannelsCLAHE_primitive.py
DirectedEdgeDetect | autovideo/augmentation/convolutional/DirectedEdgeDetect_primitive.py
SaveDebugImageEveryNBatches | autovideo/augmentation/debug/SaveDebugImageEveryNBatches_primitive.py
Canny | autovideo/augmentation/edges/Canny_primitive.py
Fliplr | autovideo/augmentation/flip/Fliplr_primitive.py
Affine | autovideo/augmentation/geometric/Affine_primitive.py
Brightness | autovideo/augmentation/imgcorruptlike/Brightness_primitive.py
ChannelShuffle | autovideo/augmentation/meta/ChannelShuffle_primitive.py
Autocontrast | autovideo/augmentation/pillike/Autocontrast_primitive.py
AveragePooling | autovideo/augmentation/pooling/AveragePooling_primitive.py
RegularGridVoronoi | autovideo/augmentation/segmentation/RegularGridVoronoi_primitive.py
CenterCropToAspectRatio | autovideo/augmentation/size/CenterCropToAspectRatio_primitive.py
Clouds | autovideo/augmentation/weather/Clouds_primitive.py

See the Full List of Augmentation Primitives

Advanced Usage

Beyond the above examples, you can also customize the configurations.

Configuring the hyperparameters

Each model in AutoVideo is wrapped as a primitive, which contains some hyperparameters. An example for TSN is here. All the hyperparameters can be specified when building the pipeline by passing a config dictionary. See examples/fit.py.
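
As a minimal sketch, primitive hyperparameters can sit in the same config dictionary as the pipeline choices. The hyperparameter names below are taken from the search-space example in this document; consult the primitive source for the authoritative list:

config = {
	"algorithm": "tsn",
	"load_pretrained": False,
	"epochs": 50,
	# TSN hyperparameters, named as in the search-space example above
	"learning_rate": 0.0005,
	"momentum": 0.95,
	"weight_decay": 5e-4,
	"num_segments": 16,
}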

Configuring the search space

The tuner will search for the best hyperparameter combination within a search space to improve the performance. The search space can be defined with ray-tune. See examples/search.py.

Preparing datasets and benchmarking

The datasets must follow the D3M format, which consists of a CSV file and a media folder. The CSV file should have three columns specifying the instance indices, video file names, and labels. An example is shown below.

d3mIndex,video,label
0,Aussie_Brunette_Brushing_Hair_II_brush_hair_u_nm_np1_ri_med_3.avi,0
1,brush_my_hair_without_wearing_the_glasses_brush_hair_u_nm_np1_fr_goo_2.avi,0
2,Brushing_my_waist_lenth_hair_brush_hair_u_nm_np1_ba_goo_0.avi,0
3,brushing_raychel_s_hair_brush_hair_u_cm_np2_ri_goo_2.avi,0
4,Brushing_Her_Hair__[_NEW_AUDIO_]_UPDATED!!!!_brush_hair_h_cm_np1_le_goo_1.avi,0
5,Haarek_mmen_brush_hair_h_cm_np1_fr_goo_0.avi,0
6,Haarek_mmen_brush_hair_h_cm_np1_fr_goo_1.avi,0
7,Prelinger_HabitPat1954_brush_hair_h_nm_np1_fr_med_26.avi,0
8,brushing_hair_2_brush_hair_h_nm_np1_ba_med_2.avi,0

The media folder should contain the video files. You may refer to our example hmdb6 dataset in Google Drive. We have also prepared hmdb51 and ucf101 in the Google Drive for benchmarking; please read benchmark for more details. For some of the algorithms (TSN, TSM, C3D, R2P1D and R3D), if you want to load the pre-trained weights and fine-tune, you need to download the weights from Google Drive and put them in weights.
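
If you are preparing your own dataset, the index CSV is straightforward to generate. Below is a minimal sketch in Python; write_index is a hypothetical helper, and label_for is a label lookup you would supply for your own data:

import csv
import os

def write_index(media_dir, csv_path, label_for):
	"""Write a D3M-style index CSV for the videos in media_dir."""
	videos = sorted(f for f in os.listdir(media_dir) if f.endswith(".avi"))
	with open(csv_path, "w", newline="") as f:
		writer = csv.writer(f)
		writer.writerow(["d3mIndex", "video", "label"])
		for idx, name in enumerate(videos):
			writer.writerow([idx, name, label_for(name)])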
