
AlbertoSabater / Robust-and-efficient-post-processing-for-video-object-detection

License: GPL-3.0
No description or website provided.

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Robust-and-efficient-post-processing-for-video-object-detection

Nim
Nim is a statically typed compiled systems programming language. It combines successful concepts from mature languages like Python, Ada and Modula. Its design focuses on efficiency, expressiveness, and elegance (in that order of priority).
Stars: ✭ 12,270 (+11367.29%)
Mutual labels:  efficient
PyGLM
Fast OpenGL Mathematics (GLM) for Python
Stars: ✭ 167 (+56.07%)
Mutual labels:  efficient
Lokie
iOS efficient AOP Library using C++ and libffi
Stars: ✭ 139 (+29.91%)
Mutual labels:  efficient
Pyeco
Python implementation of efficient convolution operators for tracking
Stars: ✭ 150 (+40.19%)
Mutual labels:  efficient
Flutter commonapp
Build a general-purpose app UI structure, including common UI screens such as login and registration, along with utility classes and shared components.
Stars: ✭ 227 (+112.15%)
Mutual labels:  efficient
neutron-language
A simple, extensible and efficient programming language based on C and Python
Stars: ✭ 32 (-70.09%)
Mutual labels:  efficient
Stm32 Dma Uart
Efficient DMA timeout mechanism for peripheral DMA configured in circular mode, demonstrated on an STM32 microcontroller.
Stars: ✭ 111 (+3.74%)
Mutual labels:  efficient
DroNet
DroNet: Efficient convolutional neural network detector for Real-Time UAV applications
Stars: ✭ 54 (-49.53%)
Mutual labels:  efficient
Selecsls Pytorch
Reference ImageNet implementation of SelecSLS CNN architecture proposed in the SIGGRAPH 2020 paper "XNect: Real-time Multi-Person 3D Motion Capture with a Single RGB Camera". The repository also includes code for pruning the model based on implicit sparsity emerging from adaptive gradient descent methods, as detailed in the CVPR 2019 paper "On implicit filter level sparsity in Convolutional Neural Networks".
Stars: ✭ 251 (+134.58%)
Mutual labels:  efficient
exificient
Java Implementation of EXI
Stars: ✭ 49 (-54.21%)
Mutual labels:  efficient
Nfancurve
A small and lightweight POSIX script for using a custom fan curve in Linux for those with an Nvidia GPU.
Stars: ✭ 180 (+68.22%)
Mutual labels:  efficient
Amber
A Crystal web framework that makes building applications fast, simple, and enjoyable. Get started with quick prototyping, less bugs, and blazing fast performance.
Stars: ✭ 2,345 (+2091.59%)
Mutual labels:  efficient
jazzle
An Innovative, Fast Transpiler for ECMAScript 2015 and later
Stars: ✭ 65 (-39.25%)
Mutual labels:  efficient
Efficientnet
Implementation of EfficientNet model. Keras and TensorFlow Keras.
Stars: ✭ 1,920 (+1694.39%)
Mutual labels:  efficient
gridpp
Software to post-process gridded weather forecasts
Stars: ✭ 33 (-69.16%)
Mutual labels:  post-processing
Borer
Efficient CBOR and JSON (de)serialization in Scala
Stars: ✭ 131 (+22.43%)
Mutual labels:  efficient
JSON-For-Mirc
JSON parser for mIRC
Stars: ✭ 19 (-82.24%)
Mutual labels:  efficient
Compact-Global-Descriptor
Pytorch implementation of "Compact Global Descriptor for Neural Networks" (CGD).
Stars: ✭ 22 (-79.44%)
Mutual labels:  efficient
gnuboy
latest version of the original Laguna source, with a handful of fixes for modern compilers and systems
Stars: ✭ 70 (-34.58%)
Mutual labels:  efficient
ConvolutionaNeuralNetworksToEnhanceCodedSpeech
In this work we propose two postprocessing approaches applying convolutional neural networks (CNNs) either in the time domain or the cepstral domain to enhance the coded speech without any modification of the codecs. The time domain approach follows an end-to-end fashion, while the cepstral domain approach uses analysis-synthesis with cepstral d…
Stars: ✭ 25 (-76.64%)
Mutual labels:  post-processing

Robust and efficient post-processing for Video Object Detection (REPP)


[Paper]

REPP is a learning-based post-processing method to improve video object detections from any object detector. REPP links detections across frames by evaluating their similarity and refines their classification and location to suppress false positives and recover misdetections.

[Figure: Post-processing pipeline]

REPP improves video detections from both image and video object detectors, and it adds only a light computational overhead.


Installation

REPP has been tested with Python 3.6.

Its dependencies are listed in the repp_requirements.txt file and can be installed with:

pip install -r repp_requirements.txt

Quick usage guide

Video detections must be stored with pickle as tuples (video_name, {frame_dets}), as follows:

("video_name", {"000001": [ det_1, det_2, ..., det_N ],
                "000002": [ det_1, det_2, ..., det_M ]},
                ...)

If the stored predictions file contains detections for several videos, they must be saved as a stream of such tuples, one per video.
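
For illustration, a minimal sketch of how such a stream can be written and read back with pickle (only the tuple layout is prescribed by REPP; all_videos is a hypothetical iterable):

import pickle

# Writing: dump one (video_name, frame_dets) tuple per video into the same file
with open('predictions_file.pckl', 'wb') as f:
    for video_name, frame_dets in all_videos:   # all_videos: hypothetical iterable
        pickle.dump((video_name, frame_dets), f)

# Reading: load tuples one by one until the end of the file
with open('predictions_file.pckl', 'rb') as f:
    while True:
        try:
            video_name, frame_dets = pickle.load(f)
        except EOFError:
            break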

Each detection must have the following format:

det_1: {'image_id': image_id,     # Same as used in ILSVRC, if applicable
        'bbox': [ x_min, y_min, width, height ],
        'scores': scores,         # Vector of class confidence scores
        'bbox_center': (x,y) }    # Relative bounding box center

bbox_center coordinates are bounded between 0 and 1 and refer to the center of the detection after the image has been padded vertically or horizontally to fit a square shape.
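
As a sketch of that definition, the relative center could be computed as follows (assuming the bounding box is given in pixels and the padding is applied symmetrically around the shorter side; the repository code is the authoritative reference):

def relative_bbox_center(bbox, img_width, img_height):
    # bbox = [x_min, y_min, width, height] in pixels
    x_min, y_min, w, h = bbox
    cx = x_min + w / 2.0
    cy = y_min + h / 2.0
    # Pad the shorter side so the image becomes square (assumed centered padding)
    side = max(img_width, img_height)
    pad_x = (side - img_width) / 2.0
    pad_y = (side - img_height) / 2.0
    # Normalize the padded coordinates to [0, 1]
    return ((cx + pad_x) / side, (cy + pad_y) / side)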

Check this code for a better insight into the predictions format.

Post-processed detections can be saved in either the COCO or the IMDB format.

python REPP.py --repp_cfg ./REPP_cfg/cfg.json --predictions_file predictions_file.pckl --store_coco --store_imdb

As the REPP configuration file, you can use either fgfa_repp_cfg.json or yolo_repp_cfg.json. The former works better with high-performing detectors such as SELSA or FGFA, and the latter works better for lower-quality detectors. We recommend setting appearance_matching to false in the config file, since it requires non-trivial training of extra models and is not mandatory for the performance boost. If needed, the following config parameters can be tuned (see the sketch after this list):

  • min_tubelet_score and min_pred_score: thresholds used to suppress low-scoring detections. Higher values speed up the post-processing execution.
  • clf_thr: threshold used to suppress low-scoring detection links. Lower values lead to more false positives, while higher values lead to fewer detections.
  • recoordinate_std: higher values lead to a more aggressive re-coordination, lower values to a smoother one.
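
For example, a custom configuration could be derived from a shipped one as below (a minimal sketch, assuming the config is a flat JSON object containing the parameters named above; the concrete values here are hypothetical):

import json

# Load the shipped YOLO configuration and adjust it
with open('./REPP_cfg/yolo_repp_cfg.json') as f:
    cfg = json.load(f)

cfg['appearance_matching'] = False   # recommended: avoids training extra models
cfg['min_pred_score'] = 0.01         # hypothetical value: raise to speed up REPP
cfg['clf_thr'] = 0.5                 # hypothetical value: lower -> more links, more FPs

with open('./REPP_cfg/custom_repp_cfg.json', 'w') as f:
    json.dump(cfg, f, indent=2)

The resulting file can then be passed to REPP.py through --repp_cfg.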

Below you will find instructions to compute predictions for any video with YOLOv3 and apply REPP to them.

Demos

In order to reproduce the results of the paper, you can download the predictions of the different models from the following link and place them in the project folder following the structure of the downloaded zip file.

The ImageNet VID dataset must be downloaded and stored with the following folder structure:

/path/to/dataset/ILSVRC2015/
/path/to/dataset/ILSVRC2015/Annotations/DET
/path/to/dataset/ILSVRC2015/Annotations/VID
/path/to/dataset/ILSVRC2015/Data/DET
/path/to/dataset/ILSVRC2015/Data/VID
/path/to/dataset/ILSVRC2015/ImageSets

The following commands apply the REPP post-processing and evaluate the results by computing the mean Average Precision (mAP) for different object motion speeds:

# YOLO
python REPP.py --repp_cfg ./REPP_cfg/yolo_repp_cfg.json --predictions_file './demos/YOLOv3/predictions/base_preds.pckl' --evaluate --annotations_filename ./data_annotations/annotations_val_ILSVRC.txt  --path_dataset /path/to/dataset/ILSVRC2015/ --store_coco --store_imdb
> {'mAP_total': 0.7506216640807263, 'mAP_slow': 0.825347229618856, 'mAP_medium': 0.742908326433008, 'mAP_fast': 0.5657881762511975}

# FGFA
python REPP.py --repp_cfg ./REPP_cfg/fgfa_repp_cfg.json --predictions_file './demos/Flow-Guided-Feature-Aggregation/predictions/base_preds.pckl' --evaluate --annotations_filename ./data_annotations/annotations_val_ILSVRC.txt --path_dataset /path/to/dataset/ILSVRC2015/ --store_coco --store_imdb
> {'mAP_total': 0.8009014265948871, 'mAP_slow': 0.8741923949671497, 'mAP_medium': 0.7909183123072739, 'mAP_fast': 0.6137783055850773}

# SELSA
python REPP.py --repp_cfg ./REPP_cfg/selsa_repp_cfg.json --predictions_file './demos/Sequence-Level-Semantics-Aggregation/predictions/old_preds.pckl' --evaluate --annotations_filename ./data_annotations/annotations_val_ILSVRC.txt --path_dataset /path/to/dataset/ILSVRC2015/ --store_coco --store_imdb
> {'mAP_total': 0.8421329795837483, 'mAP_slow': 0.8871784038276325, 'mAP_medium': 0.8332090469178383, 'mAP_fast': 0.7109387713303483}

Instead of downloading the base predictions, you can also compute them. To do so, you must install the proper dependencies for each model, as specified in the original model repositories (YOLOv3, FGFA, SELSA). You must also download their weights and config files from the following link and place them in the project folder following the structure of the downloaded zip file. Then execute the following commands:

# YOLO
cd demos/YOLOv3/
python get_repp_predictions.py --yolo_path ./pretrained_models/ILSVRC/1203_1758_model_8/ --repp_format --add_appearance --from_annotations ../../data_annotations/annotations_val_ILSVRC.txt --dataset_path /path/to/dataset/ILSVRC2015/Data/VID/

# FGFA
cd demos/Flow-Guided-Feature-Aggregation/fgfa_rfcn/
python get_repp_predictions.py --det_path 'path_to_dataset/ILSVRC2015/'

# SELSA
cd demos/Sequence-Level-Semantics-Aggregation/
python experiments/selsa/get_repp_predictions.py --dataset_path 'path_to_dataset/ILSVRC2015/'

REPP applied to custom videos

REPP can also be applied to the predictions of any video, as long as they follow the REPP format specified above. The following commands show how to compute YOLO predictions for any video and apply the REPP post-processing.

# Extract YOLOv3 predictions
cd demos/YOLOv3/
python get_repp_predictions.py --yolo_path ./pretrained_models/ILSVRC/1203_1758_model_8/ --repp_format --add_appearance --from_video ./test_images/video_1.mp4

# Apply REPP
cd ../..
python REPP.py --repp_cfg ./REPP_cfg/yolo_repp_cfg.json --predictions_file './demos/YOLOv3/predictions/preds_repp_app_video_1.pckl' --store_coco
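
Once stored with --store_coco, the post-processed detections follow the standard COCO detection format (a list of dicts with image_id, category_id, bbox, and score), so they can be consumed with plain Python. A minimal sketch, assuming the output JSON is written next to the input pickle (the exact output filename produced by REPP.py may differ):

import json

# Hypothetical path: adjust to the actual file written by REPP.py
with open('./demos/YOLOv3/predictions/preds_repp_app_video_1_repp_coco.json') as f:
    dets = json.load(f)

# Keep only confident detections and group them by frame
confident = [d for d in dets if d['score'] > 0.4]
by_frame = {}
for d in confident:
    by_frame.setdefault(d['image_id'], []).append((d['category_id'], d['bbox']))
print(f'{len(confident)} detections over {len(by_frame)} frames')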

REPP matching model training on ILSVRC

The present project includes trained linking models to perform the detection matching both with and without appearance descriptors. These models have been trained with data from ImageNet VID, but they are able to improve detections for any other dataset or custom video. These Logistic Regression models have been trained using the following steps, which can be adapted to any other custom dataset (a training sketch follows the list):

  1. Generate annotations for the Logistic Regression training, based on triplets (Anchor, Positive, Negative):
python create_triplet_ilsvrc_annotations.py --path_dataset '/path/to/dataset/ILSVRC2015/'
  2. Generate matching features from the annotations:
python clf_dataset_generation.py --path_dataset '/path/to/dataset/ILSVRC2015/' --add_appearance
  3. Train and store the Logistic Regression model:
python train_clf_model.py --add_appearance
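
To make the idea behind step 3 concrete, the core of such a linking model can be sketched with scikit-learn (a minimal illustration, not the repository's train_clf_model.py; the feature files and their layout here are hypothetical placeholders):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: each row holds matching features for a detection
# pair (e.g. score similarity, bbox-center distance, appearance distance), and
# the label says whether the pair belongs to the same object (1) or not (0).
X = np.load('pair_features.npy')   # shape (n_pairs, n_features), placeholder file
y = np.load('pair_labels.npy')     # shape (n_pairs,), placeholder file

clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)

# At linking time, pairs whose predicted probability exceeds clf_thr are linked
link_probs = clf.predict_proba(X[:5])[:, 1]
print(link_probs)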

The previous steps include appearance features computed with a pretrained YOLOv3 model. If you are going to use a different dataset or detection model, it is recommended to omit the --add_appearance parameter.

Citation

@inproceedings{sabater2020repp,
  title={Robust and efficient post-processing for Video Object Detection},
  author={Alberto Sabater and Luis Montesano and Ana C. Murillo},
  booktitle={International Conference on Intelligent Robots and Systems (IROS)},
  year={2020}
}