
Feynman27 / pytorch-detect-to-track

License: MIT
A PyTorch implementation of Detect to Track and Track to Detect (https://arxiv.org/abs/1710.03958)

Programming Languages

  • Python
  • Cuda
  • C

Projects that are alternatives to or similar to pytorch-detect-to-track

multi object tracker
An optical flow and Kalman Filter based tracker
Stars: ✭ 31 (-72.81%)
Mutual labels:  object-tracking
OpenCV
Computer Vision programs like Motion Detection, Color Tracking, Motion Recording, Optical Flow and Object Tracking using Python with the OpenCV library
Stars: ✭ 21 (-81.58%)
Mutual labels:  object-tracking
USOT
[ICCV2021] Learning to Track Objects from Unlabeled Videos
Stars: ✭ 52 (-54.39%)
Mutual labels:  object-tracking
Gocv
Go package for computer vision using OpenCV 4 and beyond.
Stars: ✭ 4,511 (+3857.02%)
Mutual labels:  object-tracking
Keras-LSTM-Trajectory-Prediction
A Keras multi-input multi-output LSTM-based RNN for object trajectory forecasting
Stars: ✭ 88 (-22.81%)
Mutual labels:  object-tracking
Prediction-using-Bayesian-Neural-Network
Prediction of continuous signal data and web-tracking data using a dynamic Bayesian neural network, compared with other network architectures as well.
Stars: ✭ 28 (-75.44%)
Mutual labels:  object-tracking
ACVR2017
An Innovative Salient Object Detection Using Center-Dark Channel Prior
Stars: ✭ 20 (-82.46%)
Mutual labels:  object-tracking
objtrack
Implementations of commonly used object-tracking algorithms
Stars: ✭ 22 (-80.7%)
Mutual labels:  object-tracking
VBT-Barbell-Tracker
A proof-of-concept app to optically track a barbell through its range of motion using OpenCV, giving the lifter real-time feedback on concentric average velocity, cutoff velocity, and displacement for a Velocity Based Training program.
Stars: ✭ 53 (-53.51%)
Mutual labels:  object-tracking
chainer-sort
Simple, Online, Realtime Tracking of Multiple Objects (SORT) implementation for Chainer and ChainerCV.
Stars: ✭ 20 (-82.46%)
Mutual labels:  object-tracking
PeekingDuck
A modular framework built to simplify Computer Vision inference workloads.
Stars: ✭ 143 (+25.44%)
Mutual labels:  object-tracking
OpenCV-Object-Tracking
Object Tracking Using OpenCV and Python Plus Comparing different Trackers
Stars: ✭ 32 (-71.93%)
Mutual labels:  object-tracking
SiamFusion
No description or website provided.
Stars: ✭ 26 (-77.19%)
Mutual labels:  object-tracking
Siammask
[CVPR2019] Fast Online Object Tracking and Segmentation: A Unifying Approach
Stars: ✭ 3,205 (+2711.4%)
Mutual labels:  object-tracking
UniTrack
[NeurIPS'21] Unified tracking framework with a single appearance model. It supports Single Object Tracking (SOT), Video Object Segmentation (VOS), Multi-Object Tracking (MOT), Multi-Object Tracking and Segmentation (MOTS), Pose Tracking, Video Instance Segmentation (VIS), and class-agnostic MOT (e.g. TAO dataset).
Stars: ✭ 293 (+157.02%)
Mutual labels:  object-tracking
Yolov5-Deepsort
The latest YOLOv5 + DeepSORT object detection and tracking; displays object classes and supports training on your own dataset (version 5.0).
Stars: ✭ 201 (+76.32%)
Mutual labels:  object-tracking
video labeler
A GUI tool for conveniently labeling objects in video, using powerful object tracking.
Stars: ✭ 87 (-23.68%)
Mutual labels:  object-tracking
rpi-urban-mobility-tracker
The easiest way to count pedestrians, cyclists, and vehicles on edge computing devices or live video feeds.
Stars: ✭ 75 (-34.21%)
Mutual labels:  object-tracking
Robust-and-efficient-post-processing-for-video-object-detection
No description or website provided.
Stars: ✭ 107 (-6.14%)
Mutual labels:  video-object-detection
ailia-models
The collection of pre-trained, state-of-the-art AI models for ailia SDK
Stars: ✭ 1,102 (+866.67%)
Mutual labels:  object-tracking

A PyTorch implementation of the paper Detect to Track and Track to Detect.

Introduction

This project is a PyTorch implementation of Detect to Track and Track to Detect. It draws on several existing implementations, especially jwyang/faster-rcnn.pytorch. As in that implementation, this repository has the following qualities:

  • It is pure PyTorch code. We convert all the numpy implementations to PyTorch!

  • It supports multi-image batch training. We revised all the layers, including the dataloader, RPN, and ROI pooling, to support multiple images per minibatch.

  • It supports multi-GPU training. We use a multi-GPU wrapper (nn.DataParallel) so that one or more GPUs can be used, building on the two features above; see the sketch below.
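
For reference, a minimal sketch of how the nn.DataParallel wrapper mentioned above is typically applied (the tiny network here is a hypothetical stand-in, not this repo's model):

import torch
import torch.nn as nn

# Hypothetical stand-in for the repo's detection network.
model = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU())

if torch.cuda.device_count() > 1:
    # nn.DataParallel splits each minibatch across the visible GPUs
    # and gathers the outputs, so the same code runs on one or many GPUs.
    model = nn.DataParallel(model)
model = model.cuda()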

Furthermore, since the original Detect to Track and Track to Detect implementation used an R-FCN siamese network with a correlation layer, we've added/modified the following:

  • Supports multiple images per roidb entry. By default, we use two images from contiguous frames per roidb entry to facilitate a forward pass through the two-legged siamese network.

  • It is memory efficient. We limit the aspect ratio of the images in each roidb and group images with similar aspect ratios into a minibatch, which lets us train resnet101 with batchsize = 2 (4 images) on two Titan X GPUs (12 GB each).

  • Supports four pooling methods: ROI pooling, ROI align, ROI crop, and position-sensitive ROI pooling. More importantly, we modified all of them to support multi-image batch training.

  • Supports a correlation layer. We adopt the correlation layer from NVIDIA's flownet2 implementation; a conceptual sketch follows this list.
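
Conceptually, the correlation layer compares features at frame t with features in a local neighborhood at frame t+τ, producing one response map per displacement. Below is a naive, loop-based PyTorch sketch of that computation for illustration only; the repo uses flownet2's CUDA kernel, and naive_correlation and its arguments are not part of this codebase:

import torch
import torch.nn.functional as F

def naive_correlation(feat_a, feat_b, max_disp=8):
    # feat_a, feat_b: (N, C, H, W) feature maps from the two frames.
    # Returns (N, (2*max_disp+1)**2, H, W): one response map per displacement.
    n, c, h, w = feat_a.size()
    padded_b = F.pad(feat_b, [max_disp] * 4)  # pad left/right/top/bottom
    maps = []
    for dy in range(2 * max_disp + 1):
        for dx in range(2 * max_disp + 1):
            shifted = padded_b[:, :, dy:dy + h, dx:dx + w]
            # Channel-wise mean of the elementwise product = local correlation.
            maps.append((feat_a * shifted).mean(dim=1, keepdim=True))
    return torch.cat(maps, dim=1)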

Other Resources

Benchmarking

WORK IN PROGRESS

This project is a work in progress, and PRs are welcome. The current implementation is benchmarked against the Imagenet VID dataset.

For training, we adopt the common heuristic of alternating samples from VID and DET (e.g. iteration 1 draws from VID, iteration 2 from DET, and so on). Additionally, 10 frames are sampled per video snippet during training, which avoids biasing the training towards longer snippets. Validation performance, however, is evaluated on every frame of every snippet in VAL. Please refer to the D&T paper for more details.
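
A schematic of that sampling heuristic (the function and variable names below are illustrative, not the repo's actual dataloader API):

import itertools
import random

def sample_snippet(snippet_frames, frames_per_snippet=10):
    # Cap each VID snippet at a fixed number of frames so that long
    # snippets don't dominate training.
    k = min(frames_per_snippet, len(snippet_frames))
    return random.sample(snippet_frames, k)

def alternating_samples(vid_snippets, det_images):
    # Interleave the two sources: odd iterations draw from VID,
    # even iterations draw from DET.
    vid_pool = list(itertools.chain.from_iterable(
        sample_snippet(s) for s in vid_snippets))
    det_pool = list(det_images)
    random.shuffle(vid_pool)
    random.shuffle(det_pool)
    for vid_item, det_item in zip(vid_pool, det_pool):
        yield vid_item  # e.g. iteration 1: VID
        yield det_item  # e.g. iteration 2: DET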

1) Baseline single-frame RFCN (see this repo). The trained model can be accessed here under the name rfcn_detect.pth.

Imagenet VID+DET (Train/Test: imagenet_vid_train+imagenet_det_train/imagenet_vid_val, scale=600, PS ROI Pooling).

model     #GPUs   batch size   lr     lr_decay   max_epoch   time/epoch   mem/GPU   mAP
Res-101   2       2            1e-3   5          11          --           8021MiB   70.3

2) D(&T loss) on Imagenet VID+DET (Train/Test: imagenet_vid_train+imagenet_det_train/imagenet_vid_val, scale=600, PS ROI Pooling). This network is initialized with the weights from the single-frame RFCN baseline above. The trained model can be accessed here under the name rfcn_detect_track_1_7_32941.pth.

Currently, performance drops by 1.6 percentage points relative to the single-frame baseline; the cause is not yet known. Again, PRs are welcome.

model     #GPUs   batch size   lr     lr_decay   max_epoch   time/epoch   mem/GPU   mAP
Res-101   2       2            1e-4   5          7           --           8021MiB   68.7

TODO: Results using the Viterbi algorithm as the linking post-processing step (sketched below).

  • Unless otherwise mentioned, the GPU used is an NVIDIA Titan X Pascal (12 GB).
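
For context, the linking step described in the D&T paper chains per-frame detections into tubes by maximizing the sum of per-detection scores and pairwise link scores with dynamic programming. A rough sketch (the .score attribute and link_score function are hypothetical interfaces, not this repo's API):

def viterbi_link(frames, link_score):
    # frames: list of per-frame detection lists; each detection has a .score.
    # link_score(a, b): score for linking detection a in frame t to detection b
    # in frame t+1, e.g. combining class scores with the IoU of the tracked box.
    best = [d.score for d in frames[0]]  # best path score ending at each detection
    back = []                            # backpointers, one list per transition
    for t in range(1, len(frames)):
        new_best, ptr = [], []
        for b in frames[t]:
            cand = [best[i] + link_score(a, b) for i, a in enumerate(frames[t - 1])]
            j = max(range(len(cand)), key=cand.__getitem__)
            new_best.append(cand[j] + b.score)
            ptr.append(j)
        best, back = new_best, back + [ptr]
    # Backtrack from the best-scoring detection in the last frame.
    i = max(range(len(best)), key=best.__getitem__)
    path = [i]
    for ptr in reversed(back):
        i = ptr[i]
        path.append(i)
    return list(reversed(path))  # index of the chosen detection in each frame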

Prerequisites

  • Python 2.7
  • PyTorch 0.3.0 (0.4.0+ may work but hasn't been tested; some minor tweaks are probably required)
  • CUDA 8.0 or higher

TODO:

  • Update to PyTorch 0.4.0+
  • Make Python 3 compatible

Build

As pointed out by ruotianluo/pytorch-faster-rcnn, choose the right -arch flag to compile the CUDA code:

GPU model                    Architecture
TitanX (Maxwell/Pascal)      sm_52
GTX 960M                     sm_50
GTX 1080 (Ti)                sm_61
Grid K520 (AWS g2.2xlarge)   sm_30
Tesla K80 (AWS p2.xlarge)    sm_37

More details about setting the architecture can be found here or here.
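
If your GPU isn't listed, PyTorch can report its compute capability directly, which maps onto the -arch value:

import torch

# The -arch value is sm_<major><minor>; e.g. a GTX 1080 (Ti) reports (6, 1),
# i.e. sm_61.
major, minor = torch.cuda.get_device_capability(0)
print("sm_%d%d" % (major, minor))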

Install all the Python dependencies using pip:

pip install -r requirements.txt

If you would like to use TensorBoard, install the CPU version of TensorFlow along with tensorboardX.

Compile the CUDA dependencies using the following commands:

cd lib
sh make.sh

This compiles all the modules you need, including NMS, PSROI_POOLING, ROI_Pooling, ROI_Align, and ROI_Crop. The default build targets Python 2.7; recompile yourself if you are using a different Python version.

As pointed out in this issue, if you encounter errors during compilation, you may have forgotten to export the CUDA paths (e.g. CUDA_HOME and the CUDA bin and lib directories) to your environment.

Training

First, create the data directory:

cd pytorch-detect-and-track
mkdir data

Download the ILSVRC VID and DET datasets (train/val/test lists can be found here; the ILSVRC2015 images can be downloaded from here).

Untar the file:

tar xf ILSVRC2015.tar.gz

We'll refer to this directory as $DATAPATH. Make sure the directory structure looks something like:

|--ILSVRC2015
|----Annotations
|------DET
|--------train
|--------val
|------VID
|--------train
|--------val
|----Data
|------DET
|--------train
|--------val
|------VID
|--------train
|--------val
|----ImageSets
|------DET
|------VID

Create a soft link under pytorch-detect-and-track/data:

ln -s $DATAPATH/ILSVRC2015 ./ILSVRC

Create a directory called pytorch-detect-and-track/data/pretrained_model, and place the pretrained models into this directory.

Before training, set the correct directories for saving and loading trained models; the default is ./output/models. Change the "save_dir" and "load_dir" arguments in trainval_net.py and test_net.py to match your environment.

To train an RFCN D&T model with resnet-101 on Imagenet VID+DET, simply run:

CUDA_VISIBLE_DEVICES=0,1 python trainval_net.py \
    --cuda \
    --mGPUs \
    --nw 12 \
    --dataset imagenet_vid+imagenet_det \
    --cag \
    --lr 1e-4 \
    --bs 2 \
    --lr_decay_gamma=0.1 \
    --lr_decay_step 3 \
    --epochs 10 \
    --use_tfboard True

where --bs is the batch size, --cag enables class-agnostic bounding-box regression, and --lr, --lr_decay_gamma, and --lr_decay_step are the learning rate, the factor by which to decay it, and the number of epochs between decays, respectively. --bs, --nw (number of workers; check with the Linux nproc command), and --mGPUs should be set according to the number of GPUs you wish to train on and their memory size. On two Titan Xps with 12 GB of memory each, the batch size can be up to 2 (4 images, 2 per GPU).

Authorship

Contributions to this project have been made by Thomas Balestri and Jugal Sheth.
