
wang-xinyu / Tensorrtx

Licence: mit
Implementation of popular deep learning networks with TensorRT network definition API

Programming Languages

C++
36643 projects - #6 most used programming language
python
139335 projects - #7 most used programming language
Cuda
1817 projects
CMake
9771 projects
c
50402 projects - #5 most used programming language
shell
77523 projects

Projects that are alternatives of or similar to Tensorrtx

Pytorch Cifar100
Practice on cifar100(ResNet, DenseNet, VGG, GoogleNet, InceptionV3, InceptionV4, Inception-ResNetv2, Xception, Resnet In Resnet, ResNext,ShuffleNet, ShuffleNetv2, MobileNet, MobileNetv2, SqueezeNet, NasNet, Residual Attention Network, SENet, WideResNet)
Stars: ✭ 2,423 (-29.89%)
Mutual labels:  resnet, resnext, squeezenet, inceptionv3, googlenet
Deepstream Project
This is a highly separated deployment project based on Deepstream , including the full range of Yolo and continuously expanding deployment projects such as Ocr.
Stars: ✭ 120 (-96.53%)
Mutual labels:  tensorrt, crnn, arcface, yolov3, yolov5
simpleAICV-pytorch-ImageNet-COCO-training
SimpleAICV:pytorch training example on ImageNet(ILSVRC2012)/COCO2017/VOC2007+2012 datasets.Include ResNet/DarkNet/RetinaNet/FCOS/CenterNet/TTFNet/YOLOv3/YOLOv4/YOLOv5/YOLOX.
Stars: ✭ 276 (-92.01%)
Mutual labels:  resnet, yolov3, yolov4, yolov5
Tensornets
High level network definitions with pre-trained weights in TensorFlow
Stars: ✭ 982 (-71.59%)
Mutual labels:  resnet, yolov3, mobilenetv2, vgg
python cv AI ML
Computer vision, artificial intelligence, machine learning, deep learning, and more with Python
Stars: ✭ 73 (-97.89%)
Mutual labels:  vgg, resnet, alexnet, googlenet
DeepNetModel
Notes on the characteristics of each commonly used deep model architecture (diagrams and code)
Stars: ✭ 25 (-99.28%)
Mutual labels:  vgg, resnet, alexnet, googlenet
neural-dream
PyTorch implementation of DeepDream algorithm
Stars: ✭ 110 (-96.82%)
Mutual labels:  vgg, resnet, googlenet
CNN-Series-Getting-Started-and-PyTorch-Implementation
My notes and demos, covering classification, detection, segmentation, and knowledge distillation.
Stars: ✭ 49 (-98.58%)
Mutual labels:  vgg, alexnet, googlenet
Pytorch Image Models
PyTorch image models, scripts, pretrained weights -- ResNet, ResNeXT, EfficientNet, EfficientNetV2, NFNet, Vision Transformer, MixNet, MobileNet-V3/V2, RegNet, DPN, CSPNet, and more
Stars: ✭ 15,232 (+340.74%)
Mutual labels:  resnet, mnasnet, mobilenetv3
Keras-CIFAR10
practice on CIFAR10 with Keras
Stars: ✭ 25 (-99.28%)
Mutual labels:  vgg, resnet, googlenet
RMNet
RM Operation can equivalently convert ResNet to VGG, which is better for pruning; and can help RepVGG perform better when the depth is large.
Stars: ✭ 129 (-96.27%)
Mutual labels:  vgg, resnet, mobilenetv2
Yolov5
YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
Stars: ✭ 19,914 (+476.22%)
Mutual labels:  yolov3, yolov4, yolov5
YOLOv5-Lite
🍅🍅🍅YOLOv5-Lite: lighter, faster and easier to deploy. Evolved from yolov5 and the size of model is only 930+kb (int8) and 1.7M (fp16). It can reach 10+ FPS on the Raspberry Pi 4B when the input size is 320×320~
Stars: ✭ 1,230 (-64.41%)
Mutual labels:  tensorrt, shufflenetv2, yolov5
onnx tensorrt project
Support Yolov5(4.0)/Yolov5(5.0)/YoloR/YoloX/Yolov4/Yolov3/CenterNet/CenterFace/RetinaFace/Classify/Unet. use darknet/libtorch/pytorch/mxnet to onnx to tensorrt
Stars: ✭ 145 (-95.8%)
Mutual labels:  retinaface, yolov4, yolov5
AESRC2020
a deep accent recognition network
Stars: ✭ 35 (-98.99%)
Mutual labels:  resnet, crnn, arcface
Keras Idiomatic Programmer
Books, Presentations, Workshops, Notebook Labs, and Model Zoo for Software Engineers and Data Scientists wanting to learn the TF.Keras Machine Learning framework
Stars: ✭ 720 (-79.17%)
Mutual labels:  resnet, vgg, resnext
Pytorch Yolov4
PyTorch ,ONNX and TensorRT implementation of YOLOv4
Stars: ✭ 3,690 (+6.77%)
Mutual labels:  yolov3, tensorrt, yolov4
Classification models
Classification models trained on ImageNet. Keras.
Stars: ✭ 938 (-72.86%)
Mutual labels:  resnet, vgg, resnext
yolov4 trt ros
YOLOv4 object detector using TensorRT engine
Stars: ✭ 89 (-97.42%)
Mutual labels:  tensorrt, yolov3, yolov4
InsightFace-REST
InsightFace REST API for easy deployment of face recognition services with TensorRT in Docker.
Stars: ✭ 308 (-91.09%)
Mutual labels:  tensorrt, arcface, retinaface

TensorRTx

TensorRTx aims to implement popular deep learning networks with the TensorRT network definition API. TensorRT has built-in parsers, including the Caffe parser, the UFF parser, and the ONNX parser. But when using these parsers, we often run into "unsupported operation or layer" problems, especially with state-of-the-art models that use new layer types.

So why not skip the parsers altogether? We can build the whole network directly with the TensorRT network definition API; it is not that complicated.

I wrote this project to get familiar with the TensorRT API, and also to share with and learn from the community.

All the models are first implemented in PyTorch/MXNet/TensorFlow, and a weights file xxx.wts is exported. TensorRT is then used to load the weights, define the network, and run inference. Some PyTorch implementations can be found in my repo Pytorchx; the rest are from popular open-source implementations.
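To make the weights-export step concrete, here is a minimal sketch of writing and reading a .wts-style text file. The layout is an assumption, not something this README specifies: a first line with the tensor count, then one line per tensor of the form `name length hex hex ...`, where each value is the hex encoding of a big-endian float32. The function names `write_wts` and `read_wts` are hypothetical helpers for illustration only; check the export script in each model folder for the authoritative format.

```python
import struct
from typing import Dict, List


def write_wts(path: str, weights: Dict[str, List[float]]) -> None:
    """Write tensors as text: a count line, then 'name length hex hex ...' per tensor."""
    with open(path, "w") as f:
        f.write(f"{len(weights)}\n")
        for name, values in weights.items():
            # Assumed encoding: hex string of the big-endian float32 bytes of each value.
            hex_words = [struct.pack(">f", float(v)).hex() for v in values]
            f.write(" ".join([name, str(len(values))] + hex_words) + "\n")


def read_wts(path: str) -> Dict[str, List[float]]:
    """Parse a file produced by write_wts back into a name -> list of floats mapping."""
    weights: Dict[str, List[float]] = {}
    with open(path) as f:
        count = int(f.readline())
        for _ in range(count):
            fields = f.readline().split()
            name, size = fields[0], int(fields[1])
            values = [struct.unpack(">f", bytes.fromhex(h))[0] for h in fields[2:]]
            if len(values) != size:
                raise ValueError(f"{name}: expected {size} values, got {len(values)}")
            weights[name] = values
    return weights
```

With a real framework, the `weights` dictionary would be filled from the model's flattened parameters (for example, the entries of a PyTorch state dict), and the C++ side would parse the same lines before handing the arrays to the TensorRT layers.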

News

  • 19 Oct 2021. liuqi123123 added CUDA preprocessing for yolov5; preprocessing + inference is 3x faster when batchsize=8.
  • 18 Oct 2021. xupengao: YOLOv5 updated to v6.0, supporting n/s/m/l/x/n6/s6/m6/l6/x6.
  • 31 Aug 2021. FamousDirector: update retinaface to support TensorRT 8.0.
  • 27 Aug 2021. HaiyangPeng: add a python wrapper for hrnet segmentation.
  • 1 Jul 2021. freedenS: DE⫶TR: End-to-End Object Detection with Transformers. First Transformer model!
  • 10 Jun 2021. upczww: EfficientNet b0-b8 and l2.
  • 23 May 2021. SsisyphusTao: CenterNet DLA-34 with DCNv2 plugin.
  • 17 May 2021. ybw108: arcface LResNet100E-IR and MobileFaceNet.
  • 6 May 2021. makaveli10: scaled-yolov4 yolov4-csp.
  • 29 Apr 2021. upczww: hrnet segmentation w18/w32/w48, ocr branch also.
  • 28 Apr 2021. aditya-dl: mobilenetv2, alexnet, densenet121, mobilenetv3 with python API.
  • 26 Apr 2021. makaveli10: add Inceptionv4.
  • 25 Apr 2021. YOLOv5 updated to v5.0, supporting s/m/l/x/s6/m6/l6/x6.
  • 23 Apr 2021. irvingzhang0512: add TSM: Temporal Shift Module for Efficient Video Understanding, ICCV2019.
  • 23 Apr 2021. freedenS: implement MaskRCNN, so far the MOST complicated model in this repo.

Tutorials

Test Environment

  1. GTX1080 / Ubuntu16.04 / cuda10.0 / cudnn7.6.5 / tensorrt7.0.0 / nvinfer7.0.0 / opencv3.3

How to run

Each folder has a readme inside, which explains how to run the models inside.

Models

The following models are implemented.

Name Description
lenet the simplest, as a "hello world" of this project
alexnet easy to implement, all layers are supported in tensorrt
googlenet GoogLeNet (Inception v1)
inception Inception v3, v4
mnasnet MNASNet with depth multiplier of 0.5 from the paper
mobilenet MobileNet v2, v3-small, v3-large
resnet resnet-18, resnet-50 and resnext50-32x4d are implemented
senet se-resnet50
shufflenet ShuffleNet v2 with 0.5x output channels
squeezenet SqueezeNet 1.1 model
vgg VGG 11-layer model
yolov3-tiny weights and pytorch implementation from ultralytics/yolov3
yolov3 darknet-53, weights and pytorch implementation from ultralytics/yolov3
yolov3-spp darknet-53, weights and pytorch implementation from ultralytics/yolov3
yolov4 CSPDarknet53, weights from AlexeyAB/darknet, pytorch implementation from ultralytics/yolov3
yolov5 yolov5 v1.0-v6.0, pytorch implementation from ultralytics/yolov5
retinaface resnet50 and mobilenet0.25, weights from biubug6/Pytorch_Retinaface
arcface LResNet50E-IR, LResNet100E-IR and MobileFaceNet, weights from deepinsight/insightface
retinafaceAntiCov mobilenet0.25, weights from deepinsight/insightface, retinaface anti-COVID-19, detect face and mask attribute
dbnet Scene Text Detection, weights from BaofengZan/DBNet.pytorch
crnn pytorch implementation from meijieru/crnn.pytorch
ufld pytorch implementation from Ultra-Fast-Lane-Detection, ECCV2020
hrnet hrnet-image-classification and hrnet-semantic-segmentation, pytorch implementation from HRNet-Image-Classification and HRNet-Semantic-Segmentation
psenet PSENet Text Detection, tensorflow implementation from liuheng92/tensorflow_PSENet
ibnnet IBN-Net, pytorch implementation from XingangPan/IBN-Net, ECCV2018
unet U-Net, pytorch implementation from milesial/Pytorch-UNet
repvgg RepVGG, pytorch implementation from DingXiaoH/RepVGG
lprnet LPRNet, pytorch implementation from xuexingyu24/License_Plate_Detection_Pytorch
refinedet RefineDet, pytorch implementation from luuuyi/RefineDet.PyTorch
densenet DenseNet-121, from torchvision.models
rcnn FasterRCNN and MaskRCNN, model from detectron2
tsm TSM: Temporal Shift Module for Efficient Video Understanding, ICCV2019
scaled-yolov4 yolov4-csp, pytorch from WongKinYiu/ScaledYOLOv4
centernet CenterNet DLA-34, pytorch from xingyizhou/CenterNet
efficientnet EfficientNet b0-b8 and l2, pytorch from lukemelas/EfficientNet-PyTorch
detr DE⫶TR, pytorch from facebookresearch/detr

Model Zoo

The .wts files can be downloaded from the model zoo for quick evaluation. However, it is recommended to convert the .wts file from a PyTorch/MXNet/TensorFlow model yourself, so that you can retrain your own model.

GoogleDrive | BaiduPan pwd: uvv2

Tricky Operations

Some tricky operations were encountered in these models. They are already solved, but better solutions may exist.

Name | Description
BatchNorm | implemented with a scale layer; used in resnet, googlenet, mobilenet, etc.
MaxPool2d(ceil_mode=True) | use a padding layer before maxpool to handle ceil_mode=True, see googlenet.
average pool with padding | use setAverageCountExcludesPadding() when necessary, see inception.
relu6 | use Relu6(x) = Relu(x) - Relu(x-6), see mobilenet.
torch.chunk() | implement 'chunk(2, dim=C)' with a tensorrt plugin, see shufflenet.
channel shuffle | use two shuffle layers to implement channel_shuffle, see shufflenet.
adaptive pool | use a fixed input dimension and regular average pooling, see shufflenet.
leaky relu | I wrote a leaky relu plugin, but PRelu in NvInferPlugin.h can be used, see yolov3 in branch trt4.
yolo layer v1 | yolo layer is implemented as a plugin, see yolov3 in branch trt4.
yolo layer v2 | three yolo layers implemented in one plugin, see yolov3-spp.
upsample | replaced by a deconvolution layer, see yolov3.
hsigmoid | hard sigmoid is implemented as a plugin; hsigmoid and hswish are used in mobilenetv3.
retinaface output decode | implement a plugin to decode bbox, confidence and landmarks, see retinaface.
mish | mish activation is implemented as a plugin; mish is used in yolov4.
prelu | mxnet's prelu activation with trainable gamma is implemented as a plugin; used in arcface.
HardSwish | hard_swish = x * hard_sigmoid, used in yolov5 v3.0.
LSTM | pytorch nn.LSTM() implemented with the tensorrt API.
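Several rows of the table above come down to small pieces of arithmetic. The sketch below illustrates them in plain Python on scalars and lists, so it is not the project's C++ code. The function names are hypothetical, the hard sigmoid is assumed to follow PyTorch's definition relu6(x + 3) / 6, and the ceil_mode helper only computes how much trailing padding is needed, assuming no padding on the pooling layer itself.

```python
import math
from typing import List, Tuple


def bn_to_scale_shift(gamma: float, beta: float, mean: float, var: float,
                      eps: float = 1e-5) -> Tuple[float, float]:
    """Fold inference-time BatchNorm into y = scale * x + shift (one channel)."""
    scale = gamma / math.sqrt(var + eps)
    return scale, beta - mean * scale


def relu(x: float) -> float:
    return max(x, 0.0)


def relu6(x: float) -> float:
    # Identity from the table: Relu6(x) = Relu(x) - Relu(x - 6)
    return relu(x) - relu(x - 6.0)


def hard_sigmoid(x: float) -> float:
    # Assumed definition (as in PyTorch's Hardsigmoid): relu6(x + 3) / 6
    return relu6(x + 3.0) / 6.0


def hard_swish(x: float) -> float:
    # From the table: hard_swish = x * hard_sigmoid
    return x * hard_sigmoid(x)


def channel_shuffle(channels: List[int], groups: int) -> List[int]:
    """Reshape to (groups, n), transpose to (n, groups), then flatten."""
    n = len(channels) // groups
    return [channels[g * n + i] for i in range(n) for g in range(groups)]


def ceil_mode_extra_pad(size: int, kernel: int, stride: int) -> int:
    """Trailing padding that makes floor-mode pooling match ceil_mode=True output size."""
    out = math.ceil((size - kernel) / stride) + 1
    if (out - 1) * stride >= size:  # the last window must start inside the input
        out -= 1
    return max((out - 1) * stride + kernel - size, 0)
```

For example, `channel_shuffle([0, 1, 2, 3, 4, 5], groups=2)` interleaves the two halves, and `ceil_mode_extra_pad(112, 3, 2)` reports that one extra row or column of padding is needed for a 3x3, stride-2 pool on a 112-wide input.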

Speed Benchmark

Models | Device | BatchSize | Mode | Input Shape(HxW) | FPS
YOLOv3-tiny | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 333
YOLOv3(darknet53) | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 39.2
YOLOv3(darknet53) | Xeon E5-2620/GTX1080 | 1 | INT8 | 608x608 | 71.4
YOLOv3-spp(darknet53) | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 38.5
YOLOv4(CSPDarknet53) | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 35.7
YOLOv4(CSPDarknet53) | Xeon E5-2620/GTX1080 | 4 | FP32 | 608x608 | 40.9
YOLOv4(CSPDarknet53) | Xeon E5-2620/GTX1080 | 8 | FP32 | 608x608 | 41.3
YOLOv5-s v3.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 142
YOLOv5-s v3.0 | Xeon E5-2620/GTX1080 | 4 | FP32 | 608x608 | 173
YOLOv5-s v3.0 | Xeon E5-2620/GTX1080 | 8 | FP32 | 608x608 | 190
YOLOv5-m v3.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 71
YOLOv5-l v3.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 43
YOLOv5-x v3.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 29
YOLOv5-s v4.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 142
YOLOv5-m v4.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 71
YOLOv5-l v4.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 40
YOLOv5-x v4.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 27
RetinaFace(resnet50) | Xeon E5-2620/GTX1080 | 1 | FP32 | 480x640 | 90
RetinaFace(resnet50) | Xeon E5-2620/GTX1080 | 1 | INT8 | 480x640 | 204
RetinaFace(mobilenet0.25) | Xeon E5-2620/GTX1080 | 1 | FP32 | 112x112 | 333
CRNN | Xeon E5-2620/GTX1080 | 1 | FP32 | 32x100 | 1000
Help wanted: if you have speed results, please open an issue or PR.

Acknowledgments & Contact

Any contributions, questions, and discussions are welcome. Contact me using the info below.

E-mail: [email protected]

WeChat ID: wangxinyu0375 (add me on WeChat to join the tensorrtx discussion group; include the note: tensorrtx)
