
ppogg / YOLOv5-Lite

Licence: GPL-3.0 license
🍅🍅🍅 YOLOv5-Lite: lighter, faster and easier to deploy. It evolved from yolov5, and the model size is only 930+ KB (int8) or 1.7 MB (fp16). It reaches 10+ FPS on the Raspberry Pi 4B with a 320×320 input.


YOLOv5-Lite: Lighter, faster and easier to deploy

We perform a series of ablation experiments on yolov5 to make it lighter (fewer FLOPs, lower memory use, and fewer parameters), faster (adding channel shuffle and a yolov5 head with reduced channels; it runs at 10+ FPS on the Raspberry Pi 4B with 320×320 input frames), and easier to deploy (removing the Focus layer and its four slice operations, and keeping the accuracy loss from model quantization within an acceptable range).
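The two structural changes named above can be illustrated without any framework. The sketch below is not taken from the repository: it shows channel shuffle as described for ShuffleNet (reshape into groups, transpose, flatten) on a plain list of channel indices, and the four stride-2 slices that the Focus layer of yolov5 takes from each plane. The function names `channel_shuffle` and `focus_slices` are illustrative only.

```python
def channel_shuffle(channels, groups):
    """Reorder channels as ShuffleNet's channel shuffle does.

    Conceptually: reshape to (groups, n // groups), transpose, flatten.
    """
    n = len(channels)
    if n % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = n // groups
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


def focus_slices(plane):
    """Split one H x W plane into the four stride-2 sub-planes used by Focus."""
    return [
        [row[0::2] for row in plane[0::2]],  # even rows, even columns
        [row[0::2] for row in plane[1::2]],  # odd rows, even columns
        [row[1::2] for row in plane[0::2]],  # even rows, odd columns
        [row[1::2] for row in plane[1::2]],  # odd rows, odd columns
    ]


if __name__ == "__main__":
    # Eight channels in two groups: the groups are interleaved.
    print(channel_shuffle(list(range(8)), groups=2))
    # A 4 x 4 plane becomes four 2 x 2 planes (4x the channels, half the resolution).
    plane = [[r * 4 + c for c in range(4)] for r in range(4)]
    for sub in focus_slices(plane):
        print(sub)
```

Channel shuffle is only an index permutation, while each Focus slice is a strided memory access, which is one plausible reason the slices are awkward for some inference backends.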

Comparison of ablation experiment results

| ID | Model | Input_size | Flops | Params | Size (M) | mAP@0.5 | mAP@0.5:0.95 |
|---|---|---|---|---|---|---|---|
| 001 | yolo-fastest | 320×320 | 0.25G | 0.35M | 1.4 | 24.4 | - |
| 002 | YOLOv5-Lite-e (ours) | 320×320 | 0.73G | 0.78M | 1.7 | 35.1 | - |
| 003 | NanoDet-m | 320×320 | 0.72G | 0.95M | 1.8 | - | 20.6 |
| 004 | yolo-fastest-xl | 320×320 | 0.72G | 0.92M | 3.5 | 34.3 | - |
| 005 | YOLOXNano | 416×416 | 1.08G | 0.91M | 7.3 (fp32) | - | 25.8 |
| 006 | yolov3-tiny | 416×416 | 6.96G | 6.06M | 23.0 | 33.1 | 16.6 |
| 007 | yolov4-tiny | 416×416 | 5.62G | 8.86M | 33.7 | 40.2 | 21.7 |
| 008 | YOLOv5-Lite-s (ours) | 416×416 | 1.66G | 1.64M | 3.4 | 42.0 | 25.2 |
| 009 | YOLOv5-Lite-c (ours) | 512×512 | 5.92G | 4.57M | 9.2 | 50.9 | 32.5 |
| 010 | NanoDet-EfficientLite2 | 512×512 | 7.12G | 4.71M | 18.3 | - | 32.6 |
| 011 | YOLOv5s(6.0) | 640×640 | 16.5G | 7.23M | 14.0 | 56.0 | 37.2 |
| 012 | YOLOv5-Lite-g (ours) | 640×640 | 15.6G | 5.39M | 10.9 | 57.6 | 39.1 |

See the wiki: https://github.com/ppogg/YOLOv5-Lite/wiki/Test-the-map-of-models-about-coco

Comparison on different platforms

| Equipment | Computing backend | System | Input | Framework | v5lite-e | v5lite-s | v5lite-c | v5lite-g | YOLOv5s |
|---|---|---|---|---|---|---|---|---|---|
| Intel | @i5-10210U | Windows (x86) | 640×640 | openvino | - | - | 46ms | - | 131ms |
| Nvidia | @RTX 2080Ti | Linux (x86) | 640×640 | torch | - | - | - | 15ms | 14ms |
| Redmi K30 | @Snapdragon 730G | Android (armv8) | 320×320 | ncnn | 27ms | 38ms | - | - | 163ms |
| Xiaomi 10 | @Snapdragon 865 | Android (armv8) | 320×320 | ncnn | 10ms | 14ms | - | - | 163ms |
| Raspberrypi 4B | @ARM Cortex-A72 | Linux (arm64) | 320×320 | ncnn | - | 84ms | - | - | 371ms |
| Raspberrypi 4B | @ARM Cortex-A72 | Linux (arm64) | 320×320 | mnn | - | 76ms | - | - | 356ms |
  • All figures above are 4-thread benchmarks
  • The Raspberrypi 4B results were measured with bf16s optimization enabled, on the 64-bit Raspberry Pi OS
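To relate these latencies to the "10+ FPS" claim, a per-frame time in milliseconds converts to frames per second as 1000 / latency. The sketch below uses the v5lite-s figures from the table and assumes they are per-frame times; the source does not say whether pre- and post-processing are included, so treat the results as upper bounds.

```python
# Per-frame latencies in milliseconds for v5lite-s (320×320 input, 4 threads),
# copied from the platform comparison table.
LATENCY_MS = {
    ("Raspberrypi 4B", "ncnn"): 84,
    ("Raspberrypi 4B", "mnn"): 76,
    ("Redmi K30", "ncnn"): 38,
    ("Xiaomi 10", "ncnn"): 14,
}


def fps(latency_ms):
    """Convert a per-frame latency in milliseconds to frames per second."""
    return 1000.0 / latency_ms


for (device, framework), ms in LATENCY_MS.items():
    print(f"{device:15s} {framework:5s} {ms:4d} ms -> {fps(ms):5.1f} FPS")
```

For the Raspberry Pi 4B this gives roughly 11.9 FPS with ncnn and 13.2 FPS with mnn.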

QQ discussion group: 993965802

Answer required to join the group: 剪枝 (pruning), 蒸馏 (distillation), 量化 (quantization), or 低秩分解 (low-rank decomposition); any one of them is accepted.

·Model Zoo·

@v5lite-e:

| Model | Size | Backbone | Head | Framework | Design for |
|---|---|---|---|---|---|
| v5Lite-e.pt | 1.7m | shufflenetv2 (Megvii) | v5Litee-head | Pytorch | Arm-cpu |
| v5Lite-e.bin, v5Lite-e.param | 1.7m | shufflenetv2 | v5Litee-head | ncnn | Arm-cpu |
| v5Lite-e-int8.bin, v5Lite-e-int8.param | 0.9m | shufflenetv2 | v5Litee-head | ncnn | Arm-cpu |
| v5Lite-e-fp32.mnn | 3.0m | shufflenetv2 | v5Litee-head | mnn | Arm-cpu |
| v5Lite-e-fp32.tnnmodel, v5Lite-e-fp32.tnnproto | 2.9m | shufflenetv2 | v5Litee-head | tnn | arm-cpu |
| v5Lite-e-320.onnx | 3.1m | shufflenetv2 | v5Litee-head | onnxruntime | x86-cpu |

@v5lite-s:

| Model | Size | Backbone | Head | Framework | Design for |
|---|---|---|---|---|---|
| v5Lite-s.pt | 3.4m | shufflenetv2 (Megvii) | v5Lites-head | Pytorch | Arm-cpu |
| v5Lite-s.bin, v5Lite-s.param | 3.3m | shufflenetv2 | v5Lites-head | ncnn | Arm-cpu |
| v5Lite-s-int8.bin, v5Lite-s-int8.param | 1.7m | shufflenetv2 | v5Lites-head | ncnn | Arm-cpu |
| v5Lite-s.mnn | 3.3m | shufflenetv2 | v5Lites-head | mnn | Arm-cpu |
| v5Lite-s-int4.mnn | 987k | shufflenetv2 | v5Lites-head | mnn | Arm-cpu |
| v5Lite-s-fp16.bin, v5Lite-s-fp16.xml | 3.4m | shufflenetv2 | v5Lites-head | openvino | x86-cpu |
| v5Lite-s-fp32.bin, v5Lite-s-fp32.xml | 6.8m | shufflenetv2 | v5Lites-head | openvino | x86-cpu |
| v5Lite-s-fp16.tflite | 3.3m | shufflenetv2 | v5Lites-head | tflite | arm-cpu |
| v5Lite-s-fp32.tflite | 6.7m | shufflenetv2 | v5Lites-head | tflite | arm-cpu |
| v5Lite-s-int8.tflite | 1.8m | shufflenetv2 | v5Lites-head | tflite | arm-cpu |
| v5Lite-s-416.onnx | 6.4m | shufflenetv2 | v5Lites-head | onnxruntime | x86-cpu |

@v5lite-c:

| Model | Size | Backbone | Head | Framework | Design for |
|---|---|---|---|---|---|
| v5Lite-c.pt | 9m | PPLcnet (Baidu) | v5s-head | Pytorch | x86-cpu / x86-vpu |
| v5Lite-c.bin, v5Lite-c.xml | 8.7m | PPLcnet | v5s-head | openvino | x86-cpu / x86-vpu |
| v5Lite-c-512.onnx | 18m | PPLcnet | v5s-head | onnxruntime | x86-cpu |

@v5lite-g:

| Model | Size | Backbone | Head | Framework | Design for |
|---|---|---|---|---|---|
| v5Lite-g.pt | 10.9m | Repvgg (Tsinghua) | v5Liteg-head | Pytorch | x86-gpu / arm-gpu / arm-npu |
| v5Lite-g-int8.engine | 8.5m | Repvgg-yolov5 | v5Liteg-head | Tensorrt | x86-gpu / arm-gpu / arm-npu |
| v5lite-g-int8.tmfile | 8.7m | Repvgg-yolov5 | v5Liteg-head | Tengine | arm-npu |
| v5Lite-g-640.onnx | 21m | Repvgg-yolov5 | yolov5-head | onnxruntime | x86-cpu |
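The sizes listed in the Model Zoo tables track the parameter counts from the ablation table fairly closely. As a rough sanity check (not a description of how any exporter works), the sketch below estimates a weights-only size as parameters × bytes per weight. The helper name `estimated_size_mb` is made up for this example, and real files also carry graph structure and metadata, so small differences from the listed sizes are expected.

```python
# Bytes used to store one weight at each precision.
BYTES_PER_WEIGHT = {"fp32": 4, "fp16": 2, "int8": 1}

# Parameter counts in millions, from the "Params" column of the ablation table.
PARAMS_M = {
    "v5lite-e": 0.78,
    "v5lite-s": 1.64,
    "v5lite-c": 4.57,
    "v5lite-g": 5.39,
}


def estimated_size_mb(params_millions, precision):
    """Weights-only estimate in units of 10^6 bytes (ignores file overhead)."""
    return params_millions * BYTES_PER_WEIGHT[precision]


for name, params in PARAMS_M.items():
    row = ", ".join(
        f"{precision}: {estimated_size_mb(params, precision):.2f} MB"
        for precision in BYTES_PER_WEIGHT
    )
    print(f"{name}: {row}")
```

For v5lite-e the estimate is about 3.12 MB (fp32), 1.56 MB (fp16) and 0.78 MB (int8), against listed sizes of 3.0m to 3.1m, 1.7m and 0.9m.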

Download Link:

|──────ncnn-fp16: | Baidu Drive | Google Drive |
|──────ncnn-int8: | Baidu Drive | Google Drive |
└──────onnx-fp32: | Baidu Drive | Google Drive |

|──────ncnn-fp16: | Baidu Drive | Google Drive |
|──────ncnn-int8: | Baidu Drive | Google Drive |
|──────mnn-fp16: | Baidu Drive | Google Drive |
|──────mnn-int4: | Baidu Drive | Google Drive |
|──────onnx-fp32: | Baidu Drive | Google Drive |
└──────tengine-fp32: | Baidu Drive | Google Drive |

|──────onnx-fp32: | Baidu Drive | Google Drive |
└──────openvino-fp16: | Baidu Drive | Google Drive |

└──────onnx-fp32: | Baidu Drive | Google Drive |

Baidu Drive Password: pogg

The v5lite-s model is also available in these formats: TFLite Float32, Float16, INT8, Dynamic range quantization, ONNX, TFJS, TensorRT, OpenVINO IR FP32/FP16, Myriad Inference Engine Blob, CoreML

https://github.com/PINTO0309/PINTO_model_zoo/tree/main/180_YOLOv5-Lite

Thanks to PINTO0309 for these conversions.

How to use

Install

Python>=3.6.0 is required, with all dependencies in requirements.txt installed, including PyTorch>=1.7:

$ git clone https://github.com/ppogg/YOLOv5-Lite
$ cd YOLOv5-Lite
$ pip install -r requirements.txt

Inference with detect.py

detect.py runs inference on a variety of sources, downloading models automatically from the latest YOLOv5-Lite release and saving results to runs/detect.

$ python detect.py --source 0  # webcam
                            file.jpg  # image 
                            file.mp4  # video
                            path/  # directory
                            path/*.jpg  # glob
                            'https://youtu.be/NUsoVlDFqZg'  # YouTube
                            'rtsp://example.com/media.mp4'  # RTSP, RTMP, HTTP stream

Training

$ python train.py --data coco.yaml --cfg v5lite-e.yaml --weights v5lite-e.pt --batch-size 128
                                         v5lite-s.yaml           v5lite-s.pt              128
                                         v5lite-c.yaml           v5lite-c.pt               96
                                         v5lite-g.yaml           v5lite-g.pt               64

With multiple GPUs, training is several times faster:

$ python -m torch.distributed.launch --nproc_per_node 2 train.py

Dataset

Layout of the training and validation sets (the paths point to the directories that hold the xx.jpg images):

train: ../coco/images/train2017/
val: ../coco/images/val2017/

├── images            # xx.jpg example
│   ├── train2017
│   │   ├── 000001.jpg
│   │   ├── 000002.jpg
│   │   └── 000003.jpg
│   └── val2017
│       ├── 100001.jpg
│       ├── 100002.jpg
│       └── 100003.jpg
└── labels             # xx.txt example
    ├── train2017
    │   ├── 000001.txt
    │   ├── 000002.txt
    │   └── 000003.txt
    └── val2017
        ├── 100001.txt
        ├── 100002.txt
        └── 100003.txt
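
The layout above pairs every image with a label file of the same name under a parallel labels directory. A small check like the following can list images that have no matching label file before training starts. It is a sketch under that assumption, not part of the repository: `label_path_for` and `missing_labels` are hypothetical helpers, and the `../coco` root is taken from the example paths.

```python
from pathlib import Path


def label_path_for(image_path):
    """Map .../images/<split>/<name>.jpg to .../labels/<split>/<name>.txt."""
    parts = list(Path(image_path).parts)
    # Replace the last directory component named "images" with "labels".
    idx = len(parts) - 1 - parts[::-1].index("images")
    parts[idx] = "labels"
    return Path(*parts).with_suffix(".txt")


def missing_labels(root, split):
    """Return the images under root/images/<split> that have no label file."""
    images = sorted((Path(root) / "images" / split).glob("*.jpg"))
    return [img for img in images if not label_path_for(img).exists()]


if __name__ == "__main__":
    for split in ("train2017", "val2017"):
        missing = missing_labels("../coco", split)
        print(f"{split}: {len(missing)} image(s) without a label file")
```

Whether a missing label file is an error depends on the dataset, so the script only reports the count and leaves the decision to you.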

Auto LabelImg

Link: https://github.com/ppogg/AutoLabelImg

AutoLabelImg is based on LabelImg and can use YOLOv5-5.0 or YOLOv5-Lite models to annotate images automatically 🚀 🚀 🚀

Model Hub

Here, the original components of YOLOv5 and the reproduced components of YOLOv5-Lite are organized and stored in the model hub:

[image: modelhub]

Heatmap Analysis

$ python main.py --type all

Updating ...

How to deploy

ncnn for arm-cpu

mnn for arm-cpu

openvino for x86-cpu or x86-vpu

tensorrt(C++) for arm-gpu or arm-npu or x86-gpu

tensorrt(Python) for arm-gpu or arm-npu or x86-gpu

Android for arm-cpu

[image: Android_demo]

The demo runs YOLOv5-Lite detection on a Redmi phone with a Snapdragon 730G processor. The performance is as follows:

link: https://github.com/ppogg/YOLOv5-Lite/tree/master/android_demo/ncnn-android-v5lite

Android_v5Lite-s: https://drive.google.com/file/d/1CtohY68N2B9XYuqFLiTp-Nd2kuFWgAUR/view?usp=sharing

Android_v5Lite-g: https://drive.google.com/file/d/1FnvkWxxP_aZwhi000xjIuhJ_OhqOUJcj/view?usp=sharing

New Android app: https://pan.baidu.com/s/1PRhW4fI1jq8VboPyishcIQ (password: pogg)


More detailed explanation

Detailed model link:

What is YOLOv5-Lite S/E model: zhihu link (Chinese): https://zhuanlan.zhihu.com/p/400545131

What is YOLOv5-Lite C model: zhihu link (Chinese): https://zhuanlan.zhihu.com/p/420737659

What is YOLOv5-Lite G model: zhihu link (Chinese): https://zhuanlan.zhihu.com/p/410874403

How to deploy on ncnn with fp16 or int8: csdn link (Chinese): https://blog.csdn.net/weixin_45829462/article/details/119787840

How to deploy on onnxruntime: zhihu link (Chinese): https://zhuanlan.zhihu.com/p/476533259

How to deploy on tensorrt: zhihu link (Chinese): https://zhuanlan.zhihu.com/p/478630138

How to optimize on tensorrt: zhihu link (Chinese): https://zhuanlan.zhihu.com/p/463074494

Reference

https://github.com/ultralytics/yolov5

https://github.com/megvii-model/ShuffleNet-Series

https://github.com/Tencent/ncnn

Citing YOLOv5-Lite

If you use YOLOv5-Lite in your research, please cite our work and give the project a star ⭐:

@misc{yolov5lite2021,
  title = {YOLOv5-Lite: Lighter, faster and easier to deploy},
  author = {Xiangrong Chen and Ziman Gong},
  doi = {10.5281/zenodo.5241425},
  year = {2021}
}