NVIDIA-AI-IOT / Tf_to_trt_image_classification

Licence: other
Image classification with NVIDIA TensorRT from TensorFlow models.

TensorFlow->TensorRT Image Classification

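This repository contains examples, scripts, and code for image classification using TensorFlow models (from the TensorFlow-Slim model zoo) converted to TensorRT. Converting TensorFlow models to TensorRT yields significant performance gains on the Jetson TX2: in the table below, TensorRT runs each model roughly 1.7x to 5.9x faster than TensorFlow, depending on the model and precision.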

Models

The table below lists the pretrained models ported from the TensorFlow-Slim model zoo, along with their input sizes, measured latencies, input and output node names, and preprocessing functions.

| Model | Input Size | TensorRT (TX2 / Half) | TensorRT (TX2 / Float) | TensorFlow (TX2 / Float) | Input Name | Output Name | Preprocessing Fn. |
|---|---|---|---|---|---|---|---|
| inception_v1 | 224x224 | 7.98ms | 12.8ms | 27.6ms | input | InceptionV1/Logits/SpatialSqueeze | inception |
| inception_v3 | 299x299 | 26.3ms | 46.1ms | 98.4ms | input | InceptionV3/Logits/SpatialSqueeze | inception |
| inception_v4 | 299x299 | 52.1ms | 88.2ms | 176ms | input | InceptionV4/Logits/Logits/BiasAdd | inception |
| inception_resnet_v2 | 299x299 | 53.0ms | 98.7ms | 168ms | input | InceptionResnetV2/Logits/Logits/BiasAdd | inception |
| resnet_v1_50 | 224x224 | 15.7ms | 27.1ms | 63.9ms | input | resnet_v1_50/SpatialSqueeze | vgg |
| resnet_v1_101 | 224x224 | 29.9ms | 51.8ms | 107ms | input | resnet_v1_101/SpatialSqueeze | vgg |
| resnet_v1_152 | 224x224 | 42.6ms | 78.2ms | 157ms | input | resnet_v1_152/SpatialSqueeze | vgg |
| resnet_v2_50 | 299x299 | 27.5ms | 44.4ms | 92.2ms | input | resnet_v2_50/SpatialSqueeze | inception |
| resnet_v2_101 | 299x299 | 49.2ms | 83.1ms | 160ms | input | resnet_v2_101/SpatialSqueeze | inception |
| resnet_v2_152 | 299x299 | 74.6ms | 124ms | 230ms | input | resnet_v2_152/SpatialSqueeze | inception |
| mobilenet_v1_0p25_128 | 128x128 | 2.67ms | 2.65ms | 15.7ms | input | MobilenetV1/Logits/SpatialSqueeze | inception |
| mobilenet_v1_0p5_160 | 160x160 | 3.95ms | 4.00ms | 16.9ms | input | MobilenetV1/Logits/SpatialSqueeze | inception |
| mobilenet_v1_1p0_224 | 224x224 | 12.9ms | 12.9ms | 24.4ms | input | MobilenetV1/Logits/SpatialSqueeze | inception |
| vgg_16 | 224x224 | 38.2ms | 79.2ms | 171ms | input | vgg_16/fc8/BiasAdd | vgg |

The recorded times include data transfer to the GPU, network execution, and data transfer back from the GPU; they do not include preprocessing. See scripts/test_tf.py, scripts/test_trt.py, and src/test/test_trt.cu for implementation details.
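The speedups quoted above are simple ratios of the latencies in the table. The sketch below copies four rows of the table into a hypothetical `LATENCY_MS` dictionary and computes how many times faster each TensorRT configuration is than TensorFlow.

```python
# Latencies in milliseconds, copied from four rows of the models table.
# Tuple order: (TensorRT half, TensorRT float, TensorFlow float).
LATENCY_MS = {
    "inception_v1": (7.98, 12.8, 27.6),
    "resnet_v1_50": (15.7, 27.1, 63.9),
    "mobilenet_v1_0p25_128": (2.67, 2.65, 15.7),
    "vgg_16": (38.2, 79.2, 171.0),
}


def speedups(trt_half_ms, trt_float_ms, tf_float_ms):
    """Return how many times faster TensorRT (half, float) is than TensorFlow."""
    return tf_float_ms / trt_half_ms, tf_float_ms / trt_float_ms


if __name__ == "__main__":
    for name, times in LATENCY_MS.items():
        half, flt = speedups(*times)
        print("%-22s half: %.2fx  float: %.2fx" % (name, half, flt))
```

For inception_v1, for example, this gives 27.6 / 7.98 ≈ 3.46x in half precision and 27.6 / 12.8 ≈ 2.16x in float.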

Setup

  1. Flash the Jetson TX2 using JetPack 3.2. Be sure to install the following:

    • CUDA 9.0
    • OpenCV4Tegra
    • cuDNN
    • TensorRT 3.0
  2. Install pip on Jetson TX2.

    sudo apt-get install python-pip
    
  3. Install TensorFlow on Jetson TX2.

    1. Download the TensorFlow 1.5.0 pip wheel from here. This build of TensorFlow is provided as a convenience for the purposes of this project.

    2. Install TensorFlow using pip

        sudo pip install tensorflow-1.5.0rc0-cp27-cp27mu-linux_aarch64.whl
      
  4. Install uff exporter on Jetson TX2.

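    1. Download the TensorRT 3.0.4 tar package for Ubuntu 16.04 and CUDA 9.0 from https://developer.nvidia.com/nvidia-tensorrt-download. Only the uff Python wheel is used from this archive; that wheel is pure Python (py2.py3-none-any), so it installs on the Jetson even though the archive is built for x86_64.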

    2. Extract archive

        tar -xzf TensorRT-3.0.4.Ubuntu-16.04.3.x86_64.cuda-9.0.cudnn7.0.tar.gz
      
    3. Install uff python package using pip

        sudo pip install TensorRT-3.0.4/uff/uff-0.2.0-py2.py3-none-any.whl
      
  5. Clone and build this project

    git clone --recursive https://github.com/NVIDIA-Jetson/tf_to_trt_image_classification.git
    cd tf_to_trt_image_classification
    mkdir build
    cd build
    cmake ..
    make 
    cd ..
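Before moving on, it can help to confirm that the Python packages installed above are importable. The following is a minimal standard-library sketch; it assumes the import names `tensorflow` and `uff` match the wheels installed in steps 3 and 4, and it should be run with the same `python` interpreter that `pip` installed into (Python 2.7 for the wheels above). It uses only syntax valid in both Python 2.7 and 3.

```python
import importlib


def check_modules(names):
    """Map each module name to True if it imports cleanly, else False."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = True
        except ImportError:  # also catches ModuleNotFoundError on Python 3
            results[name] = False
    return results


if __name__ == "__main__":
    for name, ok in sorted(check_modules(["tensorflow", "uff"]).items()):
        print("%s: %s" % (name, "ok" if ok else "MISSING"))
```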
    

Download models and create frozen graphs

Run the following bash script to download all of the pretrained models.

source scripts/download_models.sh

If there are any models you don't want to use, simply remove the URL from the model list in scripts/download_models.sh.

Next, because the TensorFlow models are provided in checkpoint format, we must convert them to frozen graphs for optimization with TensorRT. Run the scripts/models_to_frozen_graphs.py script:

python scripts/models_to_frozen_graphs.py

If you removed any models in the previous step, you must add 'exclude': True to the corresponding item in the NETS dictionary located in scripts/model_meta.py.

If you are following the instructions for executing engines below, you will also need some sample images. Run the following script to download a few images from ImageNet:

source scripts/download_images.sh
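Marking an entry in scripts/model_meta.py with `'exclude': True` lets a script that walks the NETS dictionary skip models you did not download. The sketch below illustrates that filtering with a hypothetical, heavily simplified stand-in for NETS; the real entries hold more fields, and the project's scripts may apply the flag differently.

```python
# Hypothetical stand-in for NETS in scripts/model_meta.py; the real entries
# hold more fields and their exact keys may differ.
NETS = {
    "inception_v1": {"input_height": 224, "input_width": 224},
    "vgg_16": {"input_height": 224, "input_width": 224, "exclude": True},
    "mobilenet_v1_0p25_128": {"input_height": 128, "input_width": 128},
}


def active_nets(nets):
    """Return the sorted names of models not marked 'exclude': True."""
    return sorted(name for name, meta in nets.items()
                  if not meta.get("exclude", False))


if __name__ == "__main__":
    print(active_nets(NETS))  # vgg_16 is skipped
```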

Convert frozen graph to TensorRT engine

Run the scripts/convert_plan.py script from the root directory of the project, using the parameters from the models table. For example, to convert the Inception V1 model, run the following:

python scripts/convert_plan.py data/frozen_graphs/inception_v1.pb data/plans/inception_v1.plan input 224 224 InceptionV1/Logits/SpatialSqueeze 1 0 float

The arguments to the convert_plan.py script are, in order:

  1. frozen graph path
  2. output plan path
  3. input node name
  4. input height
  5. input width
  6. output node name
  7. max batch size
  8. max workspace size
  9. data type (float or half)

This script assumes models with a single input image and a single output, and may not work out of the box for models other than those in the table above.
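When converting several models, it can be convenient to assemble the nine positional arguments programmatically from the models table. The sketch below uses a hypothetical `convert_plan_args` helper (not part of the project) and assumes the `data/frozen_graphs/<model>.pb` and `data/plans/<model>.plan` naming used in the Inception V1 example; its defaults mirror that example's batch size (1), workspace size (0), and data type (float).

```python
def convert_plan_args(model, input_name, height, width, output_name,
                      max_batch_size=1, max_workspace_size=0, data_type="float"):
    """Build the argv list for scripts/convert_plan.py (hypothetical helper)."""
    if data_type not in ("float", "half"):
        raise ValueError("data_type must be 'float' or 'half'")
    return [
        "python", "scripts/convert_plan.py",
        "data/frozen_graphs/%s.pb" % model,  # 1. frozen graph path
        "data/plans/%s.plan" % model,        # 2. output plan path
        input_name,                          # 3. input node name
        str(height),                         # 4. input height
        str(width),                          # 5. input width
        output_name,                         # 6. output node name
        str(max_batch_size),                 # 7. max batch size
        str(max_workspace_size),             # 8. max workspace size
        data_type,                           # 9. data type
    ]


if __name__ == "__main__":
    args = convert_plan_args("inception_v1", "input", 224, 224,
                             "InceptionV1/Logits/SpatialSqueeze")
    print(" ".join(args))
```

From the project root, the returned list can be passed directly to `subprocess.check_call` instead of being printed.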

Execute TensorRT engine

Run the examples/classify_image program from the root directory of the project, using the parameters from the models table. For example, to run the Inception V1 model converted above:

./build/examples/classify_image/classify_image data/images/gordon_setter.jpg data/plans/inception_v1.plan data/imagenet_labels_1001.txt input InceptionV1/Logits/SpatialSqueeze inception

For reference, the arguments to the example program are:

  1. input image path
  2. plan file path
  3. labels file (one label per line, line number corresponds to index in output)
  4. input node name
  5. output node name
  6. preprocessing function (either vgg or inception)
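The two preprocessing names presumably refer to the TF-Slim conventions the models were trained with. The sketch below assumes those conventions: "inception" scales 0-255 pixels to [-1, 1], and "vgg" subtracts the per-channel ImageNet RGB means (123.68, 116.78, 103.94). It also assumes RGB pixel order and the channel-first (CHW) layout that TensorRT engines typically take as input. It is pure Python for clarity, omits resizing (the image must already match the model's input size), and is not the project's C++ implementation.

```python
# Per-channel RGB means used by TF-Slim's VGG preprocessing.
VGG_MEANS = (123.68, 116.78, 103.94)


def _inception(value, channel):
    # Scale 0-255 to [-1, 1], i.e. (value / 255 - 0.5) * 2.
    return value / 255.0 * 2.0 - 1.0


def _vgg(value, channel):
    # Subtract the channel mean; no scaling.
    return value - VGG_MEANS[channel]


NORMALIZERS = {"inception": _inception, "vgg": _vgg}


def preprocess(pixels, width, height, method):
    """Turn row-major RGB pixels (0-255) into a flat CHW list of floats.

    pixels: list of (r, g, b) tuples, already resized to width x height.
    method: "inception" or "vgg", as in the models table.
    """
    if len(pixels) != width * height:
        raise ValueError("expected width * height pixels")
    if method not in NORMALIZERS:
        raise ValueError("method must be 'inception' or 'vgg'")
    norm = NORMALIZERS[method]
    chw = []
    for channel in range(3):  # one full plane per channel: R, then G, then B
        chw.extend(norm(pixel[channel], channel) for pixel in pixels)
    return chw
```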

We provide two image label files in the data folder. Some of the TensorFlow models were trained with an additional "background" class, so they have 1001 outputs instead of 1000. To determine the number of outputs for each model, refer to the NETS variable in scripts/model_meta.py, and pass the label file with the matching number of lines.
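Mapping network outputs back to labels is a matter of indexing: line i of the label file (counting from zero) names output i. The sketch below is a hypothetical Python helper, not the project's classify_image code; `scores` stands in for the engine's output vector. Checking that the two lengths agree catches the common mistake of pairing a 1001-output model with a 1000-line label file, which would otherwise shift every prediction by one.

```python
def load_labels(path):
    """Read one label per line; line i (counting from 0) names output index i."""
    with open(path) as f:
        return [line.rstrip("\r\n") for line in f]


def top_k(scores, labels, k=5):
    """Return the k (label, score) pairs with the highest scores, best first."""
    if len(scores) != len(labels):
        # e.g. a 1001-output model paired with a 1000-line label file
        raise ValueError("%d outputs but %d labels; use the matching label file"
                         % (len(scores), len(labels)))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(labels[i], scores[i]) for i in order[:k]]


if __name__ == "__main__":
    # Toy three-class example; index 0 plays the role of "background".
    print(top_k([0.1, 0.3, 0.6], ["background", "cat", "dog"], k=2))
```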

Benchmark all models

To benchmark all of the models, first convert every model you downloaded above into a TensorRT engine by running the following script:

python scripts/frozen_graphs_to_plans.py

To change parameters related to TensorRT optimization, edit scripts/frozen_graphs_to_plans.py. Next, to benchmark all of the models, run the scripts/test_trt.py script:

python scripts/test_trt.py

Once it finishes, the timing results are stored in data/test_output_trt.txt. To also benchmark the TensorFlow models, run:

python scripts/test_tf.py

The results are stored in data/test_output_tf.txt. This benchmarking script loads an example image as input, so make sure you have downloaded the sample images as described above.
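To time other workloads in a comparable way, a common pattern is to run a few warm-up iterations first, so one-time costs such as memory allocation are not counted, and then average wall-clock time over many runs. The following is a generic Python 3 sketch of that pattern, not the code in scripts/test_trt.py; `run_inference` is a placeholder for whatever executes the network. Because the table's timings include the copy back from the GPU, the callable should likewise wait for its GPU work to finish before returning, or asynchronous launches will make it look faster than it is.

```python
import time


def benchmark(run_inference, num_warmup=10, num_runs=50):
    """Return the mean latency of run_inference() in milliseconds.

    Warm-up calls are executed first and excluded from the measurement.
    """
    if num_runs <= 0:
        raise ValueError("num_runs must be positive")
    for _ in range(num_warmup):
        run_inference()
    start = time.perf_counter()
    for _ in range(num_runs):
        run_inference()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / num_runs


if __name__ == "__main__":
    # Stand-in workload; replace with a call that runs the TensorRT engine.
    mean_ms = benchmark(lambda: sum(range(100000)))
    print("mean latency: %.3f ms" % mean_ms)
```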