osai-ai / Tensor Stream

Licence: lgpl-2.1
A library for real-time video stream decoding to CUDA memory

TensorStream

TensorStream is a C++ library that decodes real-time video streams (e.g., RTMP) into CUDA memory. It also supports the following features:

  • CUDA memory conversion to ATen Tensor for using it via Python in PyTorch Deep Learning models
  • Detecting basic video stream issues related to frame reordering/loss
  • Video Post Processing (VPP) operations: downscaling/upscaling, crops, color conversions, etc.
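
To make the idea of detecting frame reordering or loss concrete, here is a minimal sketch that inspects a sequence of presentation timestamps. It is not TensorStream's implementation: the helper name `check_pts`, the constant frame duration, and the rule used to count missing frames are all assumptions made for illustration.

```python
def check_pts(pts_values, frame_duration):
    """Report late and missing frames in a list of presentation timestamps.

    Hypothetical helper, not part of the TensorStream API. Assumes a constant
    frame duration expressed in the same units as the timestamps.
    """
    issues = []
    last = None  # highest timestamp accepted so far
    for pts in pts_values:
        if last is not None:
            if pts <= last:
                # The timestamp does not advance: the frame arrived out of order.
                issues.append(("reordered", pts))
                continue
            missing = (pts - last) // frame_duration - 1
            if missing > 0:
                # The gap is wider than one frame: frames in between are missing.
                issues.append(("lost", pts, missing))
        last = pts
    return issues


# 40 ms per frame (25 fps). Frame 80 is skipped at first and shows up late.
print(check_pts([0, 40, 120, 80, 160], 40))
```

A real decoder has more information available (codec-level picture order, container timestamps), so a check like this only approximates what the library can detect.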

The library supports both Linux and Windows.

A simple example of how to use TensorStream for deep learning tasks:

from tensor_stream import TensorStreamConverter, FourCC, Planes

reader = TensorStreamConverter("rtmp://127.0.0.1/live", cuda_device=0)
reader.initialize()
reader.start()

# `need_predictions` and `model` are placeholders supplied by your application
while need_predictions:
    # read the latest available frame from the stream
    tensor = reader.read(pixel_format=FourCC.BGR24,
                         width=256,                 # resize to 256x256 px
                         height=256,
                         normalization=True,        # normalize to range [0, 1]
                         planes_pos=Planes.PLANAR)  # dimension order [C, H, W]

    # tensor dtype is torch.float32, device is 'cuda:0', shape is (3, 256, 256)
    prediction = model(tensor.unsqueeze(0))

The example performs two steps:

  • Initialize the tensor stream with a video (e.g., a local file or a network video stream) and start reading it in a separate process.
  • Get the latest available frame from the stream and make a prediction.

Note: All tasks inside TensorStream are processed on the GPU, so the output tensor is also located on the GPU.
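
The comments in the example mention a `[C, H, W]` dimension order and values normalized to `[0, 1]`. The sketch below shows what those two properties mean on a tiny image, using plain Python lists instead of GPU tensors. The helper names are hypothetical, and treating normalization as a division by 255 is an assumption; the source only states the target range.

```python
def merged_to_planar(image):
    """Convert an [H][W][C] nested list into [C][H][W] order."""
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    return [[[image[y][x][c] for x in range(width)]
             for y in range(height)]
            for c in range(channels)]


def normalize(planar):
    """Scale 8-bit values into [0, 1] (assumed here to be a division by 255)."""
    return [[[value / 255 for value in row] for row in plane] for plane in planar]


# A 1x2 image holding two BGR pixels in merged (interleaved) order.
merged = [[[255, 0, 0], [0, 0, 51]]]
planar = merged_to_planar(merged)
print(planar)             # one plane per channel
print(normalize(planar))  # same layout, floats in [0, 1]
```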

Install TensorStream

Dependencies

  • NVIDIA CUDA 9.0 or above
  • FFmpeg and nv-codec-headers, the FFmpeg version of the headers required to interface with NVIDIA's codec APIs
  • PyTorch 1.1.0 or above to build the C++ extension for Python
  • Python 3.6 or above to build the C++ extension for Python

It is convenient to use TensorStream in Docker containers. A Dockerfile is provided to build an image with all the necessary dependencies.

Installation from source

TensorStream source code

git clone -b master --single-branch https://github.com/Fonbet/argus-tensor-stream.git
cd argus-tensor-stream

C++ extension for Python

On Linux:

python setup.py install

On Windows:

set FFMPEG_PATH="Path to FFmpeg install folder"
set path=%path%;%FFMPEG_PATH%\bin
set VS150COMNTOOLS="Path to Visual Studio vcvarsall.bat folder"
call "%VS150COMNTOOLS%\vcvarsall.bat" x64 -vcvars_ver=14.11
python setup.py install

Note: Building TensorStream on Windows requires the Visual Studio 2017 14.11 toolset.

C++ library:

On Linux:

mkdir build
cd build
cmake ..

On Windows:

set FFMPEG_PATH="Path to FFmpeg install folder"
mkdir build
cd build
cmake -G "Visual Studio 15 2017 Win64" -T v141,version=14.11 ..

Binaries (Linux only)

Extension for Python can be installed via pip:

  • CUDA 9:

Warning: CUDA 9 is no longer supported by TensorStream, so new releases are not built or distributed as binaries for it.

  • CUDA 10: TensorStream wheels built against different versions of PyTorch:
pip install https://tensorstream.argus-ai.com/wheel/cu10/torch1.4.0/linux/tensor_stream-0.4.0-cp36-cp36m-linux_x86_64.whl
pip install https://tensorstream.argus-ai.com/wheel/cu10/torch1.5.0/linux/tensor_stream-0.4.0-cp36-cp36m-linux_x86_64.whl
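
The two wheel URLs differ only in the PyTorch version segment. The sketch below picks the URL that matches an installed PyTorch version. `wheel_url` is a hypothetical helper, and it assumes that only the two versions listed above are published.

```python
BASE = "https://tensorstream.argus-ai.com/wheel/cu10"
WHEEL = "tensor_stream-0.4.0-cp36-cp36m-linux_x86_64.whl"
AVAILABLE_TORCH = ("1.4.0", "1.5.0")  # versions listed in this README


def wheel_url(torch_version):
    """Return the wheel URL for a listed PyTorch version (hypothetical helper)."""
    if torch_version not in AVAILABLE_TORCH:
        raise ValueError(f"no prebuilt wheel listed for PyTorch {torch_version}")
    return f"{BASE}/torch{torch_version}/linux/{WHEEL}"


print("pip install " + wheel_url("1.5.0"))
```

The `cp36` tag in the file name means these wheels target CPython 3.6; other Python versions need a build from source.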

Building examples and tests

Examples for C++ and Python can be found in the c_examples and python_examples folders. Tests for C++ can be found in the tests folder.

Python example

The Python example can be run once the TensorStream C++ extension for Python is installed.

cd python_examples
python simple.py

C++ example and unit tests

On Linux

cd c_examples  # or: cd tests
mkdir build
cd build
cmake -DCMAKE_PREFIX_PATH=$PWD/../../cmake ..

On Windows

set FFMPEG_PATH="Path to FFmpeg install folder"
rem use "cd tests" instead to build the unit tests
cd c_examples
mkdir build
cd build
cmake -DCMAKE_PREFIX_PATH=%cd%\..\..\cmake -G "Visual Studio 15 2017 Win64" -T v141,version=14.11 ..

Docker image

To build the TensorStream image, pass the PyTorch version via the TORCH_VERSION build argument:

docker build --build-arg TORCH_VERSION=1.5.0 -t tensorstream .

Run the container with a bash command line and follow the installation guide:

nvidia-docker run -ti tensorstream bash

Note: Newer versions of Docker support GPUs natively (tested with Docker version 19.03.1), so instead of the nvidia-docker run command above, execute:

docker run --gpus=all -ti tensorstream bash
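
The choice between the two run commands depends on the Docker version, since the `--gpus` flag was introduced in Docker 19.03. The sketch below encodes that rule. `docker_run_command` is a hypothetical helper, and it assumes the image is tagged `tensorstream` as in the build command above.

```python
def docker_run_command(docker_version, image="tensorstream"):
    """Return the run command (as an argument list) for a Docker version string."""
    major, minor = (int(part) for part in docker_version.split(".")[:2])
    if (major, minor) >= (19, 3):
        # Docker 19.03 and newer understand the --gpus flag.
        return ["docker", "run", "--gpus=all", "-ti", image, "bash"]
    # Older versions rely on the separate nvidia-docker wrapper.
    return ["nvidia-docker", "run", "-ti", image, "bash"]


print(" ".join(docker_run_command("19.03.1")))
print(" ".join(docker_run_command("18.09.7")))
```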

Usage

Samples

  1. The simple example (simple.py) demonstrates RTMP to PyTorch tensor conversion. Some usage scenarios follow.

Note: You can pass --help to get the list of all available options, their descriptions and default values.

  • Convert an RTMP bitstream to RGB24 PyTorch tensors and dump the result to a dump.yuv file:
python simple.py -i rtmp://37.228.119.44:1935/vod/big_buck_bunny.mp4 -fc RGB24 -o dump

Warning: Dumps significantly affect performance. The .yuv suffix is added to the output filename automatically.

  • The same scenario with downscaling using the nearest resize algorithm:
python simple.py -i rtmp://37.228.119.44:1935/vod/big_buck_bunny.mp4 -fc RGB24 -w 720 -h 480 --resize_type NEAREST -o dump

Note: Besides nearest resize algorithm, bilinear, bicubic and area (similar to OpenCV INTER_AREA) algorithms are available.

Warning: Resize algorithms are applied to NV12 frames, so bit-to-bit (b2b) identical results with popular frameworks, which resize in formats other than NV12, aren't guaranteed.
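
NV12 stores a full-resolution luma plane followed by a single plane of interleaved U/V samples at half resolution in each direction. Resizing in this format therefore operates on subsampled chroma, which is why results can differ from a resize done on RGB data. The sketch below computes the plane sizes for a tightly packed frame; the helper name is hypothetical, and rows are assumed to have no padding.

```python
def nv12_layout(width, height):
    """Return plane sizes and offsets in bytes for a tightly packed NV12 frame."""
    if width % 2 or height % 2:
        raise ValueError("NV12 needs even dimensions")
    y_size = width * height        # one byte of luma per pixel
    uv_size = width * height // 2  # interleaved U/V, half resolution in both directions
    return {
        "y_offset": 0,
        "y_size": y_size,
        "uv_offset": y_size,
        "uv_size": uv_size,
        "total": y_size + uv_size,
    }


layout = nv12_layout(720, 480)
print(layout)
print("RGB24 size for comparison:", 720 * 480 * 3)
```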

  • The number of frames to process can be limited with the -n option:
python simple.py -i rtmp://37.228.119.44:1935/vod/big_buck_bunny.mp4 -fc RGB24 -w 720 -h 480 -o dump -n 100
  • The result can be cropped via the --crop option, which takes the coordinates of the top-left and bottom-right corners as parameters:
python simple.py -i rtmp://37.228.119.44:1935/vod/big_buck_bunny.mp4 -fc RGB24 -w 720 -h 480 --crop 0,0,320,240 -o dump -n 100

Warning: Crop is applied before the resize algorithm.
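
The order of operations matters: the crop rectangle refers to the decoded frame, and the resize then scales the cropped region to the requested width and height. The sketch below illustrates this on a tiny single-channel image. It is not TensorStream's code; treating the bottom-right corner as exclusive and the exact nearest-neighbor sampling rule are assumptions.

```python
def crop(image, left, top, right, bottom):
    """Keep rows top..bottom-1 and columns left..right-1 (exclusive end is an assumption)."""
    return [row[left:right] for row in image[top:bottom]]


def resize_nearest(image, width, height):
    """Nearest-neighbor resize: source index = floor(dst_index * src_size / dst_size)."""
    src_h, src_w = len(image), len(image[0])
    return [[image[y * src_h // height][x * src_w // width] for x in range(width)]
            for y in range(height)]


# 4x4 image whose pixel value encodes its position (row * 10 + column).
image = [[row * 10 + col for col in range(4)] for row in range(4)]
cropped = crop(image, 0, 0, 2, 2)        # top-left 2x2 block
resized = resize_nearest(cropped, 4, 4)  # scaled back up to 4x4
print(cropped)
print(resized)
```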

  • The output pixel format is either torch.float32 or torch.uint8, depending on the --normalize option. The option can be True, False, or left unset, in which case TensorStream decides which value to use:
python simple.py -i rtmp://37.228.119.44:1935/vod/big_buck_bunny.mp4 -fc RGB24 -w 720 -h 480 -o dump -n 100 --normalize True
  • For RGB output, color planes can be either planar or merged, set via the --planes option:
python simple.py -i rtmp://37.228.119.44:1935/vod/big_buck_bunny.mp4 -fc RGB24 -w 720 -h 480 -o dump -n 100 --planes MERGED
  • The buffer size for processed frames can be set via the -bs or --buffer_size option:
python simple.py -i rtmp://37.228.119.44:1935/vod/big_buck_bunny.mp4 -fc RGB24 -w 720 -h 480 -o dump -n 100 --planes MERGED --buffer_size 5

Warning: The buffer size should be less than or equal to the decoded picture buffer (DPB) size.
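
One way to picture a fixed-size buffer of processed frames is a bounded deque that discards its oldest entry once full, so a reader can always take the newest frame. This is a conceptual sketch only; whether TensorStream's buffer evicts frames in exactly this way is an assumption.

```python
from collections import deque

buffer_size = 5
frames = deque(maxlen=buffer_size)  # the oldest entry is dropped automatically when full

for frame_index in range(8):        # integers stand in for decoded frames
    frames.append(frame_index)

latest = frames[-1]                 # the newest frame currently buffered
print(list(frames), latest)
```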

  • GPU used for execution can be set via --cuda_device option:
python simple.py -i rtmp://37.228.119.44:1935/vod/big_buck_bunny.mp4 -fc RGB24 -w 720 -h 480 -o dump -n 100 --planes MERGED --cuda_device 0
  • The input stream reading mode can be chosen with the --framerate_mode option. See --help for the available values and their descriptions:
python simple.py -i rtmp://37.228.119.44:1935/vod/big_buck_bunny.mp4 -fc RGB24 -w 720 -h 480 -o dump -n 100 --planes MERGED --framerate_mode NATIVE
  • The bitstream analysis stage can be skipped to decrease latency with the --skip_analyze flag:
python simple.py -i rtmp://37.228.119.44:1935/vod/big_buck_bunny.mp4 -fc RGB24 -w 720 -h 480 -o dump -n 100 --planes MERGED --skip_analyze
  • Timeout for input frame reading can be set via --timeout option (time in seconds):
python simple.py -i rtmp://37.228.119.44:1935/vod/big_buck_bunny.mp4 -fc RGB24 -w 720 -h 480 -o dump -n 100 --planes MERGED --timeout 2
  • Log types and levels can be configured with the -v, -vd and --nvtx options. See --help for the available values and their descriptions:
python simple.py -i rtmp://37.228.119.44:1935/vod/big_buck_bunny.mp4 -fc RGB24 -w 720 -h 480 -o dump -n 100 --planes MERGED -v HIGH -vd CONSOLE --nvtx
  2. The many_consumers.py example demonstrates how to use TensorStream with several stream consumers:
python many_consumers.py -i rtmp://37.228.119.44:1935/vod/big_buck_bunny.mp4 -n 100
  3. The different_streams.py example demonstrates how to use TensorStream when several streams should be handled simultaneously:
python different_streams.py -i1 <path-to-first-stream> -i2 <path-to-second-stream> -n1 100 -n2 50 -v1 LOW -v2 HIGH --cuda_device1 0 --cuda_device2 1

Warning: The default path to the second stream is relative, so run different_streams.py from the parent folder if no arguments are passed.

PyTorch example

Real-time video style transfer example: fast-neural-style.

Documentation

Documentation for Python and C++ API can be found on the site.

License

TensorStream is LGPL-2.1 licensed, see the LICENSE file for details.

Used materials in samples

Big Buck Bunny is licensed under the Creative Commons Attribution 3.0 license. (c) copyright 2008, Blender Foundation / www.bigbuckbunny.org
