inoryy / Tensorflow Optimized Wheels

Licence: MIT
TensorFlow wheels built for the latest CUDA/cuDNN with performance flags enabled: SSE, AVX, FMA, XLA

Optimized TensorFlow Wheels

If you see messages like the following when you start TensorFlow, then these wheels are for you!

The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
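To check a captured startup log for these warnings, something like the following may help. It is a minimal sketch: the regular expression is derived only from the message text quoted above, and the helper name `missing_instruction_sets` is hypothetical, not part of TensorFlow.

```python
import re

# Pattern derived from the warning text quoted above (an assumption:
# other TensorFlow versions may word the message differently).
_WARNING = re.compile(r"wasn't compiled to use (\S+) instructions")


def missing_instruction_sets(log_text):
    """Return instruction sets named in the warnings, in order of first appearance."""
    found = []
    for name in _WARNING.findall(log_text):
        if name not in found:
            found.append(name)
    return found


if __name__ == "__main__":
    sample = (
        "The TensorFlow library wasn't compiled to use AVX instructions, "
        "but these are available on your machine and could speed up CPU computations.\n"
        "The TensorFlow library wasn't compiled to use SSE4.1 instructions, "
        "but these are available on your machine and could speed up CPU computations.\n"
    )
    print(missing_instruction_sets(sample))
```

An empty result for a log captured with the official build suggests these wheels would add little on that machine.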

Introduction

The builds enable performance flags targeting modern CPUs, including the SSE4 and AVX2 SIMD instruction sets and FMA. If your CPU was released after roughly 2013, you will likely benefit from these, for example in CPU-bound work such as data pre-processing.
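Whether a given CPU supports these instruction sets can be checked before installing. The sketch below assumes an x86 Linux machine, where `/proc/cpuinfo` lists the features on a `flags` line using the names `sse4_1`, `sse4_2`, `avx`, `avx2` and `fma`; the helper names are hypothetical.

```python
# Linux /proc/cpuinfo names for the instruction sets mentioned above.
WANTED = ("sse4_1", "sse4_2", "avx", "avx2", "fma")


def cpu_flags(cpuinfo_text):
    """Extract the feature flags from /proc/cpuinfo-style text (first CPU only)."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def supported(cpuinfo_text, wanted=WANTED):
    """Map each wanted flag name to True/False depending on whether the CPU reports it."""
    flags = cpu_flags(cpuinfo_text)
    return {name: name in flags for name in wanted}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not Linux, or the file is unreadable: every flag reports "no"
    for name, ok in supported(text).items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

If any flag reports "no", a wheel compiled to use that instruction set may fail to run on the machine.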

The build also enables XLA (Accelerated Linear Algebra), a domain-specific just-in-time compiler for linear algebra.

Additional compute capabilities (5.0, 6.1, 7.0, 7.5) are enabled, so the wheels should work well on a wide range of GPUs, from the GTX 7xx to the RTX 20xx families. Note that 7.5 is only enabled in the 2.1.0 wheels (see the table below).
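As a rough way to reason about which GPUs get natively compiled kernels, the sketch below applies the CUDA binary-compatibility rule as I understand it: code built for capability X.y runs on devices X.z with z >= y. Whether the wheels also embed PTX for forward compatibility with other devices is not stated here, so a False result is only a hint. The function names are hypothetical.

```python
def parse_cc(text):
    """Parse a compute capability such as '6.1' into (6, 1)."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def has_native_kernels(device_cc, built_ccs):
    """True if some built capability X.y matches the device's X.z with z >= y."""
    d_major, d_minor = parse_cc(device_cc)
    for cc in built_ccs.split(","):
        major, minor = parse_cc(cc)
        if major == d_major and minor <= d_minor:
            return True
    return False


if __name__ == "__main__":
    # Capability list copied from the 2.1.0 wheels.
    built = "5.0,6.1,7.0,7.5"
    for device in ("3.5", "5.2", "6.0", "6.1", "7.5"):
        print(device, has_native_kernels(device, built))
```

Under this rule a 7.5 device is still covered by the 2.0.0 wheels through their 7.0 kernels, while a 6.0 device is not covered by the 6.1 kernels.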

Available Wheels

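| TensorFlow | Python | CUDA | cuDNN | TensorRT | NCCL | Compute Capability | OS    | Link                                         |
|------------|--------|------|-------|----------|------|--------------------|-------|----------------------------------------------|
| 2.1.0      | 3.8    | 10.2 | 7.6   | 7.0      | 2.5  | 5.0, 6.1, 7.0, 7.5 | Linux | tensorflow-2.1.0-cp38-cp38-linux_x86_64.whl  |
| 2.1.0      | 3.7    | 10.2 | 7.6   | 7.0      | 2.5  | 5.0, 6.1, 7.0, 7.5 | Linux | tensorflow-2.1.0-cp37-cp37m-linux_x86_64.whl |
| 2.0.0      | 3.8    | 10.2 | 7.6   | N/A      | 2.5  | 5.0, 6.1, 7.0      | Linux | tensorflow-2.0.0-cp38-cp38-linux_x86_64.whl  |
| 2.0.0      | 3.7    | 10.1 | 7.5   | N/A      | 2.4  | 5.0, 6.1, 7.0      | Linux | tensorflow-2.0.0-cp37-cp37m-linux_x86_64.whl |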

Installation

Assuming you have all the requirements, you can install the wheel that matches your Python version directly via pip. The example below uses the Python 3.7 wheel for TensorFlow 2.1.0:

pip install https://github.com/inoryy/tensorflow-optimized-wheels/releases/download/v2.1.0/tensorflow-2.1.0-cp37-cp37m-linux_x86_64.whl
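To pick the wheel for the interpreter you are running, a small helper along these lines could build the filename and URL. This is a sketch under assumptions: the release URL pattern is extrapolated from the single URL above and is not confirmed for other versions, only the combinations listed under Available Wheels exist, and the function names are hypothetical.

```python
import sys

# Assumed pattern, extrapolated from the one URL shown above.
BASE = "https://github.com/inoryy/tensorflow-optimized-wheels/releases/download"


def wheel_filename(tf_version, py_version):
    """Build the wheel filename for a (major, minor) Python version."""
    major, minor = py_version[:2]
    tag = f"cp{major}{minor}"
    # CPython 3.7 wheels carry an "m" ABI suffix; 3.8 dropped it,
    # matching the filenames in the Available Wheels table.
    abi = tag + ("m" if (major, minor) < (3, 8) else "")
    return f"tensorflow-{tf_version}-{tag}-{abi}-linux_x86_64.whl"


def wheel_url(tf_version, py_version):
    """Build the (assumed) download URL for a TensorFlow and Python version."""
    return f"{BASE}/v{tf_version}/{wheel_filename(tf_version, py_version)}"


if __name__ == "__main__":
    # Prints the URL for the running interpreter; check it against the table first.
    print(wheel_url("2.1.0", sys.version_info))
```

The printed URL can then be passed to `pip install`, provided the combination appears in the table.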

Then verify the installation (note the absence of CPU instruction warnings; this session was recorded with the Python 3.8 wheel):

$ python
Python 3.8.0 | packaged by conda-forge | (default, Nov 22 2019, 19:11:38)
[GCC 7.3.0] :: Anaconda, Inc. on linux
>>> import tensorflow as tf
I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2
>>> tf.__version__
'2.1.0'
>>> tf.executing_eagerly()
True
>>> tf.constant([123]) + tf.constant([321])
I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
...
I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (...) -> physical GPU (...)
<tf.Tensor: shape=(1,), dtype=int32, numpy=array([444], dtype=int32)>

Benchmark

The wheels are benchmarked by training an MNIST model from TF Models on a CPU. Results for TF 2.1 are as follows:

| Build     | Time per epoch (mean) | Min | Max |
|-----------|-----------------------|-----|-----|
| Official  | 16.7s                 | 16s | 19s |
| Optimized | 14.3s                 | 12s | 17s |
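To put these numbers in relative terms, the short sketch below computes the reduction in time per epoch from the figures in the table. The min and max rows compare the extremes of each build rather than paired runs, so only the mean is a fair summary.

```python
def reduction(official, optimized):
    """Relative reduction in time per epoch, as a fraction of the official time."""
    return (official - optimized) / official


# Seconds per epoch, copied from the benchmark table above.
RESULTS = {"mean": (16.7, 14.3), "min": (16.0, 12.0), "max": (19.0, 17.0)}

if __name__ == "__main__":
    for name, (official, optimized) in RESULTS.items():
        print(f"{name}: {reduction(official, optimized):.1%} less time per epoch")
```

On the mean, this works out to roughly 14% less time per epoch for the optimized build.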

Requests

If you need a different TensorFlow / CUDA / cuDNN / Python combination, feel free to open a GitHub issue.
