
Maratyszcza / NNPACK

License: BSD 2-Clause
Acceleration package for neural networks on multi-core CPUs

Programming Languages

  • C
  • C++
  • Python
  • CMake
  • Assembly
  • HTML

Projects that are alternatives of or similar to Nnpack

Xnnpack
High-efficiency floating-point neural network inference operators for mobile, server, and Web
Stars: ✭ 808 (-47.46%)
Mutual labels:  multithreading, cpu, simd, neural-networks, inference
Object threadsafe
We make any object thread-safe and std::shared_mutex 10 times faster, achieving the speed of lock-free algorithms on >85% reads
Stars: ✭ 280 (-81.79%)
Mutual labels:  multithreading, high-performance
Thorin
The Higher-Order Intermediate Representation
Stars: ✭ 116 (-92.46%)
Mutual labels:  cpu, simd
Hipsycl
Implementation of SYCL for CPUs, AMD GPUs, NVIDIA GPUs
Stars: ✭ 377 (-75.49%)
Mutual labels:  high-performance, high-performance-computing
MultiHttp
This is a high-performance, very useful multi-curl tool written in PHP. A super easy-to-use concurrent cURL tool! (httpful, restful, concurrency)
Stars: ✭ 79 (-94.86%)
Mutual labels:  high-performance, multithreading
CPURasterizer
CPU Based Rasterizer Engine
Stars: ✭ 99 (-93.56%)
Mutual labels:  cpu, multithreading
Atomic queue
C++ lockless queue.
Stars: ✭ 373 (-75.75%)
Mutual labels:  multithreading, high-performance
BMW-IntelOpenVINO-Detection-Inference-API
This is a repository for a no-code object detection inference API using the OpenVINO toolkit. It is supported on both Windows and Linux operating systems.
Stars: ✭ 66 (-95.71%)
Mutual labels:  cpu, inference
Taskflow
A General-purpose Parallel and Heterogeneous Task Programming System
Stars: ✭ 6,128 (+298.44%)
Mutual labels:  multithreading, high-performance-computing
Arraymancer
A fast, ergonomic and portable tensor library in Nim with a deep learning focus for CPU, GPU and embedded devices via OpenMP, Cuda and OpenCL backends
Stars: ✭ 793 (-48.44%)
Mutual labels:  neural-networks, high-performance-computing
Corium
Corium is a modern scripting language which combines simple, safe and efficient programming.
Stars: ✭ 18 (-98.83%)
Mutual labels:  high-performance, multithreading
Blis
BLAS-like Library Instantiation Software Framework
Stars: ✭ 859 (-44.15%)
Mutual labels:  high-performance, high-performance-computing
space
A SCI-FI community game server simulating space(ships). Built from the ground up to support moddable online action multiplayer and roleplay!
Stars: ✭ 25 (-98.37%)
Mutual labels:  high-performance, multithreading
Clojurecl
ClojureCL is a Clojure library for parallel computations with OpenCL.
Stars: ✭ 266 (-82.7%)
Mutual labels:  high-performance, high-performance-computing
Planeverb
Project Planeverb is a CPU-based real-time wave-based acoustics engine for games. It comes with an integration with the Unity Engine.
Stars: ✭ 22 (-98.57%)
Mutual labels:  cpu, multithreading
Cppflow
Run TensorFlow models in C++ without installation and without Bazel
Stars: ✭ 357 (-76.79%)
Mutual labels:  neural-networks, inference
BMW-IntelOpenVINO-Segmentation-Inference-API
This is a repository for a semantic segmentation inference API using the OpenVINO toolkit
Stars: ✭ 31 (-97.98%)
Mutual labels:  cpu, inference
ThreadPinning.jl
Pinning Julia threads to cores
Stars: ✭ 23 (-98.5%)
Mutual labels:  multithreading, high-performance-computing
Seqan
SeqAn's official repository.
Stars: ✭ 386 (-74.9%)
Mutual labels:  simd, high-performance
Edge
Extreme-scale Discontinuous Galerkin Environment (EDGE)
Stars: ✭ 18 (-98.83%)
Mutual labels:  simd, high-performance-computing

NNPACK Logo

NNPACK

BSD 2-Clause License

NNPACK is an acceleration package for neural network computations. NNPACK aims to provide high-performance implementations of convnet layers for multi-core CPUs.

NNPACK is not intended to be directly used by machine learning researchers; instead it provides low-level performance primitives leveraged in leading deep learning frameworks, such as PyTorch, Caffe2, MXNet, tiny-dnn, Caffe, Torch, and Darknet.

Platforms and requirements

Environment  Architecture  CPU requirements
Linux        x86-64        AVX2 and 3-level cache hierarchy
Linux        ARM           NEON
Linux        ARM64
macOS        x86-64        AVX2 and 3-level cache hierarchy
Android      ARM           NEON
Android      ARM64
Android      x86
Android      x86-64
iOS          ARM
iOS          ARM64
Emscripten   Asm.js
Emscripten   WebAssembly

Features

  • Multiple algorithms for convolutional layers (an algorithm-selection sketch follows this list):
    • Fast convolution based on Fourier transform (for kernels up to 16x16 without stride)
    • Fast convolution based on Winograd transform (for 3x3 kernels without stride)
    • Implicit matrix-matrix multiplication algorithm (no limitations)
    • Direct convolution algorithm (for 1x1 kernels without stride)
  • Multi-threaded SIMD-aware implementations of neural network layers
  • Implemented in C99 and Python without external dependencies
  • Extensive coverage with unit tests
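
As a rough sketch of how these algorithm choices surface in the C API: nnpack.h exposes them as enum nnp_convolution_algorithm values, and nnp_convolution_algorithm_auto normally picks one for you. The helper below (pick_algorithm is a hypothetical name, not part of NNPACK) just restates the restrictions listed above; it is not NNPACK's internal heuristic.

#include <nnpack.h>

/* Hypothetical helper: maps kernel shape and stride to one of the algorithms
 * above. nnp_convolution_algorithm_auto normally makes this choice internally. */
static enum nnp_convolution_algorithm pick_algorithm(
    size_t kernel_h, size_t kernel_w, size_t stride_h, size_t stride_w)
{
    if (stride_h == 1 && stride_w == 1) {
        if (kernel_h == 1 && kernel_w == 1)
            return nnp_convolution_algorithm_direct;      /* direct 1x1 convolution */
        if (kernel_h == 3 && kernel_w == 3)
            return nnp_convolution_algorithm_wt8x8;       /* Winograd transform */
        if (kernel_h <= 16 && kernel_w <= 16)
            return nnp_convolution_algorithm_ft16x16;     /* Fourier transform */
    }
    return nnp_convolution_algorithm_implicit_gemm;       /* no limitations */
}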

Layers

  • Convolutional layer
    • Inference-optimized forward propagation (nnp_convolution_inference; a usage sketch follows this list)
    • Training-optimized forward propagation (nnp_convolution_output)
    • Training-optimized backward input gradient update (nnp_convolution_input_gradient)
    • Training-optimized backward kernel gradient update (nnp_convolution_kernel_gradient)
  • Fully-connected layer
    • Inference-optimized forward propagation (nnp_fully_connected_inference, with an nnp_fully_connected_inference_f16f32 variant for FP16 weights)
    • Training-optimized forward propagation (nnp_fully_connected_output)
  • Max pooling layer
    • Forward propagation, both for training and inference (nnp_max_pooling_output)
  • ReLU layer (with parametrized negative slope)
    • Forward propagation, both for training and inference, optionally in-place (nnp_relu_output)
    • Backward input gradient update (nnp_relu_input_gradient)
  • Softmax layer
    • Forward propagation, both for training and inference, optionally in-place (nnp_softmax_output)
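
For orientation, here is a minimal sketch of calling the inference-optimized convolution path from C. It assumes the nnp.h signature of nnp_convolution_inference from recent NNPACK revisions (the inference entry point gained workspace and activation arguments over time, so older checkouts differ), and the 56x56 image size is illustrative.

#include <nnpack.h>
#include <pthreadpool.h>

/* Sketch: 3x3, stride-1, same-padded convolution over one 56x56 image. */
int conv3x3_example(const float* input,  /* input_channels x 56 x 56 */
                    const float* kernel, /* output_channels x input_channels x 3 x 3 */
                    const float* bias,   /* output_channels */
                    float* output,       /* output_channels x 56 x 56 */
                    size_t input_channels, size_t output_channels)
{
    if (nnp_initialize() != nnp_status_success)
        return -1;

    const struct nnp_size input_size = { .width = 56, .height = 56 };
    const struct nnp_padding input_padding = { .top = 1, .right = 1, .bottom = 1, .left = 1 };
    const struct nnp_size kernel_size = { .width = 3, .height = 3 };
    const struct nnp_size output_subsampling = { .width = 1, .height = 1 }; /* stride 1 */

    pthreadpool_t threadpool = pthreadpool_create(0); /* 0 = one thread per core */
    enum nnp_status status = nnp_convolution_inference(
        nnp_convolution_algorithm_auto,               /* let NNPACK pick (Winograd for 3x3) */
        nnp_convolution_transform_strategy_compute,
        input_channels, output_channels,
        input_size, input_padding, kernel_size, output_subsampling,
        input, kernel, bias, output,
        NULL, NULL,                                   /* NNPACK manages the workspace */
        nnp_activation_identity, NULL,                /* no fused activation */
        threadpool, NULL);                            /* no profiling */
    pthreadpool_destroy(threadpool);
    nnp_deinitialize();
    return status == nnp_status_success ? 0 : -1;
}

Training-oriented entry points such as nnp_convolution_output follow the same initialize/compute pattern, but operate on a whole batch at once.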

Building

For most users, the recommended way to build NNPACK is through CMake:

mkdir build
cd build
cmake -G Ninja ..
ninja

Note: if ninja is not available on your system, configure without -G Ninja, and use make instead of ninja.

Cross-compilation for Android

To cross-compile for Android, add extra configuration options for cmake: -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake (where $ANDROID_NDK is the path to the Android NDK directory, e.g. /opt/android-ndk-r15c) and the arguments from the table below.

ABI          Extra cmake args                                    Restrictions
armeabi      -DANDROID_ABI=armeabi -DANDROID_TOOLCHAIN=gcc       Requires CPU with ARM NEON
armeabi-v7a  -DANDROID_ABI=armeabi-v7a -DANDROID_TOOLCHAIN=gcc   Requires CPU with ARM NEON
arm64-v8a    -DANDROID_ABI=arm64-v8a -DANDROID_TOOLCHAIN=clang   Requires clang toolchain
x86          -DANDROID_ABI=x86
x86_64       -DANDROID_ABI=x86_64
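
For example, a configure line for an arm64-v8a build might look like this (the NDK path is illustrative):

cmake -G Ninja \
  -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
  -DANDROID_ABI=arm64-v8a \
  -DANDROID_TOOLCHAIN=clang ..
ninja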

Notes:

  • On armeabi and armeabi-v7a, nnp_initialize will fail with nnp_status_unsupported_hardware if the mobile CPU does not support ARM NEON. Don't set -DANDROID_ARM_NEON=1 for NNPACK compilation, as it can make nnp_initialize crash on CPUs without ARM NEON (see the runtime check sketched after these notes).
  • NNPACK builds for armeabi and armeabi-v7a are up to 2x slower if you use the Clang toolchain.
  • mips and mips64 are not supported, and we have no plans to add them (a pull request would be welcome, though).
  • The x86_64 build will use generic 128-bit (SSE2) micro-kernels rather than the AVX2 micro-kernels used in a native build.
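
Given the first note above, the safe pattern is to check the status returned by nnp_initialize at runtime before calling any other NNPACK function. A minimal sketch:

#include <stdio.h>
#include <nnpack.h>

int main(void)
{
    enum nnp_status status = nnp_initialize();
    if (status == nnp_status_unsupported_hardware) {
        /* e.g. an armeabi device without ARM NEON: fall back to a non-NNPACK path */
        fprintf(stderr, "NNPACK: unsupported hardware\n");
        return 1;
    }
    if (status != nnp_status_success) {
        fprintf(stderr, "nnp_initialize failed with status %d\n", (int) status);
        return 1;
    }
    /* ... safe to call other NNPACK functions here ... */
    nnp_deinitialize();
    return 0;
}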

Ecosystem

Deep Learning Frameworks

  • PyTorch supports NNPACK on mobile for inference in convolutional layers.
  • TVM supports NNPACK for inference in convolutional layers. See these instructions to enable NNPACK in TVM.
  • MXNet supports NNPACK for inference in convolutional, fully-connected, and max-pooling layers. See the MXNet wiki for configuration instructions and performance benchmarks.
  • Caffe2 supports NNPACK for inference in convolutional layers.
  • darknet-nnpack - fork of Darknet framework with NNPACK support.
  • tiny-dnn - header-only deep learning framework in C++11, which natively supports NNPACK.
  • Maratyszcza/caffe - up-to-date integration of NNPACK (convolutional, fully-connected, max-pooling, and ReLU layers) into Caffe based on nnpack-pr branch in ajtulloch/caffe.
  • Maratyszcza/caffe-nnpack - older and unmaintained integration of NNPACK (convolutional layers only) into Caffe.
  • szagoruyko/nnpack.torch - integration of NNPACK into Lua Torch via ffi
  • See also discussion in Issue #1

Languages and Environments

Users

  • Facebook uses NNPACK in production.
  • Prisma uses NNPACK in its mobile app.

Acknowledgements


The library is developed by Marat Dukhan of Georgia Tech with extensive advice from Nicolas Vasilache and Soumith Chintala of Facebook Artificial Intelligence Research. Andrew Tulloch of Facebook Artificial Intelligence Research contributed Caffe integration. We thank Andrew Lavin for fruitful discussions on Winograd transform-based implementations. NNPACK is a research project at Richard Vuduc's HPC Garage lab at the Georgia Institute of Technology, College of Computing, School of Computational Science and Engineering.

This material is based upon work supported by the U.S. National Science Foundation (NSF) Award Number 1339745. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of NSF.
