springer13 / Hptt

Licence: bsd-3-clause

High-Performance Tensor Transpose library

Labels

high-performance-computing tensor tensors multidimensional-arrays

High-Performance Tensor Transpose library

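HPTT is a high-performance C++ library for out-of-place tensor transpositions of the general form:

$$B_{\pi(i_0, i_1, \ldots, i_{d-1})} \leftarrow \alpha \, A_{i_0, i_1, \ldots, i_{d-1}} + \beta \, B_{\pi(i_0, i_1, \ldots, i_{d-1})}$$

where A and B respectively denote the input and output tensor, $\pi$ represents the user-specified transposition, and $\alpha$ and $\beta$ are scalars (i.e., setting $\beta \neq 0$ enables the user to update the output tensor B).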
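To make the update rule concrete, here is a minimal, unoptimized reference sketch in pure Python. It is not HPTT code: the function names are hypothetical, tensors are flat column-major lists, and it assumes the numpy.transpose convention that dimension i of B corresponds to dimension perm[i] of A.

```python
from itertools import product


def column_major_strides(size):
    """Return the element stride of each dimension of a dense column-major tensor."""
    strides, step = [], 1
    for extent in size:
        strides.append(step)
        step *= extent
    return strides


def reference_transpose(perm, alpha, A, size_a, beta, B):
    """Naive out-of-place transposition on flat column-major lists.

    Assumed convention (as in numpy.transpose): dimension i of B is
    dimension perm[i] of A. Updates B in place and returns the size of B.
    """
    size_b = [size_a[p] for p in perm]
    strides_a = column_major_strides(size_a)
    strides_b = column_major_strides(size_b)
    for idx_a in product(*(range(n) for n in size_a)):
        idx_b = [idx_a[p] for p in perm]
        off_a = sum(i * s for i, s in zip(idx_a, strides_a))
        off_b = sum(i * s for i, s in zip(idx_b, strides_b))
        # Scale the input element and accumulate into the scaled output element.
        B[off_b] = alpha * A[off_a] + beta * B[off_b]
    return size_b


# 2x3 matrix stored column by column: the columns are (1,2), (3,4), (5,6).
A = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
B = [10.0] * 6
size_b = reference_transpose([1, 0], 2.0, A, [2, 3], 1.0, B)
```

With beta set to 0 the old contents of B are discarded and the call reduces to a scaled transpose; a non-zero beta accumulates into B.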

Key Features

Multi-threading support
Explicit vectorization
Auto-tuning (akin to FFTW)
- Loop order
- Parallelization
Multi architecture support
- Explicitly vectorized kernels (AVX and ARM)
Supports float, double, complex and double complex data types
Supports both column-major and row-major data layouts

HPTT now also offers C- and Python-interfaces (see below).

Requirements

You must have a working C++ compiler with C++11 support. I have tested HPTT with:

Intel's ICPC 15.0.3, 16.0.3, 17.0.2
GNU g++ 5.4, 6.2, 6.3
clang++ 3.8, 3.9

Install

Clone the repository into a desired directory and change to that location:

git clone https://github.com/springer13/hptt.git
cd hptt
export CXX=<desired compiler>

Now you have several options to build the desired version of the library:

make avx
make arm
make scalar

Using CMake:

mkdir build && cd build
cmake .. -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ #Optionally one of [-DENABLE_ARM=ON -DENABLE_AVX=ON -DENABLE_IBM=ON]

This should create 'libhptt.so' inside the ./lib folder.

Getting Started

Please have a look at the provided benchmark.cpp.

In general HPTT is used as follows:

#include <hptt.h>

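// allocate tensors
float *A = ...
float *B = ...

// scalars and thread count (illustrative values)
float alpha = 1.0f;
float beta = 0.0f;
int numThreads = 4;

// specify permutation and size (one extent per dimension)
const int dim = 6;
int perm[dim] = {5,2,0,4,1,3};
int size[dim] = {48,28,48,28,28,28}; // sixth extent added for illustration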

// create a plan (shared_ptr)
auto plan = hptt::create_plan( perm, dim, 
                               alpha, A, size, NULL, 
                               beta,  B, NULL, 
                               hptt::ESTIMATE, numThreads);

// execute the transposition
plan->execute();

The example above does not use any auto-tuning; it relies solely on HPTT's performance model. To activate auto-tuning, use hptt::MEASURE or hptt::PATIENT instead of hptt::ESTIMATE.
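The idea behind measurement-based tuning can be sketched generically: run each candidate implementation, time it, and keep the fastest. The pure-Python sketch below is not HPTT's tuner. The two candidates (differing only in loop order) and the helper measure_and_select are hypothetical names chosen for illustration.

```python
import time


def transpose_rows_outer(A, rows, cols):
    """Transpose a row-major rows x cols matrix with the row loop outermost."""
    B = [0.0] * (rows * cols)
    for i in range(rows):
        for j in range(cols):
            B[j * rows + i] = A[i * cols + j]
    return B


def transpose_cols_outer(A, rows, cols):
    """Same transposition with the column loop outermost."""
    B = [0.0] * (rows * cols)
    for j in range(cols):
        for i in range(rows):
            B[j * rows + i] = A[i * cols + j]
    return B


def measure_and_select(candidates, args, repeats=3):
    """Time every candidate (best of `repeats` runs) and return the fastest."""
    best, best_time = None, float("inf")
    for candidate in candidates:
        elapsed = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            candidate(*args)
            elapsed = min(elapsed, time.perf_counter() - start)
        if elapsed < best_time:
            best, best_time = candidate, elapsed
    return best, best_time


rows, cols = 64, 48
A = [float(k) for k in range(rows * cols)]
candidates = [transpose_rows_outer, transpose_cols_outer]
best, seconds = measure_and_select(candidates, (A, rows, cols))
```

Measuring costs extra time up front, so it pays off mainly when the same transposition is executed many times.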

C-Interface

HPTT also offers a C-interface. This interface is less expressive than its C++ counterpart since it does not expose control over the plan.

void sTensorTranspose( const int *perm, const int dim,
        const float alpha, const float *A, const int *sizeA, const int *outerSizeA, 
        const float beta,        float *B,                   const int *outerSizeB, 
        const int numThreads, const int useRowMajor);

void dTensorTranspose( const int *perm, const int dim,
        const double alpha, const double *A, const int *sizeA, const int *outerSizeA, 
        const double beta,        double *B,                   const int *outerSizeB, 
        const int numThreads, const int useRowMajor);
...
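The useRowMajor flag selects between the two supported data layouts. As a reminder of what that choice means for addressing, the following pure-Python sketch computes element strides for both layouts. The helper names are hypothetical and not part of HPTT.

```python
def strides(size, row_major):
    """Element strides of a dense tensor in the requested layout."""
    # Row-major: the last dimension is contiguous.
    # Column-major: the first dimension is contiguous.
    order = reversed(range(len(size))) if row_major else range(len(size))
    result, step = [0] * len(size), 1
    for d in order:
        result[d] = step
        step *= size[d]
    return result


def offset(index, size, row_major):
    """Flat memory offset of a multi-index in the requested layout."""
    return sum(i * s for i, s in zip(index, strides(size, row_major)))


size = [2, 3, 4]
row = strides(size, row_major=True)    # [12, 4, 1]
col = strides(size, row_major=False)   # [1, 2, 6]
```

A row-major tensor occupies memory exactly like a column-major tensor whose dimensions are listed in reverse order, which is why a single kernel can serve both layouts in principle.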

Python-Interface

HPTT now also offers a python-interface. The functionality offered by HPTT is comparable to numpy.transpose with the difference being that HPTT can also update the output tensor.

tensorTransposeAndUpdate( perm, alpha, A, beta, B, numThreads=-1)

tensorTranspose( perm, alpha, A, numThreads=-1)

See docstring for additional information. Based on those there are also the following drop-in replacements for numpy functions:

hptt.transpose(A, axes)
hptt.ascontiguousarray(A)
hptt.asfortranarray(A)

Installation should be straightforward via:

cd ./pythonAPI
python setup.py install

or

pip install -U .

if you want a pip-managed install. At this point you should be able to import the 'hptt' package within your Python scripts.

The python interface also offers support for:

Single and double precision
Column-major and row-major data layouts
Multi-threading (by default HPTT utilizes all cores of a system)

Python Benchmark

You can find an elaborate example under ./pythonAPI/benchmark/benchmark.py --help

Multi-threaded 2x Intel Haswell-EP E5-2680 v3 (24 threads)
- Comparison against numpy.transpose

Documentation

You can generate the doxygen documentation via

make doc

Benchmark

The benchmark is the same as the original TTC benchmark for tensor transpositions.

You can compile the benchmark via:

cd benchmark
make

Before running the benchmark, please modify the number of threads and the thread affinity within the benchmark.sh file. To run the benchmark just use:

./benchmark.sh

This will create a file named hptt_benchmark.dat containing all the runtime information of HPTT and the reference implementation.

Performance Results

See (pdf) for details.

TODOs

Add explicit vectorization for IBM power
Add explicit vectorization for complex types

Related Projects

Shared-Memory Tensor Contractions:
- TCL
- TBLIS
Distributed-Memory Tensor Contractions:
- CTF
- libtensor
Tensor network codes:
- ITensor
- Uni10

Citation

In case you want to refer to HPTT as part of a research paper, please cite the following article (pdf):

@inproceedings{hptt2017,
 author = {Springer, Paul and Su, Tong and Bientinesi, Paolo},
 title = {{HPTT}: {A} {H}igh-{P}erformance {T}ensor {T}ransposition {C}++ {L}ibrary},
 booktitle = {Proceedings of the 4th ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming},
 series = {ARRAY 2017},
 year = {2017},
 isbn = {978-1-4503-5069-3},
 location = {Barcelona, Spain},
 pages = {56--62},
 numpages = {7},
 url = {http://doi.acm.org/10.1145/3091966.3091968},
 doi = {10.1145/3091966.3091968},
 acmid = {3091968},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {High-Performance Computing, autotuning, multidimensional transposition, tensor transposition, tensors, vectorization},
}
