giaf / Blasfeo

Licence: other
Basic linear algebra subroutines for embedded optimization

BLASFEO - BLAS For Embedded Optimization

BLASFEO provides a set of basic linear algebra routines, performance-optimized for matrices that fit in cache (i.e. generally up to a couple of hundred rows and columns), as typically encountered in embedded optimization applications.

BLASFEO APIs

BLASFEO provides two APIs (Application Programming Interfaces):

  • BLAS API: the standard BLAS and LAPACK APIs, with matrices stored in column-major.
  • BLASFEO API: BLASFEO's own API is optimized to reduce overhead for small matrices. It employs structures to describe matrices (blasfeo_dmat) and vectors (blasfeo_dvec), defined in include/blasfeo_common.h. The actual implementation of blasfeo_dmat and blasfeo_dvec depends on the TARGET, LA (Linear Algebra) and MF (Matrix Format) choice. The API is non-destructive, and compared to the BLAS API it has an additional matrix/vector argument reserved for the output.
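
To illustrate what "non-destructive" with a separate output argument means, here is a minimal sketch in plain C. The type toy_dmat and the function toy_dgemm_nn are hypothetical stand-ins invented for this example; they are not the BLASFEO structures or routines, whose actual names and argument lists are defined in the BLASFEO headers. The sketch assumes column-major storage and computes D = alpha * A * B + beta * C while leaving C unchanged.

```c
#include <assert.h>

/* Hypothetical column-major matrix descriptor (a stand-in, NOT blasfeo_dmat). */
struct toy_dmat {
    int m;      /* number of rows */
    int n;      /* number of columns */
    double *a;  /* column-major data, leading dimension m */
};

/* D = alpha * A * B + beta * C.
 * C is only read; the result is written to the separate output matrix D. */
void toy_dgemm_nn(double alpha, const struct toy_dmat *A, const struct toy_dmat *B,
                  double beta, const struct toy_dmat *C, struct toy_dmat *D)
{
    assert(A->n == B->m && C->m == A->m && C->n == B->n);
    assert(D->m == C->m && D->n == C->n);
    for (int j = 0; j < D->n; j++) {
        for (int i = 0; i < D->m; i++) {
            double acc = 0.0;
            /* inner product of row i of A with column j of B */
            for (int k = 0; k < A->n; k++)
                acc += A->a[i + k * A->m] * B->a[k + j * B->m];
            D->a[i + j * D->m] = alpha * acc + beta * C->a[i + j * C->m];
        }
    }
}
```

In this sketch, passing the same matrix as both C and D gives the familiar in-place BLAS behaviour, while passing distinct matrices keeps the input C intact.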

| API | Optimized (level 3) routines |
| -------------- | ----------- |
| BLASFEO (small matrices) | dgemm, dsyrk, dtrmm, dtrsm, dpotrf, dgetrf, dgeqrf, dgelqf, sgemm, ssyrk, strmm, strsm, spotrf |
| BLAS (small matrices) | dgemm, dsyrk, dtrmm, dtrsm, dpotrf, dgetrf, sgemm, strsm, spotrf |
| BLAS (large matrices) | dgemm |

Note: BLASFEO is currently under active development. Some of the routines listed in the previous table may only be optimized for some variants, and provide reference implementations for other variants.

Supported Computer Architectures

The architecture for BLASFEO to use is specified using the TARGET build variable. Currently BLASFEO supports the following architectures:

| TARGET | Description |
| -------------- | ----------- |
| X64_INTEL_HASWELL | Intel Haswell, Intel Skylake, AMD Zen, AMD Zen2 architectures or newer. x86_64 with AVX2 and FMA ISA, 64-bit OS |
| X64_INTEL_SANDY_BRIDGE | Intel Sandy-Bridge architecture. x86_64 with AVX ISA, 64-bit OS |
| X64_INTEL_CORE | Intel Core architecture. x86_64 with SSE3 ISA, 64-bit OS |
| X64_AMD_BULLDOZER | AMD Bulldozer architecture. x86_64 with AVX and FMA ISAs, 64-bit OS |
| X86_AMD_JAGUAR | AMD Jaguar architecture. x86 with AVX ISA, 32-bit OS |
| X86_AMD_BARCELONA | AMD Barcelona architecture. x86 with SSE3 ISA, 32-bit OS |
| ARMV8A_ARM_CORTEX_A76 | ARM Cortex A76 architecture or newer. ARMv8A with VFPv4 and NEONv2 ISAs, 64-bit OS |
| ARMV8A_ARM_CORTEX_A73 | ARM Cortex A73 architecture or newer. ARMv8A with VFPv4 and NEONv2 ISAs, 64-bit OS |
| ARMV8A_ARM_CORTEX_A57 | ARM Cortex A57, A72 architectures. ARMv8A with VFPv4 and NEONv2 ISAs, 64-bit OS |
| ARMV8A_ARM_CORTEX_A55 | ARM Cortex A55 architecture. ARMv8A with VFPv4 and NEONv2 ISAs, 64-bit OS |
| ARMV8A_ARM_CORTEX_A53 | ARM Cortex A53 architecture. ARMv8A with VFPv4 and NEONv2 ISAs, 64-bit OS |
| ARMV7A_ARM_CORTEX_A15 | ARM Cortex A15 architecture. ARMv7A with VFPv4 and NEON ISAs, 32-bit OS |
| ARMV7A_ARM_CORTEX_A9 | ARM Cortex A9 architecture. ARMv7A with VFPv3 and NEON ISAs, 32-bit OS |
| ARMV7A_ARM_CORTEX_A7 | ARM Cortex A7 architecture. ARMv7A with VFPv4 and NEON ISAs, 32-bit OS |
| GENERIC | Generic target, coded in C, giving better performance if the architecture provides more than 16 scalar FP registers (e.g. many RISC such as ARM) |

Note that the ARMV8A_ARM_CORTEX_A76, ARMV8A_ARM_CORTEX_A73, ARMV8A_ARM_CORTEX_A55, X86_AMD_JAGUAR and X86_AMD_BARCELONA architectures are not currently supported by the CMake build system and can only be used through the included Makefile.

Automatic Target Detection

When using the CMake build system, it is possible to automatically detect the X64 target the current computer can use. This can be enabled by specifying the X64_AUTOMATIC target. In this mode, the build system will automatically search through the X64 targets to find the best one that can both compile and run on the host machine.

Target Testing

When using the CMake build system, tests are run automatically to check that the current compiler can compile the code needed for the selected target and that the current computer can execute the code compiled for that target. The execution test can be disabled by setting the BLASFEO_CROSSCOMPILING flag to true. This is done automatically when CMake detects that cross compilation is happening.

Linear Algebra Routines

The BLASFEO backend provides three possible implementations of each linear algebra routine, specified using the LA build variable:

| LA | Description |
| -------------- | ----------- |
| HIGH_PERFORMANCE | Target-tailored; performance-optimized for cache-resident matrices; panel- or column-major matrix format. Currently provided for OS_LINUX (x86_64 64-bit, x86 32-bit, ARMv8A 64-bit, ARMv7A 32-bit), OS_WINDOWS (x86_64 64-bit) and OS_MAC (x86_64 64-bit). |
| REFERENCE | Target-unspecific, lightly optimized; small code footprint; panel- or column-major matrix format |
| EXTERNAL_BLAS_WRAPPER | Call to external BLAS and LAPACK libraries; column-major matrix format |

Matrix Formats

Currently there are two matrix formats used in the BLASFEO matrix structures blasfeo_dmat and blasfeo_smat, specified using the MF build variable:

| MF | Description |
| -------------- | ----------- |
| COLMAJ | column-major (or FORTRAN-style): the standard matrix format used in the BLAS and LAPACK libraries |
| PANELMAJ | panel-major: BLASFEO's own matrix format, which is designed to improve performance for matrices fitting in cache. Each matrix is stored in block-row-major with blocks (called panels) of fixed height, and within each panel the matrix elements are stored in column-major. |
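
The panel-major layout described above can be made concrete with a small index computation. The following is a minimal sketch under stated assumptions, not BLASFEO code: the function names panelmaj_index and colmaj_to_panelmaj, the parameter ps (panel height) and the parameter cn (number of columns stored in each panel) are chosen for this example only, and the panel height actually used by the library depends on the build configuration.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (i, j) in a panel-major buffer.
 * ps = panel height, cn = number of columns stored in each panel.
 * Names are illustrative, not BLASFEO identifiers. */
size_t panelmaj_index(size_t i, size_t j, size_t ps, size_t cn)
{
    assert(ps > 0 && j < cn);
    size_t panel = i / ps;         /* block-row (panel) containing row i */
    size_t row_in_panel = i % ps;  /* position of row i inside that panel */
    /* Skip the full panels above, then the columns to the left, then rows. */
    return panel * ps * cn + j * ps + row_in_panel;
}

/* Copy an m x n column-major matrix (leading dimension lda) into a
 * panel-major buffer pa, which must hold ceil(m / ps) * ps * n doubles. */
void colmaj_to_panelmaj(size_t m, size_t n, const double *a, size_t lda,
                        double *pa, size_t ps)
{
    assert(lda >= m);
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < m; i++)
            pa[panelmaj_index(i, j, ps, n)] = a[i + j * lda];
}
```

For example, with a panel height of 4 and 6 columns, element (5, 2) lies in the second panel and lands at offset 1 * 4 * 6 + 2 * 4 + 1 = 33. Consecutive rows of a column inside one panel are contiguous, and the next column of the same panel follows immediately.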

Tests

BLASFEO provides some functionality to test the correctness of its linear algebra routines, for both the BLASFEO and the BLAS APIs. The testing framework is written in Python (minimum version 3.6) and uses the Jinja template engine, which can be installed with the command pip install jinja2. The tests folder contains several predefined test sets targeting different combinations of architecture, precision and matrix format; these are used for automatic testing in Travis CI.

In order to run a test set, from the tests folder run for example the command

python tester.py testset_travis_blasfeo_pm_double_amd64.json

where you can replace the testset with any other. If no test set is specified, the testset_default.json is selected; this testset can be easily edited to test just a few routines of your choice.

Recommended guidelines

Some general guidelines to install BLASFEO, maximise its performance and avoid known performance issues can be found in the file guidelines.md.
Covered topics:

  • installation tips on Android
  • denormals
  • memory alignment

More Information

More information can be found on the BLASFEO wiki at https://blasfeo.syscop.de, including more detailed installation instructions, examples, and a rich collection of benchmarks and comparisons.

More scientific information can be found in:

Notes

  • BLASFEO is released under the 2-Clause BSD License.

  • 06-01-2018: BLASFEO now employs a new naming convention. The bash script change_name.sh can be used to automatically update the source code of any software using BLASFEO to the new naming convention.
