Cheap and reliable Node.js hosting starts at $3/month, and $1/month static HTML hosting

Amplifier allows .NET developers to easily run complex applications with intensive mathematical computation on Intel CPU/GPU, NVIDIA, AMD without writing any additional C kernel code. Write your function in .NET and Amplifier will take care of running it on your favorite hardware.

Stars: ✭ 142 (+79.75%)

Mutual labels: simd

mir-glas

[Experimental] LLVM-accelerated Generic Linear Algebra Subprograms

Stars: ✭ 99 (+25.32%)

Mutual labels: simd

ultra-sort

DSL for SIMD Sorting on AVX2 & AVX512

Stars: ✭ 29 (-63.29%)

Mutual labels: simd

penguinV

Simple and fast C++ image processing library with focus on heterogeneous systems

Stars: ✭ 110 (+39.24%)

Mutual labels: simd

runtime

AnyDSL Runtime Library

Stars: ✭ 17 (-78.48%)

Mutual labels: simd

SNPRelate

R package: parallel computing toolset for relatedness and principal component analysis of SNP data (Development Version)

Stars: ✭ 74 (-6.33%)

Mutual labels: simd

hpc

Learning and practice of high performance computing (CUDA, Vulkan, OpenCL, OpenMP, TBB, SSE/AVX, NEON, MPI, coroutines, etc. )

Stars: ✭ 39 (-50.63%)

Mutual labels: simd

Singeli

High-level interface for low-level programming

Stars: ✭ 31 (-60.76%)

Mutual labels: simd

xor

Move to: https://github.com/templexxx/xorsimd

Stars: ✭ 27 (-65.82%)

Mutual labels: simd

pyrodigal

Cython bindings and Python interface to Prodigal, an ORF finder for genomes and metagenomes. Now with SIMD!

Stars: ✭ 38 (-51.9%)

Mutual labels: simd

qHilbert

qHilbert is a vectorized speedup of Hilbert curve generation using SIMD intrinsics

Stars: ✭ 22 (-72.15%)

Mutual labels: simd

linear-vs-binary-search

Comparing linear and binary searches

Stars: ✭ 28 (-64.56%)

Mutual labels: simd

std find simd

std::find simd version

Stars: ✭ 19 (-75.95%)

Mutual labels: simd

SIMDArray

SIMD enhanced Array operations

Stars: ✭ 123 (+55.7%)

Mutual labels: simd

oversimple

A library for audio oversampling, which tries to offer a simple api while wrapping HIIR, by Laurent De Soras, for minimum phase antialiasing, and r8brain-free-src, by Aleksey Vaneev, for linear phase antialiasing.

Stars: ✭ 25 (-68.35%)

Mutual labels: simd

Quickenshtein

Making the quickest and most memory efficient implementation of Levenshtein Distance with SIMD and Threading support

Stars: ✭ 204 (+158.23%)

Mutual labels: simd

ParPar

High performance PAR2 create client for NodeJS

Stars: ✭ 110 (+39.24%)

Mutual labels: error-correcting-codes

simdjson-rust

Rust bindings for the simdjson project.

Stars: ✭ 59 (-25.32%)

Mutual labels: simd

SoftLight

A shader-based Software Renderer Using The LightSky Framework.

Stars: ✭ 2 (-97.47%)

Mutual labels: simd

sse-avx-rasterization

Triangle rasterization routines accelerated by SSE and AVX

Stars: ✭ 53 (-32.91%)

Mutual labels: simd

View All Similar Projects ➔

NO LONGER MAINTAINED
========================

The same functionality as this library is available in Botan
(https://github.com/randombit/botan) starting in 3.0.0

THE OLD README ->
===================

FECpp: Erasure codes based on Vandermonde matrices

FECpp contains an implementation of an encoder/decoder for an erasure
code based on Vandermonde matrices computed over GF(2^8). It is based
on fec, by Luigi Rizzo, which is available at
  http://info.iet.unipi.it/~luigi/fec.html

FECpp should be compatible with zfec (http://allmydata.org/trac/zfec),
producing bitwise identical results in all cases.

Principle of Operation
========================================

The encoded data is computed as

	y = E x

where x is a k-vector with source data, y is an n-vector with the
redundant info, and E is an n*k matrix derived from a Vandermonde
matrix. The code is systematic, meaning the first k rows of E are the
identity matrix. This causes the first k blocks of output to be equal
to the input, split into k pieces.

At the receiver, any subset y' of k elements from y allows the
reconstruction of the whole x by solving the system

	y' = E' x

where E' is made of rows from E corresponding to the received
elements.

The complexity of matrix inversion is O(k*l^2) where l is the number
of elements not in x available at the receiver. This might seem
large, but data elements are in fact be packets of large size, so
the inversion cost can be amortized over the size of the packet.
For practical applications (k and l as large as 30, packet sizes
of 1KB) the cost can be neglected.

API Usage
========================================

fecpp provides a single class, fec_code, which is declared in the
header file fecpp.h

fec_code's constructor takes two integers, k and n. The encoder will
generate n shares, any k of which will be sufficient to recover the
original input.

To encode, call fec_code's encode operation with a pointer to a
buffer, a length, and a std::function which will be called for each
output block:

void encode(const byte input[], size_t size,
   std::function<void (size_t, size_t, const byte[], size_t)> out) const

The length of the input must be a multiple of k bytes.

The arguments to the callback are, in order, the share identifier, the
maximum share that will be generated, and the share contents and
length. The buffer that contains the share data may be reused once the
callback function returns, so if the data must be retained, make a
copy. However in many applications the share will be immediately
written to a file or socket, in which case no copy is necessary.
Each share will be of equal length, specifically size / k bytes.

To decode a set of shares into the original input, call decode:

void decode(const std::map<size_t, const byte*>& shares,
            size_t share_size,
   std::function<void (size_t, size_t, const byte[], size_t)> out) const

The map of shares is a mapping from share identifier (the first
parameter to the encoding callback) to the contents of the share. It
is essential that the share identifiers and shares are associated
properly: otherwise the decoding will fail. As described above, each
share is of equal length, which is specified using the share_size
parameter.

The output callback for decoding has the same interface, and much the
same semantics, as for encoding. The second parameter, which sets the
maximum number of blocks, will be k instead of n, since there are k
original input blocks. Since each block is the same size, you can
compute the full size of the output (which will be k, the second
parameter, multiplied by the size of each subpart, which is the fourth
parameter). For example to reconstruct the input into a file, you
could seek back and forth writing each block as it becomes available.

For both encoding and decoding, you should not assume that the output
blocks will be provided to the callback in order. Currently this is
the case for encoding, but not for decoding, and later if
multithreaded operations or OpenMP is used to parellize the encoding
it is quite likely that shares will be provided out of order.

Future Work / Todos / Send Patches
========================================

 * Use threads or OpenMP
 * Investigate loop tiling and other matrix multiplication optimizations
 * Use a sliding window for the SSE2 multiplication
 * Add support for NEON, AVX2, AVX-512, ...
 * Streaming interface
 * Progressive decoding (is that even possible?)
 * Allow use of different polynomials

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].

Stars: ✭ 79

Visit Git Page 🔗Visit User Page 🔗Visit Issues Page (0) 🔗