892 open source projects that are alternatives to or similar to Nsimd

sliceslice-rs

A fast implementation of single-pattern substring search using SIMD acceleration.

Stars: ✭ 66 (-52.17%)

Mutual labels: simd, avx2

penguinV

Simple and fast C++ image processing library with a focus on heterogeneous systems

Stars: ✭ 110 (-20.29%)

Mutual labels: avx, simd

Chromium Clang

Chromium browser compiled with the Clang/LLVM compiler.

Stars: ✭ 77 (-44.2%)

Mutual labels: avx, avx2

dbcsr

DBCSR: Distributed Block Compressed Sparse Row matrix library

Stars: ✭ 65 (-52.9%)

Mutual labels: hpc, cuda

HiSpatialCluster

Clustering of spatial points with the Fast Search algorithm, with high-performance computing implementations for CUDA or parallel CPU execution, runnable as standalone Python or in ArcGIS.

Stars: ✭ 31 (-77.54%)

Mutual labels: hpc, cuda

Occa

JIT Compilation for Multiple Architectures: C++, OpenMP, CUDA, HIP, OpenCL, Metal

Stars: ✭ 230 (+66.67%)

Mutual labels: cuda, hpc

lsp-dsp-lib

DSP library for signal processing

Stars: ✭ 37 (-73.19%)

Mutual labels: simd, aarch64

sse-avx-rasterization

Triangle rasterization routines accelerated by SSE and AVX

Stars: ✭ 53 (-61.59%)

Mutual labels: avx, simd

SoftLight

A shader-based Software Renderer Using The LightSky Framework.

Stars: ✭ 2 (-98.55%)

Mutual labels: neon, simd

peakperf

Achieve peak performance on x86 CPUs and NVIDIA GPUs

Stars: ✭ 33 (-76.09%)

Mutual labels: cuda, avx

Turbo-Histogram

Fastest Histogram Construction

Stars: ✭ 44 (-68.12%)

Mutual labels: simd, avx2

gpubootcamp

This repository consists of GPU bootcamp material for HPC and AI

Stars: ✭ 227 (+64.49%)

Mutual labels: hpc, cuda

block-aligner

SIMD-accelerated library for computing global and X-drop affine gap penalty sequence-to-sequence or sequence-to-profile alignments using an adaptive block-based algorithm.

Stars: ✭ 58 (-57.97%)

Mutual labels: simd, avx2

monolish

monolish: MONOlithic LInear equation Solvers for Highly-parallel architecture

Stars: ✭ 166 (+20.29%)

Mutual labels: hpc, cuda

MatX

An efficient C++17 GPU numerical computing library with Python-like syntax

Stars: ✭ 418 (+202.9%)

Mutual labels: hpc, cuda

Futhark

💥💻💥 A data-parallel functional programming language

Stars: ✭ 1,641 (+1089.13%)

Mutual labels: cuda, hpc

Fastor

A lightweight high performance tensor algebra framework for modern C++

Stars: ✭ 280 (+102.9%)

Mutual labels: simd, hpc

Fastbase64

SIMD-accelerated base64 codecs

Stars: ✭ 309 (+123.91%)

Mutual labels: simd, avx2

Packettracer

SIMD-accelerated ray tracing in C#, powered by the Intel hardware intrinsics of .NET Core.

Stars: ✭ 109 (-21.01%)

Mutual labels: simd, avx

Visionaray

A C++-based, cross platform ray tracing library

Stars: ✭ 342 (+147.83%)

Mutual labels: simd, cuda

awesome-simd

A curated list of awesome SIMD frameworks, libraries and software

Stars: ✭ 39 (-71.74%)

Mutual labels: simd, avx2

Hiop

HPC solver for nonlinear optimization problems

Stars: ✭ 75 (-45.65%)

Mutual labels: cuda, hpc

Arrayfire Python

Python bindings for ArrayFire: A general purpose GPU library.

Stars: ✭ 358 (+159.42%)

Mutual labels: cuda, hpc

Asm Dude

Visual Studio extension for assembly syntax highlighting and code completion in assembly files and the disassembly window

Stars: ✭ 3,898 (+2724.64%)

Mutual labels: avx2, avx512

Turbo-Transpose

Transpose: SIMD Integer+Floating Point Compression Filter

Stars: ✭ 50 (-63.77%)

Mutual labels: simd, avx2

Arrayfire Rust

Rust wrapper for ArrayFire

Stars: ✭ 525 (+280.43%)

Mutual labels: cuda, hpc

Sha256 Simd

Accelerate SHA256 computations in pure Go using AVX512, SHA Extensions for x86 and ARM64 for ARM. On AVX512 it provides an up to 8x improvement (over 3 GB/s per core). SHA Extensions give a performance boost of close to 4x over native.

Stars: ✭ 657 (+376.09%)

Mutual labels: avx512, avx

Onemkl

oneAPI Math Kernel Library (oneMKL) Interfaces

Stars: ✭ 122 (-11.59%)

Mutual labels: cuda, hpc

Parenchyma

An extensible HPC framework for CUDA, OpenCL and native CPU.

Stars: ✭ 71 (-48.55%)

Mutual labels: cuda, hpc

allgebra

Base container for developing C++ and Fortran HPC applications

Stars: ✭ 14 (-89.86%)

Mutual labels: hpc, cuda

simdjson-rs

Rust version of lemire's SimdJson

Stars: ✭ 18 (-86.96%)

Mutual labels: simd, avx2

Simdjsonsharp

C# bindings for lemire/simdjson (and full C# port)

Stars: ✭ 506 (+266.67%)

Mutual labels: simd, avx2

Highwayhash

Native Go version of HighwayHash with optimized assembly implementations on Intel and ARM. Able to process over 10 GB/sec on a single core on Intel CPUs - https://en.wikipedia.org/wiki/HighwayHash

Stars: ✭ 670 (+385.51%)

Mutual labels: neon, avx2

Ktt

Kernel Tuning Toolkit

Stars: ✭ 33 (-76.09%)

Mutual labels: cuda, hpc

Sixtyfour

How fast can we brute force a 64-bit comparison?

Stars: ✭ 41 (-70.29%)

Mutual labels: cuda, avx2

Deepnet

Deep.Net machine learning framework for F#

Stars: ✭ 99 (-28.26%)

Mutual labels: cuda

Knn cuda

Fast K-Nearest Neighbor search with GPU

Stars: ✭ 119 (-13.77%)

Mutual labels: cuda

Dpp

Detail-Preserving Pooling in Deep Networks (CVPR 2018)

Stars: ✭ 99 (-28.26%)

Mutual labels: cuda

Singularity Cri

The Singularity implementation of the Kubernetes Container Runtime Interface

Stars: ✭ 97 (-29.71%)

Mutual labels: hpc

Fastapprox

Approximate and vectorized versions of common mathematical functions

Stars: ✭ 128 (-7.25%)

Mutual labels: simd

Docker Homebridge

Homebridge Docker. HomeKit support for the impatient using Docker on x86_64, Raspberry Pi (armhf) and ARM64. Includes ffmpeg + libfdk-aac.

Stars: ✭ 1,847 (+1238.41%)

Mutual labels: aarch64

Extending Jax

Extending JAX with custom C++ and CUDA code

Stars: ✭ 98 (-28.99%)

Mutual labels: cuda

Qreverse

A small study in hardware accelerated AoS reversal

Stars: ✭ 97 (-29.71%)

Mutual labels: simd

Nnpack

Acceleration package for neural networks on multi-core CPUs

Stars: ✭ 1,538 (+1014.49%)

Mutual labels: simd

Supra

SUPRA: Software Defined Ultrasound Processing for Real-Time Applications - An Open Source 2D and 3D Pipeline from Beamforming to B-Mode

Stars: ✭ 96 (-30.43%)

Mutual labels: cuda

Accelerate Llvm

LLVM backend for Accelerate

Stars: ✭ 134 (-2.9%)

Mutual labels: cuda

Jevois

JeVois smart machine vision framework

Stars: ✭ 128 (-7.25%)

Mutual labels: neon

Sketch

C++ Implementations of sketch data structures with SIMD Parallelism, including Python bindings

Stars: ✭ 96 (-30.43%)

Mutual labels: simd

Nextflow

A DSL for data-driven computational pipelines

Stars: ✭ 1,337 (+868.84%)

Mutual labels: hpc

Charm

The Charm++ parallel programming system. Visit https://charmplusplus.org/ for more information.

Stars: ✭ 96 (-30.43%)

Mutual labels: hpc

Thorin

The Higher-Order Intermediate Representation

Stars: ✭ 116 (-15.94%)

Mutual labels: simd

Pynvvl

A Python wrapper of NVIDIA Video Loader (NVVL) with CuPy for fast video loading with Python

Stars: ✭ 95 (-31.16%)

Mutual labels: cuda

Region Conv

Not All Pixels Are Equal: Difficulty-Aware Semantic Segmentation via Deep Layer Cascade

Stars: ✭ 95 (-31.16%)

Mutual labels: cuda

Professional Cuda C Programming

Stars: ✭ 127 (-7.97%)

Mutual labels: cuda

Spoc

Stream Processing with OCaml

Stars: ✭ 115 (-16.67%)

Mutual labels: cuda

Fbtt Embedding

This is a Tensor Train based compression library for compressing the sparse embedding tables used in large-scale machine learning models such as recommendation and natural language processing. We showed that this library can reduce the total model size by up to 100x in Facebook’s open-sourced DLRM model while achieving the same model quality. Our implementation is faster than the state-of-the-art implementations. Existing state-of-the-art libraries also decompress the whole embedding tables on the fly, so they do not provide memory reduction at runtime during training. Our library decompresses only the requested rows and can therefore provide a 10,000-fold memory footprint reduction per embedding table. The library also includes a software cache that stores a portion of the table entries in decompressed format for faster lookup and processing.

Stars: ✭ 92 (-33.33%)

Mutual labels: cuda

Off

OFF, Open source Finite volume Fluid dynamics code

Stars: ✭ 93 (-32.61%)

Mutual labels: hpc

Amplifier.net

Amplifier allows .NET developers to easily run complex applications with intensive mathematical computation on Intel CPU/GPU, NVIDIA, AMD without writing any additional C kernel code. Write your function in .NET and Amplifier will take care of running it on your favorite hardware.

Stars: ✭ 92 (-33.33%)

Mutual labels: simd

Blt

A streamlined CMake build system foundation for developing HPC software

Stars: ✭ 135 (-2.17%)

Mutual labels: hpc

Jpeg Quantsmooth

JPEG artifacts removal based on quantization coefficients.

Stars: ✭ 134 (-2.9%)

Mutual labels: simd

61-120 of 892 similar projects
