892 open source projects that are alternatives to or similar to Nsimd

sliceslice-rs
A fast implementation of single-pattern substring search using SIMD acceleration.
Stars: ✭ 66 (-52.17%)
Mutual labels:  simd, avx2
penguinV
Simple and fast C++ image processing library with focus on heterogeneous systems
Stars: ✭ 110 (-20.29%)
Mutual labels:  avx, simd
Chromium Clang
Chromium browser compiled with the Clang/LLVM compiler.
Stars: ✭ 77 (-44.2%)
Mutual labels:  avx, avx2
dbcsr
DBCSR: Distributed Block Compressed Sparse Row matrix library
Stars: ✭ 65 (-52.9%)
Mutual labels:  hpc, cuda
HiSpatialCluster
Clustering of spatial points using the Fast Search algorithm, with high-performance computing implementations in CUDA or CPU-parallel code, runnable as standalone Python or in ArcGIS.
Stars: ✭ 31 (-77.54%)
Mutual labels:  hpc, cuda
Occa
JIT Compilation for Multiple Architectures: C++, OpenMP, CUDA, HIP, OpenCL, Metal
Stars: ✭ 230 (+66.67%)
Mutual labels:  cuda, hpc
lsp-dsp-lib
DSP library for signal processing
Stars: ✭ 37 (-73.19%)
Mutual labels:  simd, aarch64
sse-avx-rasterization
Triangle rasterization routines accelerated by SSE and AVX
Stars: ✭ 53 (-61.59%)
Mutual labels:  avx, simd
SoftLight
A shader-based Software Renderer Using The LightSky Framework.
Stars: ✭ 2 (-98.55%)
Mutual labels:  neon, simd
peakperf
Achieve peak performance on x86 CPUs and NVIDIA GPUs
Stars: ✭ 33 (-76.09%)
Mutual labels:  cuda, avx
Turbo-Histogram
Fastest Histogram Construction
Stars: ✭ 44 (-68.12%)
Mutual labels:  simd, avx2
gpubootcamp
This repository contains GPU bootcamp material for HPC and AI
Stars: ✭ 227 (+64.49%)
Mutual labels:  hpc, cuda
block-aligner
SIMD-accelerated library for computing global and X-drop affine gap penalty sequence-to-sequence or sequence-to-profile alignments using an adaptive block-based algorithm.
Stars: ✭ 58 (-57.97%)
Mutual labels:  simd, avx2
monolish
monolish: MONOlithic LInear equation Solvers for Highly-parallel architecture
Stars: ✭ 166 (+20.29%)
Mutual labels:  hpc, cuda
MatX
An efficient C++17 GPU numerical computing library with Python-like syntax
Stars: ✭ 418 (+202.9%)
Mutual labels:  hpc, cuda
Futhark
💥💻💥 A data-parallel functional programming language
Stars: ✭ 1,641 (+1089.13%)
Mutual labels:  cuda, hpc
Fastor
A lightweight high performance tensor algebra framework for modern C++
Stars: ✭ 280 (+102.9%)
Mutual labels:  simd, hpc
Fastbase64
SIMD-accelerated base64 codecs
Stars: ✭ 309 (+123.91%)
Mutual labels:  simd, avx2
Packettracer
SIMD-accelerated ray tracing in C#, powered by the Intel hardware intrinsics of .NET Core.
Stars: ✭ 109 (-21.01%)
Mutual labels:  simd, avx
Visionaray
A C++-based, cross platform ray tracing library
Stars: ✭ 342 (+147.83%)
Mutual labels:  simd, cuda
awesome-simd
A curated list of awesome SIMD frameworks, libraries and software
Stars: ✭ 39 (-71.74%)
Mutual labels:  simd, avx2
Hiop
HPC solver for nonlinear optimization problems
Stars: ✭ 75 (-45.65%)
Mutual labels:  cuda, hpc
Arrayfire Python
Python bindings for ArrayFire: A general purpose GPU library.
Stars: ✭ 358 (+159.42%)
Mutual labels:  cuda, hpc
Asm Dude
Visual Studio extension for assembly syntax highlighting and code completion in assembly files and the disassembly window
Stars: ✭ 3,898 (+2724.64%)
Mutual labels:  avx2, avx512
Turbo-Transpose
Transpose: SIMD Integer+Floating Point Compression Filter
Stars: ✭ 50 (-63.77%)
Mutual labels:  simd, avx2
Arrayfire Rust
Rust wrapper for ArrayFire
Stars: ✭ 525 (+280.43%)
Mutual labels:  cuda, hpc
Sha256 Simd
Accelerate SHA256 computations in pure Go using AVX512, SHA Extensions for x86 and ARM64 for ARM. On AVX512 it provides an up to 8x improvement (over 3 GB/s per core). SHA Extensions give a performance boost of close to 4x over native.
Stars: ✭ 657 (+376.09%)
Mutual labels:  avx512, avx
Onemkl
oneAPI Math Kernel Library (oneMKL) Interfaces
Stars: ✭ 122 (-11.59%)
Mutual labels:  cuda, hpc
Parenchyma
An extensible HPC framework for CUDA, OpenCL and native CPU.
Stars: ✭ 71 (-48.55%)
Mutual labels:  cuda, hpc
allgebra
Base container for developing C++ and Fortran HPC applications
Stars: ✭ 14 (-89.86%)
Mutual labels:  hpc, cuda
simdjson-rs
Rust version of lemire's SimdJson
Stars: ✭ 18 (-86.96%)
Mutual labels:  simd, avx2
Simdjsonsharp
C# bindings for lemire/simdjson (and full C# port)
Stars: ✭ 506 (+266.67%)
Mutual labels:  simd, avx2
Highwayhash
Native Go version of HighwayHash with optimized assembly implementations on Intel and ARM. Able to process over 10 GB/sec on a single core on Intel CPUs - https://en.wikipedia.org/wiki/HighwayHash
Stars: ✭ 670 (+385.51%)
Mutual labels:  neon, avx2
Ktt
Kernel Tuning Toolkit
Stars: ✭ 33 (-76.09%)
Mutual labels:  cuda, hpc
Sixtyfour
How fast can we brute force a 64-bit comparison?
Stars: ✭ 41 (-70.29%)
Mutual labels:  cuda, avx2
Deepnet
Deep.Net machine learning framework for F#
Stars: ✭ 99 (-28.26%)
Mutual labels:  cuda
Knn cuda
Fast K-Nearest Neighbor search with GPU
Stars: ✭ 119 (-13.77%)
Mutual labels:  cuda
Dpp
Detail-Preserving Pooling in Deep Networks (CVPR 2018)
Stars: ✭ 99 (-28.26%)
Mutual labels:  cuda
Singularity Cri
The Singularity implementation of the Kubernetes Container Runtime Interface
Stars: ✭ 97 (-29.71%)
Mutual labels:  hpc
Fastapprox
Approximate and vectorized versions of common mathematical functions
Stars: ✭ 128 (-7.25%)
Mutual labels:  simd
Docker Homebridge
Homebridge Docker. HomeKit support for the impatient using Docker on x86_64, Raspberry Pi (armhf) and ARM64. Includes ffmpeg + libfdk-aac.
Stars: ✭ 1,847 (+1238.41%)
Mutual labels:  aarch64
Extending Jax
Extending JAX with custom C++ and CUDA code
Stars: ✭ 98 (-28.99%)
Mutual labels:  cuda
Qreverse
A small study in hardware accelerated AoS reversal
Stars: ✭ 97 (-29.71%)
Mutual labels:  simd
Nnpack
Acceleration package for neural networks on multi-core CPUs
Stars: ✭ 1,538 (+1014.49%)
Mutual labels:  simd
Supra
SUPRA: Software Defined Ultrasound Processing for Real-Time Applications - An Open Source 2D and 3D Pipeline from Beamforming to B-Mode
Stars: ✭ 96 (-30.43%)
Mutual labels:  cuda
Accelerate Llvm
LLVM backend for Accelerate
Stars: ✭ 134 (-2.9%)
Mutual labels:  cuda
Jevois
JeVois smart machine vision framework
Stars: ✭ 128 (-7.25%)
Mutual labels:  neon
Sketch
C++ Implementations of sketch data structures with SIMD Parallelism, including Python bindings
Stars: ✭ 96 (-30.43%)
Mutual labels:  simd
Nextflow
A DSL for data-driven computational pipelines
Stars: ✭ 1,337 (+868.84%)
Mutual labels:  hpc
Charm
The Charm++ parallel programming system. Visit https://charmplusplus.org/ for more information.
Stars: ✭ 96 (-30.43%)
Mutual labels:  hpc
Thorin
The Higher-Order Intermediate Representation
Stars: ✭ 116 (-15.94%)
Mutual labels:  simd
Pynvvl
A Python wrapper of NVIDIA Video Loader (NVVL) with CuPy for fast video loading with Python
Stars: ✭ 95 (-31.16%)
Mutual labels:  cuda
Region Conv
Not All Pixels Are Equal: Difficulty-Aware Semantic Segmentation via Deep Layer Cascade
Stars: ✭ 95 (-31.16%)
Mutual labels:  cuda
Professional Cuda C Programming
Stars: ✭ 127 (-7.97%)
Mutual labels:  cuda
Spoc
Stream Processing with OCaml
Stars: ✭ 115 (-16.67%)
Mutual labels:  cuda
Fbtt Embedding
A Tensor Train based compression library for the sparse embedding tables used in large-scale machine learning models such as recommendation and natural language processing. We showed that this library can reduce the total model size by up to 100x in Facebook's open-sourced DLRM model while achieving the same model quality. Our implementation is faster than the state-of-the-art implementations. Existing state-of-the-art libraries also decompress the whole embedding tables on the fly and therefore do not reduce memory use at training runtime. Our library decompresses only the requested rows and can therefore provide a 10,000x memory footprint reduction per embedding table. The library also includes a software cache that stores a portion of the table entries in decompressed format for faster lookup and processing.
Stars: ✭ 92 (-33.33%)
Mutual labels:  cuda
Off
OFF, Open source Finite volume Fluid dynamics code
Stars: ✭ 93 (-32.61%)
Mutual labels:  hpc
Amplifier.net
Amplifier allows .NET developers to easily run complex applications with intensive mathematical computation on Intel CPU/GPU, NVIDIA, AMD without writing any additional C kernel code. Write your function in .NET and Amplifier will take care of running it on your favorite hardware.
Stars: ✭ 92 (-33.33%)
Mutual labels:  simd
Blt
A streamlined CMake build system foundation for developing HPC software
Stars: ✭ 135 (-2.17%)
Mutual labels:  hpc
Jpeg Quantsmooth
JPEG artifacts removal based on quantization coefficients.
Stars: ✭ 134 (-2.9%)
Mutual labels:  simd