NnvmNo description or website provided.
Stars: ✭ 1,639 (+651.83%)
Ck CaffeCollective Knowledge workflow for Caffe to automate installation across diverse platforms and to collaboratively evaluate and optimize Caffe-based workloads across diverse hardware, software and data sets (compilers, libraries, tools, models, inputs).
Stars: ✭ 192 (-11.93%)
MixbenchA GPU benchmark tool for evaluating GPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL)
Stars: ✭ 130 (-40.37%)
AmgxDistributed multigrid linear solver library on GPU
Stars: ✭ 207 (-5.05%)
Kaldikaldi-asr/kaldi is the official location of the Kaldi project.
Stars: ✭ 11,151 (+5015.14%)
PrimitivA Neural Network Toolkit.
Stars: ✭ 164 (-24.77%)
FcisFully Convolutional Instance-aware Semantic Segmentation
Stars: ✭ 1,563 (+616.97%)
Macos Egpu Cuda GuideSet up CUDA for machine learning (and gaming) on macOS using an NVIDIA eGPU
Stars: ✭ 187 (-14.22%)
OnemkloneAPI Math Kernel Library (oneMKL) Interfaces
Stars: ✭ 122 (-44.04%)
MobulaopA Simple & Flexible Cross Framework Operators Toolkit
Stars: ✭ 161 (-26.15%)
Pyhpc BenchmarksA suite of benchmarks to test the sequential CPU and GPU performance of most popular high-performance libraries for Python.
Stars: ✭ 119 (-45.41%)
GenomeworksSDK for GPU accelerated genome assembly and analysis
Stars: ✭ 215 (-1.38%)
Tensorflow Optimized WheelsTensorFlow wheels built for the latest CUDA/cuDNN with performance flags enabled: SSE, AVX, FMA, XLA
Stars: ✭ 118 (-45.87%)
ClojurecudaClojure library for CUDA development
Stars: ✭ 158 (-27.52%)
MtensorA C++ CUDA Tensor Lazy Computing Library
Stars: ✭ 115 (-47.25%)
Pytorch spnExtension package for spatial propagation networks in PyTorch.
Stars: ✭ 114 (-47.71%)
Futhark💥💻💥 A data-parallel functional programming language
Stars: ✭ 1,641 (+652.75%)
Cumf alsCUDA Matrix Factorization Library with Alternating Least Square (ALS)
Stars: ✭ 154 (-29.36%)
Nvidia Gpu Tensor Core Accelerator Pytorch OpencvA complete machine vision container that includes Jupyter notebooks with built-in code hinting, Anaconda, CUDA-X, the TensorRT inference accelerator for Tensor cores, CuPy (a GPU drop-in replacement for NumPy), PyTorch, TF2, TensorBoard, and OpenCV for accelerated workloads on NVIDIA Tensor cores and GPUs.
Stars: ✭ 110 (-49.54%)
Ssd Gpu DmaBuild userspace NVMe drivers and storage applications with CUDA support
Stars: ✭ 172 (-21.1%)
CompactcnncascadeA binary library for very fast face detection using compact CNNs.
Stars: ✭ 152 (-30.28%)
DaceDaCe - Data Centric Parallel Programming
Stars: ✭ 106 (-51.38%)
NicehashquickminerSuper simple & easy Windows 10 cryptocurrency miner made by NiceHash.
Stars: ✭ 211 (-3.21%)
Cuda WinogradFast CUDA Kernels for ResNet Inference.
Stars: ✭ 104 (-52.29%)
DeepnetDeep.Net machine learning framework for F#
Stars: ✭ 99 (-54.59%)
CreepminerBurstcoin C++ CPU and GPU Miner
Stars: ✭ 169 (-22.48%)
Extending JaxExtending JAX with custom C++ and CUDA code
Stars: ✭ 98 (-55.05%)
Cuda CnnCNN accelerated by CUDA. Tested on MNIST, finally reaching 99.76%
Stars: ✭ 148 (-32.11%)
Libgdf[ARCHIVED] C GPU DataFrame Library
Stars: ✭ 142 (-34.86%)
MinhashcudaWeighted MinHash implementation on CUDA (multi-gpu).
Stars: ✭ 88 (-59.63%)
SimplegpuhashtableA simple GPU hash table implemented in CUDA using lock-free techniques
Stars: ✭ 198 (-9.17%)
Fbtt EmbeddingThis is a Tensor Train based compression library for the sparse embedding tables used in large-scale machine learning models such as recommendation and natural language processing. We showed that this library can reduce the total model size by up to 100x in Facebook’s open-sourced DLRM model while achieving the same model quality. Our implementation is faster than the state-of-the-art implementations. Existing state-of-the-art libraries also decompress the whole embedding table on the fly, so they do not reduce memory use during training. Our library decompresses only the requested rows and can therefore provide a 10,000 times memory footprint reduction per embedding table. The library also includes a software cache that stores a portion of the table entries in decompressed format for faster lookup and processing.
Stars: ✭ 92 (-57.8%)
Optical Flow FilterA real-time optical flow algorithm implemented on GPU
Stars: ✭ 146 (-33.03%)
Cuda programmingCode from the "CUDA Crash Course" YouTube series by CoffeeBeforeArch
Stars: ✭ 169 (-22.48%)
MatconvnetMatConvNet: CNNs for MATLAB
Stars: ✭ 1,299 (+495.87%)
GpurirPython library for Room Impulse Response (RIR) simulation with GPU acceleration
Stars: ✭ 145 (-33.49%)
Deeppipe2Deep Learning library using GPU (CUDA/cuBLAS)
Stars: ✭ 90 (-58.72%)
BohriumAutomatic parallelization of Python/NumPy, C, and C++ codes on Linux and MacOSX
Stars: ✭ 209 (-4.13%)
Hoomd BlueMolecular dynamics and Monte Carlo soft matter simulation on GPUs.
Stars: ✭ 143 (-34.4%)
ThundersvmThunderSVM: A Fast SVM Library on GPUs and CPUs
Stars: ✭ 1,282 (+488.07%)
Deformable KernelsDeforming kernels to adapt towards object deformation. In ICLR 2020.
Stars: ✭ 166 (-23.85%)
ForwardA library for high performance deep learning inference on NVIDIA GPUs.
Stars: ✭ 136 (-37.61%)
Python Opencv CudaCustom opencv_contrib module that exposes OpenCV CUDA optical flow methods with Python bindings
Stars: ✭ 86 (-60.55%)
Knn cudaPyTorch kNN [CUDA version]
Stars: ✭ 86 (-60.55%)
ViseronSelf-hosted NVR with object detection
Stars: ✭ 192 (-11.93%)
FloorA C++ Compute/Graphics Library and Toolchain enabling same-source CUDA/Host/Metal/OpenCL/Vulkan C++ programming and execution.
Stars: ✭ 166 (-23.85%)
Ctranslate2Fast inference engine for OpenNMT models
Stars: ✭ 140 (-35.78%)
MprReference implementation for "Massively Parallel Rendering of Complex Closed-Form Implicit Surfaces" (SIGGRAPH 2020)
Stars: ✭ 84 (-61.47%)
Pytorch EmdlossPyTorch 1.0 implementation of the approximate Earth Mover's Distance
Stars: ✭ 82 (-62.39%)
QudaQUDA is a library for performing calculations in lattice QCD on GPUs.
Stars: ✭ 166 (-23.85%)