Babelstream - STREAM, for lots of devices written in many programming models
Stars: ✭ 121 (-10.37%)
Python Opencv Cuda - Custom opencv_contrib module that exposes OpenCV CUDA optical flow methods with Python bindings
Stars: ✭ 86 (-36.3%)
Mpr - Reference implementation for "Massively Parallel Rendering of Complex Closed-Form Implicit Surfaces" (SIGGRAPH 2020)
Stars: ✭ 84 (-37.78%)
Modulated Deform Conv - deformable convolution 2D 3D DeformableConvolution DeformConv Modulated Pytorch CUDA
Stars: ✭ 81 (-40%)
Mixbench - A GPU benchmark tool for evaluating GPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL)
Stars: ✭ 130 (-3.7%)
Kaldi - kaldi-asr/kaldi is the official location of the Kaldi project.
Stars: ✭ 11,151 (+8160%)
Spoc - Stream Processing with OCaml
Stars: ✭ 115 (-14.81%)
Supra - SUPRA: Software Defined Ultrasound Processing for Real-Time Applications - An Open Source 2D and 3D Pipeline from Beamforming to B-Mode
Stars: ✭ 96 (-28.89%)
Gdax Orderbook Ml - Application of machine learning to the Coinbase (GDAX) orderbook
Stars: ✭ 60 (-55.56%)
2016 super resolution - ICCV2015 Image Super-Resolution Using Deep Convolutional Networks
Stars: ✭ 78 (-42.22%)
Aspect - A parallel, extensible finite element code to simulate convection in both 2D and 3D models.
Stars: ✭ 120 (-11.11%)
Hiop - HPC solver for nonlinear optimization problems
Stars: ✭ 75 (-44.44%)
Clustermq - R package to send function calls as jobs on LSF, SGE, Slurm, PBS/Torque, or any of these via SSH
Stars: ✭ 106 (-21.48%)
Cudart.jl - Julia wrapper for CUDA runtime API
Stars: ✭ 75 (-44.44%)
Libcudacxx - The C++ Standard Library for your entire system.
Stars: ✭ 1,861 (+1278.52%)
Mads.jl - MADS: Model Analysis & Decision Support
Stars: ✭ 71 (-47.41%)
Openclga - A Python Library for Genetic Algorithm on OpenCL
Stars: ✭ 103 (-23.7%)
Knn cuda - Fast K-Nearest Neighbor search with GPU
Stars: ✭ 119 (-11.85%)
Torch sampling - Efficient reservoir sampling implementation for PyTorch
Stars: ✭ 68 (-49.63%)
Arboretum - Gradient Boosting powered by GPU (NVIDIA CUDA)
Stars: ✭ 64 (-52.59%)
Emu - The write-once-run-anywhere GPGPU library for Rust
Stars: ✭ 1,350 (+900%)
Cudadtw - GPU-Suite
Stars: ✭ 63 (-53.33%)
Tensorflow Optimized Wheels - TensorFlow wheels built for the latest CUDA/CuDNN with performance flags enabled: SSE, AVX, FMA; XLA
Stars: ✭ 118 (-12.59%)
Cutlass - CUDA Templates for Linear Algebra Subroutines
Stars: ✭ 1,123 (+731.85%)
Extending Jax - Extending JAX with custom C++ and CUDA code
Stars: ✭ 98 (-27.41%)
Tsne Cuda - GPU Accelerated t-SNE for CUDA with Python bindings
Stars: ✭ 1,120 (+729.63%)
Pynvvl - A Python wrapper of NVIDIA Video Loader (NVVL) with CuPy for fast video loading with Python
Stars: ✭ 95 (-29.63%)
Minkowskiengine - Minkowski Engine is an auto-diff neural network library for high-dimensional sparse tensors
Stars: ✭ 1,110 (+722.22%)
Mpm - Simulating on GPU using Material Point Method and rendering.
Stars: ✭ 61 (-54.81%)
Mtensor - A C++ Cuda Tensor Lazy Computing Library
Stars: ✭ 115 (-14.81%)
Region Conv - Not All Pixels Are Equal: Difficulty-Aware Semantic Segmentation via Deep Layer Cascade
Stars: ✭ 95 (-29.63%)
Sushi2 - Matrix Library for JavaScript
Stars: ✭ 60 (-55.56%)
Fbtt Embedding - This is a Tensor Train based compression library for compressing the sparse embedding tables used in large-scale machine learning models, such as recommendation and natural language processing models. We showed that this library can reduce the total model size by up to 100x in Facebook's open-sourced DLRM model while achieving the same model quality. Our implementation is faster than the state-of-the-art implementations. Existing state-of-the-art libraries also decompress the whole embedding tables on the fly, so they do not provide memory reduction at runtime during training. Our library decompresses only the requested rows and can therefore provide a 10,000 times memory footprint reduction per embedding table. The library also includes a software cache that stores a portion of the table entries in decompressed format for faster lookup and processing.
Stars: ✭ 92 (-31.85%)
Flattened Cnn - Flattened convolutional neural networks (1D convolution modules for Torch nn)
Stars: ✭ 59 (-56.3%)
Mpm - CB-Geo High-Performance Material Point Method
Stars: ✭ 115 (-14.81%)
Boinc - Open-source software for volunteer computing and grid computing.
Stars: ✭ 1,320 (+877.78%)
Dokai - Collection of Docker images for ML/DL and video processing projects
Stars: ✭ 58 (-57.04%)
Numer - Numeric Erlang: vector and matrix operations with CUDA. Heavily inspired by Pteracuda - https://github.com/kevsmith/pteracuda
Stars: ✭ 91 (-32.59%)
Geopm - Global Extensible Open Power Manager
Stars: ✭ 57 (-57.78%)
Cltune - CLTune: An automatic OpenCL & CUDA kernel tuner
Stars: ✭ 114 (-15.56%)
Drake Examples - Example workflows for the drake R package
Stars: ✭ 57 (-57.78%)
Elasticfusion - Real-time dense visual SLAM system
Stars: ✭ 1,298 (+861.48%)
Cuda Samples - Samples for CUDA developers that demonstrate features in the CUDA Toolkit
Stars: ✭ 1,087 (+705.19%)
Nvbio Gpl - NVBIO is a library of reusable components designed to accelerate bioinformatics applications using CUDA.
Stars: ✭ 56 (-58.52%)
Pysnn - Efficient Spiking Neural Network framework, built on top of PyTorch for GPU acceleration
Stars: ✭ 129 (-4.44%)
Fcis - Fully Convolutional Instance-aware Semantic Segmentation
Stars: ✭ 1,563 (+1057.78%)
Jlsca - Side-channel toolkit in Julia
Stars: ✭ 114 (-15.56%)
Drake - An R-focused pipeline toolkit for reproducibility and high-performance computing
Stars: ✭ 1,301 (+863.7%)
Hzproc - torch data augmentation toolbox (supports affine transform)
Stars: ✭ 56 (-58.52%)
Dink - Point cloud deep learning framework
Stars: ✭ 56 (-58.52%)