Numer
Numeric Erlang - vector and matrix operations with CUDA. Heavily inspired by Pteracuda - https://github.com/kevsmith/pteracuda
Stars: ✭ 91 (-32.59%)
Adacof Pytorch
Official source code for our paper "AdaCoF: Adaptive Collaboration of Flows for Video Frame Interpolation" (CVPR 2020)
Stars: ✭ 110 (-18.52%)
Supra
SUPRA: Software Defined Ultrasound Processing for Real-Time Applications - An Open Source 2D and 3D Pipeline from Beamforming to B-Mode
Stars: ✭ 96 (-28.89%)
Modulated Deform Conv
deformable convolution 2D 3D DeformableConvolution DeformConv Modulated Pytorch CUDA
Stars: ✭ 81 (-40%)
Cltune
CLTune: An automatic OpenCL & CUDA kernel tuner
Stars: ✭ 114 (-15.56%)
Aurora
Minimal deep learning library written in Python/Cython/C++ and NumPy/CUDA/cuDNN.
Stars: ✭ 90 (-33.33%)
Fcis
Fully Convolutional Instance-aware Semantic Segmentation
Stars: ✭ 1,563 (+1057.78%)
Python Opencv Cuda
custom opencv_contrib module which exposes opencv cuda optical flow methods with python bindings
Stars: ✭ 86 (-36.3%)
Hashcat
World's fastest and most advanced password recovery utility
Stars: ✭ 11,014 (+8058.52%)
Dpp
Detail-Preserving Pooling in Deep Networks (CVPR 2018)
Stars: ✭ 99 (-26.67%)
Hiop
HPC solver for nonlinear optimization problems
Stars: ✭ 75 (-44.44%)
Spoc
Stream Processing with OCaml
Stars: ✭ 115 (-14.81%)
Region Conv
Not All Pixels Are Equal: Difficulty-Aware Semantic Segmentation via Deep Layer Cascade
Stars: ✭ 95 (-29.63%)
Kaldi
kaldi-asr/kaldi is the official location of the Kaldi project.
Stars: ✭ 11,151 (+8160%)
Elasticfusion
Real-time dense visual SLAM system
Stars: ✭ 1,298 (+861.48%)
Tensorflow Object Detection Tutorial
The purpose of this tutorial is to learn how to install and prepare the TensorFlow framework to train your own convolutional neural network object detection classifier for multiple objects, starting from scratch
Stars: ✭ 113 (-16.3%)
Halloc
A fast and highly scalable GPU dynamic memory allocator
Stars: ✭ 89 (-34.07%)
Mixbench
A GPU benchmark tool for evaluating GPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL)
Stars: ✭ 130 (-3.7%)
Minhashcuda
Weighted MinHash implementation on CUDA (multi-gpu).
Stars: ✭ 88 (-34.81%)
Cuhe
CUDA Homomorphic Encryption Library
Stars: ✭ 109 (-19.26%)
Mpr
Reference implementation for "Massively Parallel Rendering of Complex Closed-Form Implicit Surfaces" (SIGGRAPH 2020)
Stars: ✭ 84 (-37.78%)
Onemkl
oneAPI Math Kernel Library (oneMKL) Interfaces
Stars: ✭ 122 (-9.63%)
2016 super resolution
ICCV2015 Image Super-Resolution Using Deep Convolutional Networks
Stars: ✭ 78 (-42.22%)
Deepnet
Deep.Net machine learning framework for F#
Stars: ✭ 99 (-26.67%)
Cudart.jl
Julia wrapper for CUDA runtime API
Stars: ✭ 75 (-44.44%)
Tensorflow Optimized Wheels
TensorFlow wheels built for the latest CUDA/cuDNN with performance flags enabled: SSE, AVX, FMA; XLA
Stars: ✭ 118 (-12.59%)
Extending Jax
Extending JAX with custom C++ and CUDA code
Stars: ✭ 98 (-27.41%)
Pynvvl
A Python wrapper of NVIDIA Video Loader (NVVL) with CuPy for fast video loading with Python
Stars: ✭ 95 (-29.63%)
Mtensor
A C++ Cuda Tensor Lazy Computing Library
Stars: ✭ 115 (-14.81%)
Fbtt Embedding
This is a Tensor Train based compression library for the sparse embedding tables used in large-scale machine learning models such as recommendation and natural language processing. We showed that this library can reduce the total model size by up to 100x in Facebook’s open-sourced DLRM model while achieving the same model quality. Our implementation is faster than the state-of-the-art implementations. Existing state-of-the-art libraries also decompress the whole embedding tables on the fly, so they do not reduce memory use at training runtime. Our library decompresses only the requested rows and can therefore provide a 10,000 times memory footprint reduction per embedding table. The library also includes a software cache that stores a portion of the table entries in decompressed format for faster lookup and processing.
Stars: ✭ 92 (-31.85%)
Libcudacxx
The C++ Standard Library for your entire system.
Stars: ✭ 1,861 (+1278.52%)
Pytorch spn
Extension package for spatial propagation network in pytorch.
Stars: ✭ 114 (-15.56%)
Matconvnet
MatConvNet: CNNs for MATLAB
Stars: ✭ 1,299 (+862.22%)
Deeppipe2
Deep Learning library using GPU (CUDA/cuBLAS)
Stars: ✭ 90 (-33.33%)
Pytorch Unflow
A reimplementation of UnFlow in PyTorch that matches the official TensorFlow version
Stars: ✭ 113 (-16.3%)
Thundersvm
ThunderSVM: A Fast SVM Library on GPUs and CPUs
Stars: ✭ 1,282 (+849.63%)
Futhark
💥💻💥 A data-parallel functional programming language
Stars: ✭ 1,641 (+1115.56%)
Warp Rnnt
CUDA-Warp RNN-Transducer
Stars: ✭ 122 (-9.63%)
Knn cuda
pytorch knn [cuda version]
Stars: ✭ 86 (-36.3%)
Pytorch Emdloss
PyTorch 1.0 implementation of the approximate Earth Mover's Distance
Stars: ✭ 82 (-39.26%)
Agency
Execution primitives for C++
Stars: ✭ 127 (-5.93%)
Nnabla Ext Cuda
A CUDA Extension of Neural Network Libraries
Stars: ✭ 79 (-41.48%)
Dace
DaCe - Data Centric Parallel Programming
Stars: ✭ 106 (-21.48%)
Cuda Design Patterns
Some CUDA design patterns and a bit of template magic for CUDA
Stars: ✭ 78 (-42.22%)
Babelstream
STREAM, for lots of devices written in many programming models
Stars: ✭ 121 (-10.37%)
Cuda Winograd
Fast CUDA Kernels for ResNet Inference.
Stars: ✭ 104 (-22.96%)
Nnvm
No description or website provided.
Stars: ✭ 1,639 (+1114.07%)
Knn cuda
Fast K-Nearest Neighbor search with GPU
Stars: ✭ 119 (-11.85%)
Pygraphistry
PyGraphistry is a Python library to quickly load, shape, embed, and explore big graphs with the GPU-accelerated Graphistry visual graph analyzer
Stars: ✭ 1,365 (+911.11%)