RmmRAPIDS Memory Manager
Stars: ✭ 154 (-8.88%)
DeepnetDeep.Net machine learning framework for F#
Stars: ✭ 99 (-41.42%)
Marian DevFast Neural Machine Translation in C++ - development repository
Stars: ✭ 136 (-19.53%)
Extending JaxExtending JAX with custom C++ and CUDA code
Stars: ✭ 98 (-42.01%)
PynvvlA Python wrapper of NVIDIA Video Loader (NVVL) with CuPy for fast video loading with Python
Stars: ✭ 95 (-43.79%)
Partial Order PruningPartial Order Pruning: for Best Speed/Accuracy Trade-off in Neural Architecture Search
Stars: ✭ 135 (-20.12%)
Fbtt EmbeddingA Tensor Train based compression library for the sparse embedding tables used in large-scale machine learning models, such as recommendation and natural language processing models. We showed that this library can reduce the total model size by up to 100x in Facebook's open-sourced DLRM model while achieving the same model quality. Our implementation is faster than the state-of-the-art implementations. Existing state-of-the-art libraries also decompress the whole embedding table on the fly, so they do not reduce memory use at runtime during training. Our library decompresses only the requested rows and can therefore provide a 10,000x memory footprint reduction per embedding table. The library also includes a software cache that stores a portion of the table entries in decompressed form for faster lookup and processing.
Stars: ✭ 92 (-45.56%)
DsmnetDomain-invariant Stereo Matching Networks
Stars: ✭ 153 (-9.47%)
MatconvnetMatConvNet: CNNs for MATLAB
Stars: ✭ 1,299 (+668.64%)
QudaQUDA is a library for performing calculations in lattice QCD on GPUs.
Stars: ✭ 166 (-1.78%)
Deeppipe2Deep Learning library using GPU (CUDA/cuBLAS)
Stars: ✭ 90 (-46.75%)
LibcudacxxThe C++ Standard Library for your entire system.
Stars: ✭ 1,861 (+1001.18%)
JetsonHelmut Hoffer von Ankershoffen experimenting with arm64 based NVIDIA Jetson (Nano and AGX Xavier) edge devices running Kubernetes (K8s) for machine learning (ML) including Jupyter Notebooks, TensorFlow Training and TensorFlow Serving using CUDA for smart IoT.
Stars: ✭ 151 (-10.65%)
ThundersvmThunderSVM: A Fast SVM Library on GPUs and CPUs
Stars: ✭ 1,282 (+658.58%)
AgencyExecution primitives for C++
Stars: ✭ 127 (-24.85%)
PrimitivA Neural Network Toolkit.
Stars: ✭ 164 (-2.96%)
Knn cudaPyTorch KNN [CUDA version]
Stars: ✭ 86 (-49.11%)
Pytorch EmdlossPyTorch 1.0 implementation of the approximate Earth Mover's Distance
Stars: ✭ 82 (-51.48%)
GinkgoNumerical linear algebra software package
Stars: ✭ 149 (-11.83%)
Nnabla Ext CudaA CUDA Extension of Neural Network Libraries
Stars: ✭ 79 (-53.25%)
Cuda Design PatternsSome CUDA design patterns and a bit of template magic for CUDA
Stars: ✭ 78 (-53.85%)
Deformable KernelsDeforming kernels to adapt towards object deformation. In ICLR 2020.
Stars: ✭ 166 (-1.78%)
Cudart.jlJulia wrapper for CUDA runtime API
Stars: ✭ 75 (-55.62%)
Warp RnntCUDA-Warp RNN-Transducer
Stars: ✭ 122 (-27.81%)
ParenchymaAn extensible HPC framework for CUDA, OpenCL and native CPU.
Stars: ✭ 71 (-57.99%)
SketchgraphsA dataset of 15 million CAD sketches with geometric constraint graphs.
Stars: ✭ 148 (-12.43%)
DeepjointfilterThe source code of ECCV16 'Deep Joint Image Filtering'.
Stars: ✭ 68 (-59.76%)
BabelstreamSTREAM, for lots of devices written in many programming models
Stars: ✭ 121 (-28.4%)
AlenkaGPU database engine
Stars: ✭ 1,150 (+580.47%)
KhivaAn open-source library of algorithms to analyse time series on GPU and CPU.
Stars: ✭ 161 (-4.73%)
Autodock GpuAutoDock for GPUs and other accelerators
Stars: ✭ 65 (-61.54%)
Tensorflow Optimized WheelsTensorFlow wheels built for the latest CUDA/cuDNN with performance flags enabled: SSE, AVX, FMA; XLA
Stars: ✭ 118 (-30.18%)
Cudadrv.jlA Julia wrapper for the CUDA driver API.
Stars: ✭ 64 (-62.13%)
Mpn Cov@ICCV2017: To exploit second-order statistics, we propose Matrix Power Normalized Covariance pooling (MPN-COV) ConvNets, which differ from and outperform those using global average pooling.
Stars: ✭ 63 (-62.72%)
MtensorA C++ Cuda Tensor Lazy Computing Library
Stars: ✭ 115 (-31.95%)
GgnnGGNN: State of the Art Graph-based GPU Nearest Neighbor Search
Stars: ✭ 63 (-62.72%)
Gdax Orderbook MlApplication of machine learning to the Coinbase (GDAX) orderbook
Stars: ✭ 60 (-64.5%)
Pytorch spnExtension package for spatial propagation network in pytorch.
Stars: ✭ 114 (-32.54%)
PycudaCUDA integration for Python, plus shiny features
Stars: ✭ 1,112 (+557.99%)
RemoterySingle C file, Realtime CPU/GPU Profiler with Remote Web Viewer
Stars: ✭ 1,908 (+1028.99%)
Pytorch UnflowA reimplementation of UnFlow in PyTorch that matches the official TensorFlow version
Stars: ✭ 113 (-33.14%)
Flattened CnnFlattened convolutional neural networks (1D convolution modules for Torch nn)
Stars: ✭ 59 (-65.09%)
Xmrminer🐜 A CUDA based miner for Monero
Stars: ✭ 158 (-6.51%)
Futhark💥💻💥 A data-parallel functional programming language
Stars: ✭ 1,641 (+871.01%)
DragonDragon: A Computation Graph Virtual Machine Based Deep Learning Framework.
Stars: ✭ 168 (-0.59%)
FloorA C++ Compute/Graphics Library and Toolchain enabling same-source CUDA/Host/Metal/OpenCL/Vulkan C++ programming and execution.
Stars: ✭ 166 (-1.78%)
JcudaJCuda - Java bindings for CUDA
Stars: ✭ 165 (-2.37%)
ForwardA library for high performance deep learning inference on NVIDIA GPUs.
Stars: ✭ 136 (-19.53%)
HashcatWorld's fastest and most advanced password recovery utility
Stars: ✭ 11,014 (+6417.16%)