Creepminer
Burstcoin C++ CPU and GPU Miner
Stars: ✭ 169 (-18.36%)
Babelstream
STREAM, for lots of devices written in many programming models
Stars: ✭ 121 (-41.55%)
Rmm
RAPIDS Memory Manager
Stars: ✭ 154 (-25.6%)
Tensorflow Optimized Wheels
TensorFlow wheels built for latest CUDA/CuDNN and enabled performance flags: SSE, AVX, FMA; XLA
Stars: ✭ 118 (-43%)
Ck Caffe
Collective Knowledge workflow for Caffe to automate installation across diverse platforms and to collaboratively evaluate and optimize Caffe-based workloads across diverse hardware, software and data sets (compilers, libraries, tools, models, inputs)
Stars: ✭ 192 (-7.25%)
Mtensor
A C++ Cuda Tensor Lazy Computing Library
Stars: ✭ 115 (-44.44%)
Dsmnet
Domain-invariant Stereo Matching Networks
Stars: ✭ 153 (-26.09%)
Pytorch spn
Extension package for spatial propagation network in pytorch.
Stars: ✭ 114 (-44.93%)
Cuda programming
Code from the "CUDA Crash Course" YouTube series by CoffeeBeforeArch
Stars: ✭ 169 (-18.36%)
Pytorch Unflow
A reimplementation of UnFlow in PyTorch that matches the official TensorFlow version
Stars: ✭ 113 (-45.41%)
Jetson
Helmut Hoffer von Ankershoffen experimenting with arm64 based NVIDIA Jetson (Nano and AGX Xavier) edge devices running Kubernetes (K8s) for machine learning (ML) including Jupyter Notebooks, TensorFlow Training and TensorFlow Serving using CUDA for smart IoT.
Stars: ✭ 151 (-27.05%)
Futhark
💥💻💥 A data-parallel functional programming language
Stars: ✭ 1,641 (+692.75%)
Simplegpuhashtable
A simple GPU hash table implemented in CUDA using lock free techniques
Stars: ✭ 198 (-4.35%)
Ginkgo
Numerical linear algebra software package
Stars: ✭ 149 (-28.02%)
Dace
DaCe - Data Centric Parallel Programming
Stars: ✭ 106 (-48.79%)
Deformable Kernels
Deforming kernels to adapt towards object deformation. In ICLR 2020.
Stars: ✭ 166 (-19.81%)
Cuda Winograd
Fast CUDA Kernels for ResNet Inference.
Stars: ✭ 104 (-49.76%)
Sketchgraphs
A dataset of 15 million CAD sketches with geometric constraint graphs.
Stars: ✭ 148 (-28.5%)
Deepnet
Deep.Net machine learning framework for F#
Stars: ✭ 99 (-52.17%)
Macos Egpu Cuda Guide
Set up CUDA for machine learning (and gaming) on macOS using an NVIDIA eGPU
Stars: ✭ 187 (-9.66%)
Extending Jax
Extending JAX with custom C++ and CUDA code
Stars: ✭ 98 (-52.66%)
Pynvvl
A Python wrapper of NVIDIA Video Loader (NVVL) with CuPy for fast video loading with Python
Stars: ✭ 95 (-54.11%)
Quda
QUDA is a library for performing calculations in lattice QCD on GPUs.
Stars: ✭ 166 (-19.81%)
Fbtt Embedding
This is a Tensor Train based compression library for compressing the sparse embedding tables used in large-scale machine learning models such as recommendation and natural language processing. We showed that this library can reduce the total model size by up to 100x in Facebook’s open-sourced DLRM model while achieving the same model quality. Our implementation is faster than the state-of-the-art implementations. Existing state-of-the-art libraries also decompress the whole embedding tables on the fly, so they do not reduce memory use during training. Our library decompresses only the requested rows and can therefore provide a 10,000-times memory footprint reduction per embedding table. The library also includes a software cache that stores a portion of the table entries in decompressed format for faster lookup and processing.
Stars: ✭ 92 (-55.56%)
Remotery
Single C file, Realtime CPU/GPU Profiler with Remote Web Viewer
Stars: ✭ 1,908 (+821.74%)
Oneflow
OneFlow is a performance-centered and open-source deep learning framework.
Stars: ✭ 2,868 (+1285.51%)
Matconvnet
MatConvNet: CNNs for MATLAB
Stars: ✭ 1,299 (+527.54%)
Libgdf
[ARCHIVED] C GPU DataFrame Library
Stars: ✭ 142 (-31.4%)
Deeppipe2
Deep Learning library using GPU (CUDA/cuBLAS)
Stars: ✭ 90 (-56.52%)
Ctranslate2
Fast inference engine for OpenNMT models
Stars: ✭ 140 (-32.37%)
Thundersvm
ThunderSVM: A Fast SVM Library on GPUs and CPUs
Stars: ✭ 1,282 (+519.32%)
Marian Dev
Fast Neural Machine Translation in C++ - development repository
Stars: ✭ 136 (-34.3%)
Knn cuda
pytorch knn [cuda version]
Stars: ✭ 86 (-58.45%)
Pytorch Emdloss
PyTorch 1.0 implementation of the approximate Earth Mover's Distance
Stars: ✭ 82 (-60.39%)
Partial Order Pruning
Partial Order Pruning: for Best Speed/Accuracy Trade-off in Neural Architecture Search
Stars: ✭ 135 (-34.78%)
Nnabla Ext Cuda
A CUDA Extension of Neural Network Libraries
Stars: ✭ 79 (-61.84%)
Viseron
Self-hosted NVR with object detection
Stars: ✭ 192 (-7.25%)
Cuda Design Patterns
Some CUDA design patterns and a bit of template magic for CUDA
Stars: ✭ 78 (-62.32%)
Cudart.jl
Julia wrapper for CUDA runtime API
Stars: ✭ 75 (-63.77%)
Primitiv
A Neural Network Toolkit.
Stars: ✭ 164 (-20.77%)
Parenchyma
An extensible HPC framework for CUDA, OpenCL and native CPU.
Stars: ✭ 71 (-65.7%)
Libcudacxx
The C++ Standard Library for your entire system.
Stars: ✭ 1,861 (+799.03%)
Deepjointfilter
The source code of ECCV16 'Deep Joint Image Filtering'.
Stars: ✭ 68 (-67.15%)
Ssd Gpu Dma
Build userspace NVMe drivers and storage applications with CUDA support
Stars: ✭ 172 (-16.91%)
Agency
Execution primitives for C++
Stars: ✭ 127 (-38.65%)
Cunn
Stars: ✭ 205 (-0.97%)
Pine
🌲 Aimbot powered by real-time object detection with neural networks, GPU accelerated with Nvidia. Optimized for use with CS:GO.
Stars: ✭ 202 (-2.42%)
Timemory
Modular C++ Toolkit for Performance Analysis and Logging. Profiling API and Tools for C, C++, CUDA, Fortran, and Python. The C++ template API is essentially a framework for creating tools: it is designed to provide a unifying interface for recording various performance measurements alongside data logging and interfaces to other tools.
Stars: ✭ 192 (-7.25%)
Gmonitor
gmonitor is a GPU monitor (Nvidia only at the moment)
Stars: ✭ 169 (-18.36%)
Clojurecuda
Clojure library for CUDA development
Stars: ✭ 158 (-23.67%)
Kaldi
kaldi-asr/kaldi is the official location of the Kaldi project.
Stars: ✭ 11,151 (+5286.96%)