Emu
The write-once-run-anywhere GPGPU library for Rust
Stars: ✭ 1,350 (+907.46%)
Spoc
Stream Processing with OCaml
Stars: ✭ 115 (-14.18%)
Extending Jax
Extending JAX with custom C++ and CUDA code
Stars: ✭ 98 (-26.87%)
Pp4fpgas Cn Hls
HLS Project of pp4fpgas - https://github.com/xupsh/pp4fpgas-cn
Stars: ✭ 97 (-27.61%)
Cltune
CLTune: An automatic OpenCL & CUDA kernel tuner
Stars: ✭ 114 (-14.93%)
Xchain
A cross compiler toolchain targeting macOS/iOS/etc.
Stars: ✭ 95 (-29.1%)
Core
parallel finite element unstructured meshes
Stars: ✭ 124 (-7.46%)
Region Conv
Not All Pixels Are Equal: Difficulty-Aware Semantic Segmentation via Deep Layer Cascade
Stars: ✭ 95 (-29.1%)
Hikari
LLVM Obfuscator
Stars: ✭ 1,585 (+1082.84%)
Fbtt Embedding
This is a Tensor Train based compression library to compress sparse embedding tables used in large-scale machine learning models such as recommendation and natural language processing. We showed this library can reduce the total model size by up to 100x in Facebook’s open-sourced DLRM model while achieving the same model quality. Our implementation is faster than the state-of-the-art implementations. Existing state-of-the-art libraries also decompress the whole embedding tables on the fly, so they do not provide memory reduction at runtime during training. Our library decompresses only the requested rows and can therefore provide a 10,000 times memory footprint reduction per embedding table. The library also includes a software cache to store a portion of the entries in the table in decompressed format for faster lookup and processing.
Stars: ✭ 92 (-31.34%)
Numer
Numeric Erlang - vector and matrix operations with CUDA. Heavily inspired by Pteracuda - https://github.com/kevsmith/pteracuda
Stars: ✭ 91 (-32.09%)
Pytorch Unflow
a reimplementation of UnFlow in PyTorch that matches the official TensorFlow version
Stars: ✭ 113 (-15.67%)
Llvm Utils
LLVM/Clang for Visual Studio 2019, 2017, 2015, 2013, 2012 and 2010. clang-cl for Python3 distutils. Utils for Clang Static Analyzer
Stars: ✭ 123 (-8.21%)
Brain
An esoteric programming language compiler on top of LLVM based on Brainfuck
Stars: ✭ 112 (-16.42%)
Matconvnet
MatConvNet: CNNs for MATLAB
Stars: ✭ 1,299 (+869.4%)
Aurora
A minimal deep learning library written in Python/Cython/C++ and Numpy/CUDA/cuDNN.
Stars: ✭ 90 (-32.84%)
Enzyme.jl
Julia bindings for the Enzyme automatic differentiator
Stars: ✭ 90 (-32.84%)
Llvm Mirror
NOTE: The LLVM project now operates official Git mirrors as well: http://llvm.org/docs/GettingStarted.html#git-mirror -- An automated mirror of llvm/trunk from LLVM's SVN. Updates hourly. Release branches and tags are tracked manually. This mirror is *not* commit-ID compatible with the official Git mirrors.
Stars: ✭ 122 (-8.96%)
Opcde2017
Slides and very basic examples
Stars: ✭ 112 (-16.42%)
Deeppipe2
Deep Learning library using GPU (CUDA/cuBLAS)
Stars: ✭ 90 (-32.84%)
Adacof Pytorch
Official source code for our paper "AdaCoF: Adaptive Collaboration of Flows for Video Frame Interpolation" (CVPR 2020)
Stars: ✭ 110 (-17.91%)
Halloc
A fast and highly scalable GPU dynamic memory allocator
Stars: ✭ 89 (-33.58%)
Libebc
C++ Library and Tool for Extracting Embedded Bitcode
Stars: ✭ 122 (-8.96%)
Futhark
💥💻💥 A data-parallel functional programming language
Stars: ✭ 1,641 (+1124.63%)
Thundersvm
ThunderSVM: A Fast SVM Library on GPUs and CPUs
Stars: ✭ 1,282 (+856.72%)
Cuhe
CUDA Homomorphic Encryption Library
Stars: ✭ 109 (-18.66%)
Minhashcuda
Weighted MinHash implementation on CUDA (multi-gpu).
Stars: ✭ 88 (-34.33%)
Pysnn
Efficient Spiking Neural Network framework, built on top of PyTorch for GPU acceleration
Stars: ✭ 129 (-3.73%)
Gllvm
Whole Program LLVM: wllvm ported to go
Stars: ✭ 126 (-5.97%)
Llvm Pass Tutorial
A step-by-step tutorial for building an LLVM sample pass
Stars: ✭ 122 (-8.96%)
Tapir Llvm
Tapir extension to LLVM for optimizing Parallel Programs
Stars: ✭ 109 (-18.66%)
Ghdl
VHDL 2008/93/87 simulator
Stars: ✭ 1,285 (+858.96%)
Biglasso
biglasso: Extending Lasso Model Fitting to Big Data in R
Stars: ✭ 87 (-35.07%)
Bin2llvm
A binary to LLVM translator
Stars: ✭ 108 (-19.4%)
Python Opencv Cuda
custom opencv_contrib module which exposes opencv cuda optical flow methods with python bindings
Stars: ✭ 86 (-35.82%)
Dtcraft
A High-performance Cluster Computing Engine
Stars: ✭ 122 (-8.96%)
Knn cuda
pytorch knn [cuda version]
Stars: ✭ 86 (-35.82%)
Malc
Mal (Make A Lisp) compiler
Stars: ✭ 85 (-36.57%)
Hashcat
World's fastest and most advanced password recovery utility
Stars: ✭ 11,014 (+8119.4%)
Lsh deeplearning
Scalable and Sustainable Deep Learning via Randomized Hashing
Stars: ✭ 85 (-36.57%)
Proton Clang
Proton Clang toolchain builds in the form of a continuously updating Git repository. Clone with --depth=1.
Stars: ✭ 126 (-5.97%)
Warp Rnnt
CUDA-Warp RNN-Transducer
Stars: ✭ 122 (-8.96%)
Dace
DaCe - Data Centric Parallel Programming
Stars: ✭ 106 (-20.9%)
Mpr
Reference implementation for "Massively Parallel Rendering of Complex Closed-Form Implicit Surfaces" (SIGGRAPH 2020)
Stars: ✭ 84 (-37.31%)
Schwimmbad
A common interface to processing pools.
Stars: ✭ 82 (-38.81%)
Pytorch Emdloss
PyTorch 1.0 implementation of the approximate Earth Mover's Distance
Stars: ✭ 82 (-38.81%)
Likely
A compiler intermediate representation for image recognition and heterogeneous computing.
Stars: ✭ 81 (-39.55%)