Simpleopenclsamples: Simple OpenCL Samples that Build with Khronos Headers and Libs
Stars: ✭ 22 (-80.87%)
Vuh: Vulkan compute for people
Stars: ✭ 264 (+129.57%)
Halloc: A fast and highly scalable GPU dynamic memory allocator
Stars: ✭ 89 (-22.61%)
Brainsimulator: Brain Simulator is a platform for visual prototyping of artificial intelligence architectures.
Stars: ✭ 262 (+127.83%)
instant-ngp: Instant neural graphics primitives, lightning-fast NeRF and more
Stars: ✭ 1,863 (+1520%)
Mpn Cov: (ICCV 2017) For exploiting second-order statistics, we propose Matrix Power Normalized Covariance pooling (MPN-COV) ConvNets, which differ from and outperform those using global average pooling.
Stars: ✭ 63 (-45.22%)
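The MPN-COV entry names a concrete pooling step: replace global average pooling with the covariance of the final convolutional features, raised to a matrix power. A minimal pure-Python sketch, assuming the features arrive as M vectors of length d, the covariance uses 1/M normalization, and the power is computed through an eigendecomposition Σ^α = U diag(λ^α) U^T (here via a hand-written Jacobi eigensolver). The function names and the default α = 0.5 are illustrative choices, not the repository's code.

```python
import math


def covariance(features):
    """Sample covariance (1/M normalization) of M feature vectors of length d."""
    m, d = len(features), len(features[0])
    mean = [sum(f[c] for f in features) / m for c in range(d)]
    return [[sum((f[a] - mean[a]) * (f[b] - mean[b]) for f in features) / m
             for b in range(d)] for a in range(d)]


def jacobi_eigh(a, tol=1e-22, max_sweeps=100):
    """Eigenvalues and eigenvectors (columns of v) of a symmetric matrix."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        total = sum(x * x for row in a for x in row)
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= tol * total:
            break  # off-diagonal mass is negligible relative to the matrix norm
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def mpn_cov(features, alpha=0.5):
    """Matrix-power-normalized covariance: Sigma^alpha = U diag(lambda^alpha) U^T."""
    lam, u = jacobi_eigh(covariance(features))
    # Clamp tiny negative eigenvalues caused by rounding before taking the power.
    powered = [max(l, 0.0) ** alpha for l in lam]
    d = len(lam)
    return [[sum(u[a][k] * powered[k] * u[b][k] for k in range(d))
             for b in range(d)] for a in range(d)]
```

Because Σ is symmetric positive semidefinite, clamping rounding-induced negative eigenvalues to zero keeps λ^α real. A network would typically flatten the upper triangle of the result before the classifier, which this sketch omits.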
LuisaRender: High-Performance Multiple-Backend Renderer Based on LuisaCompute
Stars: ✭ 47 (-59.13%)
Wheels: Performance-optimized wheels for TensorFlow (SSE, AVX, FMA, XLA, MPI)
Stars: ✭ 891 (+674.78%)
Tensorflow Object Detection Tutorial: The purpose of this tutorial is to learn how to install and prepare the TensorFlow framework to train your own convolutional neural network object detection classifier for multiple objects, starting from scratch.
Stars: ✭ 113 (-1.74%)
PyTorchTOP: GPU PyTorch TOP in TouchDesigner with CUDA-enabled OpenCV
Stars: ✭ 58 (-49.57%)
Ddsh Tip2018: Source code for the paper "Deep Discrete Supervised Hashing"
Stars: ✭ 16 (-86.09%)
Fbtt Embedding: This is a Tensor Train based library for compressing the sparse embedding tables used in large-scale machine learning models such as recommendation and natural language processing. We showed this library can reduce the total model size by up to 100x in Facebook’s open-sourced DLRM model while achieving the same model quality. Our implementation is faster than the state-of-the-art implementations. The existing state-of-the-art library also decompresses the whole embedding tables on the fly, so it provides no memory reduction during training. Our library decompresses only the requested rows and can therefore provide a 10,000x memory footprint reduction per embedding table. The library also includes a software cache that stores a portion of the table's entries in decompressed format for faster lookup and processing.
Stars: ✭ 92 (-20%)
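The Fbtt Embedding description hinges on one property of the tensor-train (TT) format: a single row can be rebuilt from small cores without materializing the table. A minimal pure-Python sketch, assuming the row count factors as n_1 * ... * n_K and the embedding width as d_1 * ... * d_K: the row index is split into mixed-radix digits, each digit selects one slice of a core, and chaining the slices yields exactly that row. The class and method names are hypothetical, the cores are random rather than trained, and `functools.lru_cache` merely stands in for the library's software cache, whose policy the description does not specify.

```python
import random
from functools import lru_cache
from math import prod


class TTEmbedding:
    """Hypothetical TT-format embedding table of logical shape
    (prod(row_dims), prod(col_dims)).

    Core k is stored as nested lists indexed [i_k][a][j_k][b], i.e. shape
    (row_dims[k], ranks[k], col_dims[k], ranks[k + 1]).
    """

    def __init__(self, row_dims, col_dims, ranks, cache_size=1024, seed=0):
        assert len(row_dims) == len(col_dims) == len(ranks) - 1
        assert ranks[0] == ranks[-1] == 1, "boundary TT ranks must be 1"
        rng = random.Random(seed)
        self.row_dims, self.col_dims, self.ranks = row_dims, col_dims, ranks
        self.num_rows = prod(row_dims)
        self.cores = [
            [[[[rng.uniform(-0.5, 0.5) for _ in range(ranks[k + 1])]
               for _ in range(col_dims[k])]
              for _ in range(ranks[k])]
             for _ in range(row_dims[k])]
            for k in range(len(row_dims))
        ]
        # Stand-in for a software cache of decompressed rows (plain LRU, an assumption).
        self.lookup = lru_cache(maxsize=cache_size)(self._decompress_row)

    def num_params(self):
        """Number of stored floats, to compare with prod(row_dims) * prod(col_dims)."""
        return sum(n * ra * d * rb for n, ra, d, rb in
                   zip(self.row_dims, self.ranks, self.col_dims, self.ranks[1:]))

    def _row_digits(self, row):
        """Mixed-radix digits (i_1, ..., i_K) of a flat row index."""
        digits = []
        for n in reversed(self.row_dims):
            digits.append(row % n)
            row //= n
        return digits[::-1]

    def _decompress_row(self, row):
        """Rebuild only one row by chaining the core slices picked by its digits."""
        if not 0 <= row < self.num_rows:
            raise IndexError(row)
        # partial[c] is a vector over the current TT rank for column prefix c.
        partial = [[1.0]]
        for k, i_k in enumerate(self._row_digits(row)):
            core = self.cores[k][i_k]  # shape (ranks[k], col_dims[k], ranks[k + 1])
            partial = [
                [sum(vec[a] * core[a][j][b] for a in range(self.ranks[k]))
                 for b in range(self.ranks[k + 1])]
                for vec in partial
                for j in range(self.col_dims[k])
            ]
        return tuple(vec[0] for vec in partial)  # final rank is 1


if __name__ == "__main__":
    emb = TTEmbedding(row_dims=[10, 10, 10], col_dims=[2, 2, 4], ranks=[1, 4, 4, 1])
    row = emb.lookup(123)
    print(len(row), emb.num_params(), emb.num_rows * prod(emb.col_dims))
```

With the example shapes the logical table has 1,000 rows of width 16, yet the cores hold only 560 floats instead of 16,000; the actual ratio depends entirely on the chosen factorization and ranks.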
Cuda Design Patterns: Some CUDA design patterns and a bit of template magic for CUDA
Stars: ✭ 78 (-32.17%)
Qualia2.0: Qualia is a deep learning framework deeply integrated with automatic differentiation and dynamic graphing with CUDA acceleration. Qualia was built from scratch.
Stars: ✭ 41 (-64.35%)
Icpcuda: Super fast implementation of ICP in CUDA for devices of compute capability 3.5 or higher
Stars: ✭ 416 (+261.74%)
Ggnn: State of the Art Graph-based GPU Nearest Neighbor Search (GGNN)
Stars: ✭ 63 (-45.22%)
desert: A fast (?) random sampling drawing library
Stars: ✭ 61 (-46.96%)
Libcudarange: An interval arithmetic and affine arithmetic library for NVIDIA CUDA
Stars: ✭ 5 (-95.65%)
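Libcudarange's description mentions interval arithmetic, where every quantity is a range [lo, hi] guaranteed to contain the true real value. A minimal CPU sketch of that idea in Python (3.9+ for `math.nextafter`), assuming outward widening by one ulp after round-to-nearest; GPU code would more likely use CUDA's directed-rounding intrinsics such as `__dadd_rd` and `__dadd_ru`. The `Interval` class is hypothetical and is not the library's API.

```python
import math
from dataclasses import dataclass


def _down(x):
    return math.nextafter(x, -math.inf)


def _up(x):
    return math.nextafter(x, math.inf)


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] whose operations enclose the exact real result."""
    lo: float
    hi: float

    def __post_init__(self):
        if not self.lo <= self.hi:
            raise ValueError(f"invalid interval [{self.lo}, {self.hi}]")

    # Each bound is computed with round-to-nearest and then widened by one ulp,
    # which is enough to contain the exact result (directed rounding is tighter).
    def __add__(self, other):
        return Interval(_down(self.lo + other.lo), _up(self.hi + other.hi))

    def __sub__(self, other):
        return Interval(_down(self.lo - other.hi), _up(self.hi - other.lo))

    def __mul__(self, other):
        products = [a * b for a in (self.lo, self.hi) for b in (other.lo, other.hi)]
        return Interval(_down(min(products)), _up(max(products)))

    def __contains__(self, x):
        return self.lo <= x <= self.hi

    def width(self):
        return self.hi - self.lo
```

Note that `x - x` with `x = Interval(1.0, 2.0)` yields roughly [-1, 1] rather than [0, 0]: plain interval arithmetic does not know both operands are the same variable (the dependency problem). Tracking such correlations is what affine arithmetic, the library's other listed mode, is designed for.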
opencv-cuda-docker: Dockerfiles for OpenCV compiled with CUDA, opencv_contrib modules and Python 3 bindings
Stars: ✭ 55 (-52.17%)
CPP-Programming: Various C/C++ examples: DirectX, OpenGL, CUDA, Vulkan, OpenCL.
Stars: ✭ 30 (-73.91%)
Scikit Cuda: Python interface to GPU-powered libraries
Stars: ✭ 803 (+598.26%)
Gdax Orderbook Ml: Application of machine learning to the Coinbase (GDAX) order book
Stars: ✭ 60 (-47.83%)
tiny-cuda-nn: Lightning fast & tiny C++/CUDA neural network framework
Stars: ✭ 908 (+689.57%)
Blocksparse: Efficient GPU kernels for block-sparse matrix multiplication and convolution
Stars: ✭ 797 (+593.04%)
Deepnet: Deep.Net machine learning framework for F#
Stars: ✭ 99 (-13.91%)
H2o4gpu: H2O.ai GPU Edition
Stars: ✭ 416 (+261.74%)
octotiger: Astrophysics program simulating the evolution of star systems based on the fast multipole method on adaptive octrees
Stars: ✭ 30 (-73.91%)
Pycuda: CUDA integration for Python, plus shiny features
Stars: ✭ 1,112 (+866.96%)
ThrustRTC: CUDA tool set for non-C++ languages that provides functionality similar to Thrust, with NVRTC at its core.
Stars: ✭ 41 (-64.35%)
Numba: NumPy-aware dynamic Python compiler using LLVM
Stars: ✭ 7,090 (+6065.22%)
SimNDT: Ultrasonic NDT simulator with an engine core based on the Elastodynamic Finite Integration Technique (EFIT)
Stars: ✭ 34 (-70.43%)
Minhashcuda: Weighted MinHash implementation on CUDA (multi-GPU).
Stars: ✭ 88 (-23.48%)
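Weighted MinHash generalizes MinHash so that the probability that two signatures agree on a sample equals the weighted Jaccard similarity Σ min(a_i, b_i) / Σ max(a_i, b_i). The entry does not say which scheme minhashcuda uses; the sketch below assumes Ioffe's Improved Consistent Weighted Sampling (ICWS), written in plain Python for clarity rather than speed. The per-(sample, feature) random draws are derived from a seeded string so that every input vector sees the same draws, which is what makes the samples consistent; all function names are hypothetical.

```python
import math
import random


def _draws(seed, k, i):
    """Random variables for (sample k, feature i), identical for every input vector."""
    rng = random.Random(f"{seed}:{k}:{i}")
    return rng.gammavariate(2.0, 1.0), rng.gammavariate(2.0, 1.0), rng.random()


def icws_signature(weights, num_samples, seed=0):
    """ICWS signature: one (feature index, quantized level t) pair per sample."""
    signature = []
    for k in range(num_samples):
        best = None
        for i, w in enumerate(weights):
            if w <= 0:
                continue  # zero-weight features can never be selected
            r, c, beta = _draws(seed, k, i)
            t = math.floor(math.log(w) / r + beta)
            y = math.exp(r * (t - beta))
            a = c / (y * math.exp(r))
            if best is None or a < best[0]:
                best = (a, i, t)
        if best is None:
            raise ValueError("weights must contain a positive entry")
        signature.append((best[1], best[2]))
    return signature


def estimate_similarity(sig_a, sig_b):
    """Fraction of matching samples; estimates the weighted Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def weighted_jaccard(a, b):
    """Exact weighted Jaccard similarity, for comparison."""
    return sum(map(min, a, b)) / sum(map(max, a, b))
```

Each of the `num_samples` hashes is a (feature index, level) pair; the fraction of positions where two signatures match is the similarity estimate, whose standard error shrinks roughly as 1/sqrt(num_samples).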
Sixtyfour: How fast can we brute force a 64-bit comparison?
Stars: ✭ 41 (-64.35%)
Tf Coriander: OpenCL 1.2 implementation for TensorFlow
Stars: ✭ 775 (+573.91%)
warp: Continuous-energy Monte Carlo neutron transport in general geometries on GPUs
Stars: ✭ 27 (-76.52%)
Accelerate: Embedded language for high-performance array computations
Stars: ✭ 751 (+553.04%)
bazel.cmake: bazel.cmake mimics the behavior of Bazel to make CMake easier to use
Stars: ✭ 38 (-66.96%)
Kintinuous: Real-time large scale dense visual SLAM system
Stars: ✭ 740 (+543.48%)
Jampack: Experimental parallel compression algorithm
Stars: ✭ 21 (-81.74%)
Flattened Cnn: Flattened convolutional neural networks (1D convolution modules for Torch nn)
Stars: ✭ 59 (-48.7%)
Gunrock: High-Performance Graph Primitives on GPUs
Stars: ✭ 718 (+524.35%)
Hiop: HPC solver for nonlinear optimization problems
Stars: ✭ 75 (-34.78%)
Octree Slam: Large octree map construction and rendering with CUDA and OpenGL
Stars: ✭ 40 (-65.22%)
Ai Lab: All-in-one AI container for rapid prototyping
Stars: ✭ 406 (+253.04%)
ClothTOP: GPU-accelerated Cloth TOP node for TouchDesigner using the NVIDIA Flex physics solver.
Stars: ✭ 33 (-71.3%)
Warp Ctc: Fast parallel CTC.
Stars: ✭ 3,954 (+3338.26%)
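Warp-CTC computes the Connectionist Temporal Classification (CTC) loss. For reference, a minimal CPU sketch of the standard CTC forward (alpha) recursion in log space, assuming per-frame log-softmax outputs `log_probs[t][c]`, label ids that exclude the blank, and blank id 0. It returns -log p(labels | input) only; gradients, batching, and the GPU parallelization that warp-ctc provides are out of scope, and the function names are hypothetical rather than warp-ctc's interface.

```python
import math

NEG_INF = float("-inf")


def _logsumexp(values):
    """Stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def ctc_neg_log_likelihood(log_probs, labels, blank=0):
    """-log p(labels | input) via the CTC forward recursion.

    log_probs: T rows of per-class log-probabilities (log-softmax outputs).
    labels: target class ids, without blanks.
    """
    ext = [blank]  # blank-interleaved target: b l1 b l2 ... b
    for label in labels:
        ext += [label, blank]
    S = len(ext)
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][blank]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]
    for t in range(1, len(log_probs)):
        prev, alpha = alpha, [NEG_INF] * S
        for s in range(S):
            terms = [prev[s]]                  # stay on the same symbol
            if s >= 1:
                terms.append(prev[s - 1])      # advance by one position
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(prev[s - 2])      # skip a blank between distinct labels
            alpha[s] = _logsumexp(terms) + log_probs[t][ext[s]]
    # Valid paths end on the last label or on the trailing blank.
    final = [alpha[-1]] if S == 1 else [alpha[-1], alpha[-2]]
    return -_logsumexp(final)
```

The label sequence is padded with blanks, and a path may skip a blank only between two different labels, which is why a repeated label such as [1, 1] needs at least one blank frame between its two occurrences.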
Nbody: N-body gravitational attraction problem solver
Stars: ✭ 40 (-65.22%)
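The Nbody entry does not say which solver it uses. As a baseline, here is a minimal direct-summation sketch, assuming O(N²) pairwise Newtonian gravity with a small Plummer-style softening length `eps` (an assumption that avoids division by zero in close encounters) and a kick-drift-kick leapfrog integrator; units with G = 1 and all names are illustrative.

```python
import math


def accelerations(pos, mass, G=1.0, eps=1e-3):
    """Direct O(N^2) sum of softened Newtonian accelerations in 3D."""
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = d[0] ** 2 + d[1] ** 2 + d[2] ** 2 + eps * eps  # softened distance^2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                # Equal and opposite pull on the pair (Newton's third law).
                acc[i][k] += G * mass[j] * d[k] * inv_r3
                acc[j][k] -= G * mass[i] * d[k] * inv_r3
    return acc


def leapfrog(pos, vel, mass, dt, steps, **force_kwargs):
    """Kick-drift-kick leapfrog; returns new lists and leaves the inputs untouched."""
    pos = [p[:] for p in pos]
    vel = [v[:] for v in vel]
    acc = accelerations(pos, mass, **force_kwargs)
    for _ in range(steps):
        for i in range(len(pos)):
            for k in range(3):
                vel[i][k] += 0.5 * dt * acc[i][k]  # half kick
                pos[i][k] += dt * vel[i][k]        # drift
        acc = accelerations(pos, mass, **force_kwargs)
        for i in range(len(pos)):
            for k in range(3):
                vel[i][k] += 0.5 * dt * acc[i][k]  # second half kick
    return pos, vel
```

Because each pairwise force is applied to both bodies with opposite signs, total momentum is conserved up to rounding. Tree codes such as Barnes-Hut reduce the O(N²) cost while keeping the same force law.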
Gocv: Go package for computer vision using OpenCV 4 and beyond.
Stars: ✭ 4,511 (+3822.61%)
Ios ml: List of machine learning, AI, and NLP solutions for iOS. The most recent version of this article can be found on my blog.
Stars: ✭ 1,409 (+1125.22%)
Neural Api: CAI NEURAL API, a Pascal-based neural network API optimized for the AVX, AVX2 and AVX512 instruction sets plus OpenCL-capable devices including AMD, Intel and NVIDIA.
Stars: ✭ 94 (-18.26%)
Bytecoder: Rich domain model for JVM bytecode and a framework to interpret and transpile it.
Stars: ✭ 401 (+248.7%)
Style Feature Reshuffle: Caffe implementation of "Arbitrary Style Transfer with Deep Feature Reshuffle"
Stars: ✭ 38 (-66.96%)