Dink
Point Cloud Deep Learning Framework
Stars: ✭ 56 (-66.86%)
Dace
DaCe - Data Centric Parallel Programming
Stars: ✭ 106 (-37.28%)
Carlsim3
CARLsim is an efficient, easy-to-use, GPU-accelerated software framework for simulating large-scale spiking neural network (SNN) models with a high degree of biological detail.
Stars: ✭ 52 (-69.23%)
Ctranslate2
Fast inference engine for OpenNMT models
Stars: ✭ 140 (-17.16%)
Cs344
Introduction to Parallel Programming class code
Stars: ✭ 1,051 (+521.89%)
Cuda Winograd
Fast CUDA Kernels for ResNet Inference.
Stars: ✭ 104 (-38.46%)
Rmm
RAPIDS Memory Manager
Stars: ✭ 154 (-8.88%)
Lyra
Stars: ✭ 43 (-74.56%)
Deepnet
Deep.Net machine learning framework for F#
Stars: ✭ 99 (-41.42%)
Marian Dev
Fast Neural Machine Translation in C++ - development repository
Stars: ✭ 136 (-19.53%)
Sixtyfour
How fast can we brute force a 64-bit comparison?
Stars: ✭ 41 (-75.74%)
Extending Jax
Extending JAX with custom C++ and CUDA code
Stars: ✭ 98 (-42.01%)
Nbody
N-body gravitational attraction problem solver
Stars: ✭ 40 (-76.33%)
Soul Engine
Physically based renderer and simulation engine for real-time applications.
Stars: ✭ 37 (-78.11%)
Pynvvl
A Python wrapper of NVIDIA Video Loader (NVVL) with CuPy for fast video loading with Python
Stars: ✭ 95 (-43.79%)
Nvidia libs test
Tests and benchmarks for cuDNN (and in the future, other NVIDIA libraries)
Stars: ✭ 36 (-78.7%)
Partial Order Pruning
Partial Order Pruning: for Best Speed/Accuracy Trade-off in Neural Architecture Search
Stars: ✭ 135 (-20.12%)
Object Detection And Location Realsensed435
Uses the Intel RealSense D435 depth camera for object detection with YOLOv3 under the OpenCV DNN framework, and computes the 3D position of each detected object from the depth information, with real-time display of the coordinates in the camera coordinate system. Added: YOLOv5 with a TensorRT model on AGX Xavier for real-time object detection.
Stars: ✭ 36 (-78.7%)
Fbtt Embedding
This is a Tensor Train based compression library for compressing the sparse embedding tables used in large-scale machine learning models such as recommendation and natural language processing. We showed this library can reduce the total model size by up to 100x in Facebook's open-sourced DLRM model while achieving the same model quality. Our implementation is faster than the state-of-the-art implementations. Existing state-of-the-art libraries also decompress the whole embedding tables on the fly, so they do not reduce memory use during training. Our library decompresses only the requested rows and can therefore provide a 10,000 times memory footprint reduction per embedding table. The library also includes a software cache that stores a portion of the table entries in decompressed format for faster lookup and processing.
Stars: ✭ 92 (-45.56%)
Dsmnet
Domain-invariant Stereo Matching Networks
Stars: ✭ 153 (-9.47%)
Cuda
Experiments with CUDA and Rust
Stars: ✭ 31 (-81.66%)
Des Cuda
DES cracking using a brute-force algorithm and CUDA
Stars: ✭ 21 (-87.57%)
Matconvnet
MatConvNet: CNNs for MATLAB
Stars: ✭ 1,299 (+668.64%)
Quda
QUDA is a library for performing calculations in lattice QCD on GPUs.
Stars: ✭ 166 (-1.78%)
Uammd
A CUDA project for Molecular Dynamics, Brownian Dynamics, Hydrodynamics... intended to simulate a very generic system by constructing a simulation from modules.
Stars: ✭ 11 (-93.49%)
Deeppipe2
Deep Learning library using GPU (CUDA/cuBLAS)
Stars: ✭ 90 (-46.75%)
Gpu badmm mt
Bregman ADMM for mass transportation on GPU
Stars: ✭ 10 (-94.08%)
Libcudacxx
The C++ Standard Library for your entire system.
Stars: ✭ 1,861 (+1001.18%)
Presentations
Slides and demo code for past presentations
Stars: ✭ 7 (-95.86%)
Zluda
CUDA on Intel GPUs
Stars: ✭ 937 (+454.44%)
Jetson
Helmut Hoffer von Ankershoffen experimenting with arm64-based NVIDIA Jetson (Nano and AGX Xavier) edge devices running Kubernetes (K8s) for machine learning (ML), including Jupyter Notebooks, TensorFlow Training and TensorFlow Serving using CUDA for smart IoT.
Stars: ✭ 151 (-10.65%)
Thor
Atmospheric fluid dynamics solver optimized for GPUs.
Stars: ✭ 23 (-86.39%)
Thundersvm
ThunderSVM: A Fast SVM Library on GPUs and CPUs
Stars: ✭ 1,282 (+658.58%)
Sepconv Slomo
An implementation of Video Frame Interpolation via Adaptive Separable Convolution using PyTorch
Stars: ✭ 918 (+443.2%)
Agency
Execution primitives for C++
Stars: ✭ 127 (-24.85%)
Wheels
Performance-optimized wheels for TensorFlow (SSE, AVX, FMA, XLA, MPI)
Stars: ✭ 891 (+427.22%)
Primitiv
A Neural Network Toolkit.
Stars: ✭ 164 (-2.96%)
Ddsh Tip2018
Source code for the paper "Deep Discrete Supervised Hashing"
Stars: ✭ 16 (-90.53%)
Knn cuda
PyTorch KNN [CUDA version]
Stars: ✭ 86 (-49.11%)
Libcudarange
An interval arithmetic and affine arithmetic library for NVIDIA CUDA
Stars: ✭ 5 (-97.04%)
Pytorch Emdloss
PyTorch 1.0 implementation of the approximate Earth Mover's Distance
Stars: ✭ 82 (-51.48%)
Dragon
Dragon: A Computation Graph Virtual Machine Based Deep Learning Framework.
Stars: ✭ 168 (-0.59%)
Floor
A C++ Compute/Graphics Library and Toolchain enabling same-source CUDA/Host/Metal/OpenCL/Vulkan C++ programming and execution.
Stars: ✭ 166 (-1.78%)
Jcuda
JCuda - Java bindings for CUDA
Stars: ✭ 165 (-2.37%)
Forward
A library for high performance deep learning inference on NVIDIA GPUs.
Stars: ✭ 136 (-19.53%)
Hashcat
World's fastest and most advanced password recovery utility
Stars: ✭ 11,014 (+6417.16%)
3d Ken Burns
An implementation of 3D Ken Burns Effect from a Single Image using PyTorch
Stars: ✭ 1,073 (+534.91%)