FoldsCUDA.jl: Data-parallelism on CUDA using Transducers.jl and for loops (FLoops.jl)
NCCV: Short course on computer vision and image processing using Numba+CUDA+OpenCV
mlspace: MLSpace: Hassle-free machine learning & deep learning development
ppq: PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.
OmniSci.jl: Julia client for OmniSci GPU-accelerated SQL engine and analytics platform
CuVec: Unifying Python/C++/CUDA memory: Python buffered array ↔️ `std::vector` ↔️ CUDA managed memory
kitti deeplab: Inference script and frozen inference graph with fine-tuned weights for semantic segmentation on images from the KITTI dataset.
gblastn: G-BLASTN is a GPU-accelerated nucleotide alignment tool based on the widely used NCBI-BLAST.
VoxelTerrain: This project's main goal is to generate and visualize terrain built from voxels. It is implemented with several approaches and computing technologies to compare their performance and implementation.
flexBox: FlexBox is a flexible MATLAB toolbox for finite-dimensional convex variational problems in image processing and beyond.
watsor: Object detection for video surveillance
GooFit: Code repository for the massively parallel framework for maximum-likelihood fits, implemented in CUDA/OpenMP
tensorflow-builds: TensorFlow binaries and Docker images compiled with GPU support and CPU optimizations.
2D 3D PolarFourierTransform: C++, CUDA, and MATLAB codes for the paper "An Exact and Fast Computation of Discrete Fourier Transform for Polar and Spherical Grid"
sentence2vec: Deep sentence embedding using Sequence to Sequence learning
EPPM: CUDA implementation of the paper "Fast Edge-Preserving PatchMatch for Large Displacement Optical Flow" in CVPR 2014.
MEGADOCK: Ultra-high-performance protein-protein docking for heterogeneous supercomputers
gpumembench: A GPU benchmark suite for assessing on-chip GPU memory bandwidth
CARLsim4: CARLsim is an efficient, easy-to-use, GPU-accelerated software framework for simulating large-scale spiking neural network (SNN) models with a high degree of biological detail.
quick-start: FloydHub quick start project - train TensorFlow model with MNIST dataset
fdtd3d: fdtd3d is an open-source 1D, 2D, 3D FDTD electromagnetics solver with MPI, OpenMP and CUDA support for x86, arm, and arm64 architectures
cuda-revised-simplex: An implementation of the revised simplex algorithm in CUDA for solving linear optimization problems in the form max{c*x | A*x=b, l<=x<=u}
Jamais-Vu: Audio Fingerprinting and Recognition in Python using NVIDIA's CUDA
fresnel: Publication-quality path tracing in real time.
ArrayLSTM: GPU/CPU (CUDA) Implementation of "Recurrent Memory Array Structures", Simple RNN, LSTM, Array LSTM.
SRmeetsPS-CUDA: CUDA implementation of the paper "Depth Super-Resolution Meets Uncalibrated Photometric Stereo"
pnn: pnn is a Darknet-compatible neural network inference engine implemented in Rust.
CobraML: AutoML software designed to give users access to a wide range of ML models, some trainable on the GPU.
NNPOps: High-performance operations for neural network potentials
rnnt decoder cuda: An efficient implementation of RNN-T Prefix Beam Search in C++/CUDA.
rocRAND: RAND library for the HIP programming language
bsuir-csn-cmsn-helper: Repository containing completed laboratory works for the specialty of computing machines, systems and networks
sboxgates: Program for finding low gate count implementations of S-boxes.
la-core: Linear algebra accelerators for RISC-V (published in ICCD 17)
cuhnsw: CUDA implementation of Hierarchical Navigable Small World Graph algorithm
Shizuku: Real-time simulation and rendering of free surface fluid
astro-accelerate: AstroAccelerate is a many-core accelerated software package for processing time-domain radio-astronomy data.