Nvidia libs test
Tests and benchmarks for cuDNN (and in the future, other NVIDIA libraries)
Stars: ✭ 36 (-12.2%)
Deepnet
Deep.Net machine learning framework for F#
Stars: ✭ 99 (+141.46%)
Pycuda
CUDA integration for Python, plus shiny features
Stars: ✭ 1,112 (+2612.2%)
cuda memtest
Fork of CUDA GPU memtest 👓
Stars: ✭ 68 (+65.85%)
Onednn
oneAPI Deep Neural Network Library (oneDNN)
Stars: ✭ 2,600 (+6241.46%)
Bayadera
High-performance Bayesian Data Analysis on the GPU in Clojure
Stars: ✭ 342 (+734.15%)
Heteroflow
Concurrent CPU-GPU Programming using Task Models
Stars: ✭ 57 (+39.02%)
Tengine
Tengine is a lite, high-performance, modular inference engine for embedded devices
Stars: ✭ 4,012 (+9685.37%)
GOSH
An ultra-fast, GPU-based large graph embedding algorithm utilizing a novel coarsening algorithm requiring not more than a single GPU.
Stars: ✭ 12 (-70.73%)
Stdgpu
stdgpu: Efficient STL-like Data Structures on the GPU
Stars: ✭ 531 (+1195.12%)
Luxcore
LuxCore source repository
Stars: ✭ 601 (+1365.85%)
xmrig-build
Simple automated script to build XMRig (dynamic or static) from source on x86-64, ARMv7, and ARMv8 devices.
Stars: ✭ 14 (-65.85%)
Ginkgo
Numerical linear algebra software package
Stars: ✭ 149 (+263.41%)
Ctranslate2
Fast inference engine for OpenNMT models
Stars: ✭ 140 (+241.46%)
Cuda Api Wrappers
Thin C++-flavored wrappers for the CUDA Runtime API
Stars: ✭ 362 (+782.93%)
Tutorials
Some basic programming tutorials
Stars: ✭ 353 (+760.98%)
Neanderthal
Fast Clojure Matrix Library
Stars: ✭ 927 (+2160.98%)
HeCBench
software.intel.com/content/www/us/en/develop/articles/repo-evaluating-performance-productivity-oneapi.html
Stars: ✭ 85 (+107.32%)
Unisimd Assembler
SIMD macro assembler unified for ARM, MIPS, PPC and x86
Stars: ✭ 63 (+53.66%)
Wheels
Performance-optimized wheels for TensorFlow (SSE, AVX, FMA, XLA, MPI)
Stars: ✭ 891 (+2073.17%)
Capstone.NET
.NET Core and .NET Framework binding for the Capstone Disassembly Framework
Stars: ✭ 108 (+163.41%)
Hipsycl
Implementation of SYCL for CPUs, AMD GPUs, NVIDIA GPUs
Stars: ✭ 377 (+819.51%)
Asm Dude
Visual Studio extension for assembly syntax highlighting and code completion in assembly files and the disassembly window
Stars: ✭ 3,898 (+9407.32%)
rbcuda
CUDA bindings for Ruby
Stars: ✭ 57 (+39.02%)
Arraymancer
A fast, ergonomic and portable tensor library in Nim with a deep learning focus for CPU, GPU and embedded devices via OpenMP, CUDA and OpenCL backends
Stars: ✭ 793 (+1834.15%)
Nsimd
Agenium Scale vectorization library for CPUs and GPUs
Stars: ✭ 138 (+236.59%)
Autodock Gpu
AutoDock for GPUs and other accelerators
Stars: ✭ 65 (+58.54%)
Tensorflow Optimized Wheels
TensorFlow wheels built for the latest CUDA/cuDNN with performance flags enabled: SSE, AVX, FMA, XLA
Stars: ✭ 118 (+187.8%)
Clojurecuda
Clojure library for CUDA development
Stars: ✭ 158 (+285.37%)
MatX
An efficient C++17 GPU numerical computing library with Python-like syntax
Stars: ✭ 418 (+919.51%)
Accelerate
Embedded language for high-performance array computations
Stars: ✭ 751 (+1731.71%)
Rappel
A Linux-based assembly REPL for x86, amd64, armv7, and armv8
Stars: ✭ 818 (+1895.12%)
Uammd
A CUDA project for Molecular Dynamics, Brownian Dynamics, Hydrodynamics... intended to simulate very generic systems by constructing a simulation from modules.
Stars: ✭ 11 (-73.17%)
Cuda
Experiments with CUDA and Rust
Stars: ✭ 31 (-24.39%)
Theano Roi Align
An implementation of the RoiAlign operation for Theano
Stars: ✭ 11 (-73.17%)
Directxmath
DirectXMath is an all-inline SIMD C++ linear algebra library for use in games and graphics apps
Stars: ✭ 859 (+1995.12%)
Smallpt Parallel Bvh Gpu
A GPU implementation of smallpt (http://www.kevinbeason.com/smallpt/) with Bounding Volume Hierarchy (BVH) tree.
Stars: ✭ 36 (-12.2%)
Cuda word split
This project is old code for Chinese word splitting. It was written in CUDA in 2010, so it may not run well directly on your platform without a GPU card.
Stars: ✭ 31 (-24.39%)
Gpu badmm mt
Bregman ADMM for mass transportation on GPU
Stars: ✭ 10 (-75.61%)
Stn3d
3D Spatial Transformer Network
Stars: ✭ 8 (-80.49%)
Likwid
Performance monitoring and benchmarking suite
Stars: ✭ 957 (+2234.15%)
Presentations
Slides and demo code for past presentations
Stars: ✭ 7 (-82.93%)
Fractional differencing gpu
Rapid large-scale fractional differencing with RAPIDS to minimize memory loss while making a time series stationary. 6x-400x speedup over a CPU implementation.
Stars: ✭ 38 (-7.32%)
18337
18.337 - Parallel Computing and Scientific Machine Learning
Stars: ✭ 834 (+1934.15%)
Bindsnet
Simulation of spiking neural networks (SNNs) using PyTorch.
Stars: ✭ 837 (+1941.46%)
Cupoisson
CUDA implementation of the 2D fast Poisson solver
Stars: ✭ 7 (-82.93%)
Cuda Cnn
Implementation of a simple CNN using CUDA
Stars: ✭ 29 (-29.27%)
Zluda
CUDA on Intel GPUs
Stars: ✭ 937 (+2185.37%)
Keypatch
Multi-architecture assembler for IDA Pro. Powered by Keystone Engine.
Stars: ✭ 939 (+2190.24%)
Cure
Stars: ✭ 36 (-12.2%)
Reverse Engineering
This repository contains some of the executables that I've cracked.
Stars: ✭ 29 (-29.27%)
Os2
x86_64 OS kernel with completely async userspace and single address space [WIP; but basic kernel functionality implemented]
Stars: ✭ 25 (-39.02%)
Javassembly
💾 Calling Assembly from Java: simple example using the JNI and NASM.
Stars: ✭ 28 (-31.71%)
Beelzebub
The Lord of Flies - A hobby operating system
Stars: ✭ 24 (-41.46%)
Thor
Atmospheric fluid dynamics solver optimized for GPUs.
Stars: ✭ 23 (-43.9%)
Nbody
N-body gravitational attraction problem solver
Stars: ✭ 40 (-2.44%)