anteagle / Gpu_badmm_mt
Bregman ADMM for mass transportation on GPU
Mass transportation problem inputs: C, a, b. By default, a = 1 and b = 1.
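For orientation, the mass transportation problem is usually stated as the linear program below. Reading C as an m-by-n cost matrix and a, b as the row and column marginals of the transport plan X is an assumption here, not something this README spells out; the symbols X, m and n are borrowed from the MATLAB snippets in this README.

```latex
\min_{X \ge 0} \; \langle C, X \rangle = \sum_{i=1}^{m} \sum_{j=1}^{n} C_{ij} X_{ij}
\quad \text{subject to} \quad
X \mathbf{1}_n = a, \qquad X^{\top} \mathbf{1}_m = b
```

Under this reading, the default a = 1, b = 1 would correspond to all-ones marginal vectors.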
-
The files for C, a, and b are generated in MATLAB and stored as binary files in row-major order. For example:
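m = 1024;                               % number of rows of C
n = m;                                  % number of columns of C
C = rand(m,n);                          % random cost matrix
fid = fopen([int2str(n) 'C.dat'],'w');  % file name here: 1024C.dat
fwrite(fid,[m,n],'int');                % header: the two dimensions as 32-bit ints
fwrite(fid,C','float');                 % transpose so the 32-bit floats are written row by row
fclose(fid);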
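For readers without MATLAB, the same file layout can be sketched in Python using only the standard library. This is an illustration of the format implied by the MATLAB snippet (two 32-bit ints, then m*n 32-bit floats in row-major order, native byte order), not code from this repository; the helper names `write_cost_file` and `read_cost_file` are hypothetical.

```python
import os
import struct
import tempfile


def write_cost_file(path, rows):
    """Write int32 m, int32 n, then m*n float32 values in row-major order."""
    m, n = len(rows), len(rows[0])
    with open(path, "wb") as f:
        f.write(struct.pack("=ii", m, n))           # header, like fwrite(fid,[m,n],'int')
        for row in rows:                            # one row at a time = row-major
            f.write(struct.pack("=%df" % n, *row))  # like fwrite(fid,C','float')


def read_cost_file(path):
    """Read the header, then split the flat float32 payload into m rows of n values."""
    with open(path, "rb") as f:
        m, n = struct.unpack("=ii", f.read(8))
        flat = struct.unpack("=%df" % (m * n), f.read(4 * m * n))
    return [list(flat[i * n:(i + 1) * n]) for i in range(m)]


# Round-trip a tiny 2x3 matrix (values chosen to be exact in float32).
demo = [[0.5, 1.5, 2.0], [3.0, 4.25, 8.0]]
demo_path = os.path.join(tempfile.mkdtemp(), "3C.dat")
write_cost_file(demo_path, demo)
roundtrip = read_cost_file(demo_path)
file_size = os.path.getsize(demo_path)  # 8 header bytes + 6 * 4 payload bytes
print(roundtrip, file_size)
```

Reading the file back this way is a quick check that the header and the row-major payload are what the solver will see before launching a long run.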
-
Compile the CUDA code and tune the parameters
nvcc -o badmm_mt badmm_mt.cu badmm_kernel.cu -lcublas -arch sm_13
nvprof ./badmm_mt dim_file_name rho max_iteration tolerance_stop_badmm num_steps_print save_output
a. dim_file_name: the cost matrix is stored in a binary file named dim1024C.dat. The a and b files are named dim1024a.dat and dim1024b.dat.
b. rho (float): the penalty parameter of BADMM.
c. max_iteration (int): the maximum number of BADMM iterations.
d. tolerance_stop_badmm (float): the stopping tolerance (tol) on the primal and dual residuals. BADMM stops once they fall below tol.
e. num_steps_print (int): print intermediate results every num_steps_print steps. If num_steps_print = 0, nothing is printed.
f. save_output: write the final mass transportation result to a binary file named X_out.dat.
Default values are used for parameters that are set to zero or not specified.
-
Read X_out.dat into MATLAB
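fid = fopen('X_out.dat','r');
X = fread(fid,[n,m],'float');  % m and n are the dimensions of C; the file is stored row by row
fclose(fid);
X = X';                        % transpose back to an m-by-n matrix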
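The same read can be sketched in Python. The layout assumed here (m*n 32-bit floats in row-major order with no header) is inferred from the MATLAB `fread` call above, not confirmed from the solver source, and the helpers `read_plan` and `marginals` are hypothetical names. The demo writes a small synthetic stand-in for X_out.dat so that the sketch runs on its own.

```python
import os
import struct
import tempfile


def read_plan(path, m, n):
    """Read m*n float32 values (assumed row-major, no header) into m rows of n values."""
    with open(path, "rb") as f:
        payload = f.read()
    if len(payload) != 4 * m * n:
        raise ValueError("expected %d bytes, found %d" % (4 * m * n, len(payload)))
    flat = struct.unpack("=%df" % (m * n), payload)
    return [list(flat[i * n:(i + 1) * n]) for i in range(m)]


def marginals(X):
    """Return the row sums and column sums of X."""
    row_sums = [sum(row) for row in X]
    col_sums = [sum(col) for col in zip(*X)]
    return row_sums, col_sums


# Synthetic 2x3 stand-in for X_out.dat (not real solver output).
plan = [[0.5, 0.25, 0.25], [0.0, 0.5, 0.5]]
plan_path = os.path.join(tempfile.mkdtemp(), "X_out.dat")
with open(plan_path, "wb") as f:
    for row in plan:
        f.write(struct.pack("=3f", *row))

X = read_plan(plan_path, 2, 3)
row_sums, col_sums = marginals(X)
print(row_sums, col_sums)
```

If a and b are indeed the row and column marginals, comparing `row_sums` with a and `col_sums` with b gives a rough sanity check on a saved result.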