
anteagle / Gpu_badmm_mt

Bregman ADMM for mass transportation on GPU


Projects that are alternatives of or similar to Gpu badmm mt

Blocksparse
Efficient GPU kernels for block-sparse matrix multiplication and convolution
Stars: ✭ 797 (+7870%)
Mutual labels:  cuda
Neuralsuperresolution
Real-time video quality improvement for applications such as video-chat using Perceptual Losses
Stars: ✭ 18 (+80%)
Mutual labels:  cuda
Neanderthal
Fast Clojure Matrix Library
Stars: ✭ 927 (+9170%)
Mutual labels:  cuda
Pytorch Loss
label-smooth, amsoftmax, focal-loss, triplet-loss, lovasz-softmax. Maybe useful
Stars: ✭ 812 (+8020%)
Mutual labels:  cuda
Gmatrix
R package for unleashing the power of NVIDIA GPUs
Stars: ✭ 16 (+60%)
Mutual labels:  cuda
Cudajacobi
CUDA implementation of the Jacobi method
Stars: ✭ 19 (+90%)
Mutual labels:  cuda
Pyopencl
OpenCL integration for Python, plus shiny features
Stars: ✭ 790 (+7800%)
Mutual labels:  cuda
Presentations
Slides and demo code for past presentations
Stars: ✭ 7 (-30%)
Mutual labels:  cuda
Wheels
Performance-optimized wheels for TensorFlow (SSE, AVX, FMA, XLA, MPI)
Stars: ✭ 891 (+8810%)
Mutual labels:  cuda
Thor
Atmospheric fluid dynamics solver optimized for GPUs.
Stars: ✭ 23 (+130%)
Mutual labels:  cuda
Libcudarange
An interval arithmetic and affine arithmetic library for NVIDIA CUDA
Stars: ✭ 5 (-50%)
Mutual labels:  cuda
Ddsh Tip2018
source code for paper "Deep Discrete Supervised Hashing"
Stars: ✭ 16 (+60%)
Mutual labels:  cuda
Sepconv Slomo
an implementation of Video Frame Interpolation via Adaptive Separable Convolution using PyTorch
Stars: ✭ 918 (+9080%)
Mutual labels:  cuda
Scikit Cuda
Python interface to GPU-powered libraries
Stars: ✭ 803 (+7930%)
Mutual labels:  cuda
Zluda
CUDA on Intel GPUs
Stars: ✭ 937 (+9270%)
Mutual labels:  cuda
Arraymancer
A fast, ergonomic and portable tensor library in Nim with a deep learning focus for CPU, GPU and embedded devices via OpenMP, Cuda and OpenCL backends
Stars: ✭ 793 (+7830%)
Mutual labels:  cuda
Libomptarget
Stars: ✭ 18 (+80%)
Mutual labels:  cuda
Stn3d
3D Spatial Transformer Network
Stars: ✭ 8 (-20%)
Mutual labels:  cuda
Cupoisson
CUDA implementation of the 2D fast Poisson solver
Stars: ✭ 7 (-30%)
Mutual labels:  cuda
Lattice net
Fast Point Cloud Segmentation Using Permutohedral Lattices
Stars: ✭ 23 (+130%)
Mutual labels:  cuda

Mass transportation problem inputs: the cost matrix C and the marginal vectors a and b. By default, a = 1 and b = 1.
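
For context, a standard linear-programming form of the discrete mass transportation problem defined by C, a, and b is shown below; the exact objective and scaling used by badmm_mt may differ (e.g. in normalization or regularization), so treat this as an assumed baseline:

    \min_{X \geq 0} \; \langle C, X \rangle = \sum_{i=1}^{m} \sum_{j=1}^{n} C_{ij} X_{ij}
    \quad \text{subject to} \quad X \mathbf{1}_n = a, \qquad X^{\top} \mathbf{1}_m = b

Here C \in \mathbb{R}^{m \times n} is the cost matrix, a and b are the source and target mass vectors, and X is the transportation plan that BADMM solves for.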

  1. The files C, a, b for the mass transportation problem are generated with MATLAB and stored in binary files in row-major order (a C sketch for reading this format back appears after this list). For example:

     m = 1024; n = m; C = rand(m,n);
     fid = fopen([int2str(n) 'C.dat'],'w');
     fwrite(fid,[m,n],'int');
     fwrite(fid,C','float');
     fclose(fid);

  2. Compile the CUDA code and tune the parameters:

     nvcc -o badmm_mt badmm_mt.cu badmm_kernel.cu -lcublas -arch sm_13
     nvprof ./badmm_mt dim_file_name rho max_iteration tolerance_stop_badmm num_steps_print save_output

     a. dim_file_name: the cost matrix is stored in binary under the name dim1024C.dat; the a and b files are named dim1024a.dat and dim1024b.dat.
     b. rho (float): the BADMM parameter.
     c. max_iteration (int): the maximum number of BADMM iterations.
     d. tol (float): the stopping tolerance on the primal and dual residuals of BADMM; the solver stops once they fall below tol.
     e. num_steps_print (int): print intermediate results every num_steps_print steps; if num_steps_print = 0, nothing is printed.
     f. save_output: write the final mass transportation plan to a binary file named X_out.dat.

     Default values are used for any parameter that is set to zero or not specified (a hypothetical sketch of this convention appears after this list).

  3. Read X_out.dat back into MATLAB:

     fid = fopen('X_out.dat','r');
     X = fread(fid,[n,m],'float');
     fclose(fid);
     X = X';

     Reading with size [n,m] and then transposing converts the row-major data in X_out.dat back to MATLAB's column-major layout.
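
The binary layout produced by the MATLAB snippet in step 1 is a header of two 32-bit integers (m, n) followed by m*n single-precision values in row-major order. The following host-side C sketch is not part of the repository; the default file name and the assumption that badmm_mt expects exactly this layout are illustrative only:

    /* read_cost.c: sanity-check reader for the cost-matrix file written in step 1 */
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        const char *path = (argc > 1) ? argv[1] : "1024C.dat";  /* default name is an assumption */
        FILE *fp = fopen(path, "rb");
        if (!fp) { perror("fopen"); return 1; }

        int dims[2];                                   /* header: m, n as 32-bit ints */
        if (fread(dims, sizeof(int), 2, fp) != 2) { fclose(fp); return 1; }
        int m = dims[0], n = dims[1];

        float *C = (float *)malloc((size_t)m * n * sizeof(float));
        if (!C || fread(C, sizeof(float), (size_t)m * n, fp) != (size_t)m * n) {
            free(C); fclose(fp); return 1;
        }
        fclose(fp);

        /* row-major: C[i*n + j] is the cost of moving mass from source i to target j */
        printf("read %d x %d cost matrix, C[0] = %f\n", m, n, C[0]);
        free(C);
        return 0;
    }

Compiling with gcc -o read_cost read_cost.c and running ./read_cost 1024C.dat confirms the header and element count before handing the file to badmm_mt.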

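As referenced in step 2, the "zero or unspecified means default" convention can be handled as in the hypothetical sketch below. The parameter order follows the usage line above, but the default values and the parsing code are placeholders, not taken from badmm_mt:

    /* defaults_sketch.c: illustrative only -- not the actual badmm_mt argument handling */
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        float rho      = 0.001f;   /* placeholder default */
        int   max_iter = 2000;     /* placeholder default */
        float tol      = 1e-4f;    /* placeholder default */

        /* argv[1] is dim_file_name; numeric parameters start at argv[2] */
        if (argc > 2 && atof(argv[2]) > 0.0) rho      = (float)atof(argv[2]);
        if (argc > 3 && atoi(argv[3]) > 0)   max_iter = atoi(argv[3]);
        if (argc > 4 && atof(argv[4]) > 0.0) tol      = (float)atof(argv[4]);

        printf("rho = %g, max_iteration = %d, tol = %g\n", rho, max_iter, tol);
        return 0;
    }
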