3cHeLoN / Cupoisson

CUDA implementation of the 2D fast Poisson solver

cuPoisson

CuPoisson is a GPU implementation of the 2D fast Poisson solver using CUDA. The method solves the discrete Poisson equation on a rectangular grid, assuming zero Dirichlet boundary conditions.

This code is the result of a master's thesis written by Folkert Bleichrodt at Utrecht University under the supervision of Henk Dijkstra and Rob Bisseling. A scientific paper has been published, discussing the Poisson solver as part of a method for solving a 2D PDE ocean model. The paper can be downloaded at: http://dx.doi.org/10.1016/j.ocemod.2011.10.001

How to acknowledge?

If you use this code for your research, we would appreciate it if you would refer to the following paper:

F. Bleichrodt, R. H. Bisseling, and H. A. Dijkstra. "Accelerating a barotropic ocean model using a GPU." Ocean Modelling, Volume 41 http://www.sciencedirect.com/science/journal/14635003/41/supp/C, 2012, Pages 16–21

Feedback

I would be grateful for any feedback or comments on the code, or questions regarding the documentation. Please send me an email at F.Bleichrodt [at] cwi.nl

Move to GitHub/news

As most of you probably know by now, Google Code is being discontinued. I have moved the code to GitHub, which should also make it easier to accept pull requests. I'm planning to update the code, which now includes a Matlab interface.

1. Running the program

A simple driver file main.c has been provided to show how you can use cupoisson in your own C code. Currently the code is not provided as a library, since the codebase is quite small.

After compilation, go to the build folder and run

$ ./cupoisson

to execute the sample file.

The solution is stored as a binary (or text) file. To make a contour plot, use the provided Matlab m-file plotSolutionSingle.m (or plotSolutionDouble.m if you are using double precision).

1.1 Compiling

Go to the build folder:

$ cd build

Run the compiler:

$ make

This will put the program cupoisson inside the build folder. To run the program you should execute

$ ./cupoisson

For cleaning up object files, run

$ make clean

1.2 Using double precision

By default, the Poisson solver is compiled in single precision. On the GPU, single precision computations give the highest performance. If this is not accurate enough for your application, the program has to be compiled with double precision support. Please follow these steps exactly:

1. Edit precision.h and uncomment the line:

//#define DOUBLE_PRECISION

This tells the compiler to use double in place of float.

2. If the code was previously compiled in single precision, clean up the build:

$ make clean

3. Edit the Makefile and uncomment the line:

NVCCPARMS += -arch sm_13

This tells the nvcc compiler to use all features of devices of compute capability 1.3 (including double precision).

4. Compile:

$ make

Note that double precision costs roughly a factor of 2 in performance. The penalty may differ on your hardware. If single precision is accurate enough, always consider using it first!

To compile for single precision again, revert steps 1-3.
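As an illustration of what such a precision switch typically looks like, here is a minimal sketch in plain C. The type name `real` and the helper `grid_bytes` are hypothetical and chosen for this example only; the actual contents of precision.h may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Uncomment to switch the whole build to double precision
   (mirrors the step described for precision.h). */
/* #define DOUBLE_PRECISION */

#ifdef DOUBLE_PRECISION
typedef double real;   /* hypothetical type name, an assumption */
#else
typedef float real;
#endif

/* Bytes needed to store a square grid with the given number of points per side. */
size_t grid_bytes(size_t points_per_side)
{
    assert(points_per_side >= 1);
    return points_per_side * points_per_side * sizeof(real);
}
```

Because every array is declared with the single alias, one macro changes the storage type everywhere, which is also why a full `make clean` is needed before rebuilding.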

1.3 Troubleshooting

If compilation fails, make sure that:

- CUDA is installed
- the CUDA library path is in your $LD_LIBRARY_PATH
- the include directory in CFLAGS of the Makefile is correct

2. Source code contents

I will give a short description of the contents of the source code.

2.1 main.c

A driver/example file for using the Poisson solver from your C/C++ code.

2.2 poisson.cu

The Poisson solver using CUFFT. This file also contains the implementation of the realFFT. Currently only square domains are supported. For best performance, use grids of size 2^n+1. In this case an FFT of length 2(2^n+1-1) = 2^(n+1) is computed, which is again a power of 2, the case in which the FFT performs best.
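The grid-size arithmetic can be checked with a few lines of C. This is only a sketch of the relation stated above; the function names are made up for the example and do not appear in poisson.cu.

```c
#include <assert.h>
#include <stddef.h>

/* Points per side for exponent n, counting both boundary points: 2^n + 1. */
size_t grid_points(unsigned n)
{
    assert(n >= 1 && n < 30);
    return ((size_t)1 << n) + 1;
}

/* FFT length used for a grid with `points` points per side: 2 * (points - 1). */
size_t fft_length(size_t points)
{
    assert(points >= 2);
    return 2 * (points - 1);
}

/* Returns 1 if x is a power of two, 0 otherwise. */
int is_power_of_two(size_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}
```

For n = 10 the grid has 1025 points per side and the transform length is 2048. A grid of 1000 points per side would give a length of 1998, which is not a power of 2.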

Currently, only the grid's interior is computed, since zero Dirichlet boundary conditions are assumed. When the grid is written out as a text file, the boundaries are padded with zeros. If another type of boundary condition is needed, this has to be adjusted in the code.
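The zero padding can be pictured as follows. This is a sketch assuming a row-major m-by-m interior array; the function name is hypothetical and the real output routine in utils.cu may be organized differently.

```c
#include <assert.h>
#include <stddef.h>

/* Copy an m-by-m interior (row-major) into an (m+2)-by-(m+2) grid
   whose outer ring of boundary points is set to zero. */
void pad_with_zero_boundary(const float *interior, size_t m, float *full)
{
    assert(interior != NULL && full != NULL && m >= 1);
    size_t n = m + 2;                      /* points per side incl. boundary */

    for (size_t i = 0; i < n * n; ++i)     /* zero everything first */
        full[i] = 0.0f;

    for (size_t r = 0; r < m; ++r)         /* then copy the interior, shifted by one */
        for (size_t c = 0; c < m; ++c)
            full[(r + 1) * n + (c + 1)] = interior[r * m + c];
}
```

A different boundary condition would replace the zero fill with the prescribed boundary values and would also change the right-hand side seen by the solver.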

2.3 utils.cu

Some utilities for error checking, as well as writing out the solution to a file.

2.4 precision.h

Here you can specify if you want to use double precision instead of single precision GPU code. Please refer to section 1.2 for all the details.

3. Additional remarks

The current code is optimized to minimize data transfer from CPU to GPU to shared memory (cache). This is why there is some amount of code duplication. At certain points in the program, data is available in the shared memory of the multiprocessors. This data has to be copied to the GPU global memory. If the result must be transposed, it would not be optimal to first copy the data to global memory and then use another kernel to do the transpose operation (which will probably use shared memory again). As a result, our CUDA kernels are quite big and have several tasks instead of only one.

The main details of the algorithm are described in a research paper: http://dx.doi.org/10.1016/j.ocemod.2011.10.001

The fast Poisson solver exploits an eigendecomposition of the discrete Poisson operator. Discrete sine transforms, computed using a (real)FFT, form the main body of the code. We suggest that the user read sections 3.3 and 4.1 of the paper for implementation details.

Note that the code is oblivious to the actual domain size, but the spacing h between grid points does matter for the solution. The discrete Poisson operator follows from the Poisson equation using a central finite difference scheme, which divides by h^2. Therefore, multiply the solution returned by the solver by h*h (h^2).
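A sketch of this post-processing step in C is given below. It assumes a square domain whose side has physical length `length`, sampled with `points` grid points including both boundaries; both parameter names and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Grid spacing h for a side of the given physical length,
   sampled at `points` points including both boundary points. */
double grid_spacing(double length, size_t points)
{
    assert(length > 0.0 && points >= 2);
    return length / (double)(points - 1);
}

/* Multiply every entry of the solver output by h*h, in place. */
void scale_by_h_squared(double *u, size_t count, double h)
{
    assert(u != NULL);
    for (size_t i = 0; i < count; ++i)
        u[i] *= h * h;
}
```

For a unit square with 5 points per side, h is 0.25 and every value of the raw solution is multiplied by 0.0625.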
