
geggo / gpyfft

License: LGPL-3.0
Python wrapper for the OpenCL FFT library clFFT

Programming Languages

python

Projects that are alternatives of or similar to gpyfft

pystella
A code generator for grid-based PDE solving on CPUs and GPUs
Stars: ✭ 18 (-65.38%)
Mutual labels:  opencl, pyopencl
boxtree
Quad/octree building for FMMs in Python and OpenCL
Stars: ✭ 52 (+0%)
Mutual labels:  opencl, pyopencl
fluctus
An interactive OpenCL wavefront path tracer
Stars: ✭ 55 (+5.77%)
Mutual labels:  opencl
unity-music-visualizer
Basic music visualization project for Unity.
Stars: ✭ 39 (-25%)
Mutual labels:  fft
unicorn-fft
Audio visualization on the Unicorn Hat using FFTW
Stars: ✭ 36 (-30.77%)
Mutual labels:  fft
psycopgr
A Python wrapper of pgRouting for routing from nodes to nodes on real map.
Stars: ✭ 24 (-53.85%)
Mutual labels:  python-wrapper
SwiftOpenCL
A swift wrapper around OpenCL. Modelled off the cpp wrapper
Stars: ✭ 17 (-67.31%)
Mutual labels:  opencl
BurstFFT
FFT implementation in C# optimized for Unity's Burst compiler
Stars: ✭ 90 (+73.08%)
Mutual labels:  fft
pyduktape
Embed the Duktape JS interpreter in Python
Stars: ✭ 77 (+48.08%)
Mutual labels:  python-wrapper
dsp
DSP and filtering library
Stars: ✭ 36 (-30.77%)
Mutual labels:  fft
Amplifier.NET
Amplifier allows .NET developers to easily run complex applications with intensive mathematical computation on Intel CPU/GPU, NVIDIA, AMD without writing any additional C kernel code. Write your function in .NET and Amplifier will take care of running it on your favorite hardware.
Stars: ✭ 142 (+173.08%)
Mutual labels:  opencl
vexed-generation
Polymorphic helper functions & geometry ops for Houdini VEX / OpenCL
Stars: ✭ 32 (-38.46%)
Mutual labels:  opencl
hpc
Learning and practice of high performance computing (CUDA, Vulkan, OpenCL, OpenMP, TBB, SSE/AVX, NEON, MPI, coroutines, etc. )
Stars: ✭ 39 (-25%)
Mutual labels:  opencl
gardenia
GARDENIA: Graph Analytics Repository for Designing Efficient Next-generation Accelerators
Stars: ✭ 22 (-57.69%)
Mutual labels:  opencl
drop
A LÖVE visualizer and music player
Stars: ✭ 17 (-67.31%)
Mutual labels:  fft
bandicoot-code
Bandicoot: GPU accelerator add-on for the Armadillo C++ linear algebra library
Stars: ✭ 21 (-59.62%)
Mutual labels:  opencl
CLBLAS.jl
CLBLAS integration for Julia
Stars: ✭ 20 (-61.54%)
Mutual labels:  opencl
ck-clsmith
Collective Knowledge extension to crowdsource bug detection in OpenCL compilers using CLSmith tool from Imperial College London
Stars: ✭ 26 (-50%)
Mutual labels:  opencl
fmcw-RADAR
[mmWave based fmcw radar design files] based on AWR1843 chip operating at 76-GHz to 81-GHz.
Stars: ✭ 41 (-21.15%)
Mutual labels:  fft
tensor stream-opencl
An OpenCL backend for TensorStream
Stars: ✭ 26 (-50%)
Mutual labels:  opencl

gpyfft

A Python wrapper for the OpenCL FFT library clFFT.

Introduction

clFFT

The open-source library clFFT implements FFTs that run on a GPU via OpenCL. Some highlights are:

  • batched 1D, 2D, and 3D transforms
  • supports many transform sizes (any combination of powers of 2, 3, 5, 7, 11, and 13)
  • flexible memory layout
  • single and double precision
  • complex and real-to-complex transforms
  • supports injecting custom code for data pre- and post-processing
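
The transform-size rule above can be checked on the host before creating a plan. The following is a minimal sketch in plain Python, assuming only what the list states (sizes that are products of powers of 2, 3, 5, 7, 11, and 13). The function names are hypothetical and not part of clFFT or gpyfft, and any further limits clFFT may impose (for example on the maximum size) are not modelled.

```python
# Radices listed in the clFFT highlights above.
RADICES = (2, 3, 5, 7, 11, 13)


def is_supported_size(n):
    """Return True if n is a product of powers of 2, 3, 5, 7, 11 and 13."""
    if n < 1:
        return False
    for p in RADICES:
        # Divide out every factor p; whatever remains has other prime factors.
        while n % p == 0:
            n //= p
    return n == 1


def next_supported_size(n):
    """Return the smallest size >= n that passes is_supported_size."""
    m = max(n, 1)
    while not is_supported_size(m):
        m += 1
    return m
```

A helper like next_supported_size can be used to pick a zero-padded length when the natural data size contains a larger prime factor such as 17.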

gpyfft

This Python wrapper is designed to integrate tightly with PyOpenCL. It consists of a low-level, Cython-based wrapper with an interface similar to that of the underlying C library. On top of that, it offers a high-level interface designed to work on data contained in instances of pyopencl.array.Array, a NumPy work-alike array class for GPU computations. The high-level interface takes some inspiration from pyFFTW. For details of the high-level interface, see fft.py.

News

  • 2017/11/05: For 2D and 3D transforms with default (empty) settings for the transform axes, a smarter ordering of the transform axes is now chosen depending on the memory layout: for a C-contiguous input array, the last axis is transformed first. I have seen large performance improvements, 3x to 4x compared to the previous approach (always first axis first). Please report benchmark results ('python -m gpyfft.benchmark') to confirm whether this holds true for your GPU.
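
To illustrate the idea behind this ordering, here is a minimal sketch that sorts the transform axes by memory stride, so that the fastest-varying axis comes first. This is an illustration of the principle only, not gpyfft's actual heuristic. Both function names are hypothetical, and the strides are computed in plain Python rather than taken from a real array.

```python
def c_contiguous_strides(shape, itemsize):
    """Byte strides of a C-contiguous (row-major) array with this shape."""
    strides = []
    step = itemsize
    for extent in reversed(shape):
        strides.append(step)
        step *= extent
    return tuple(reversed(strides))


def order_axes_by_stride(strides, axes=None):
    """Return the axes sorted by ascending stride (fastest-varying first)."""
    if axes is None:
        axes = range(len(strides))
    return tuple(sorted(axes, key=lambda ax: abs(strides[ax])))


# Four 1024x1024 complex64 arrays (8 bytes per element), C-contiguous.
strides = c_contiguous_strides((4, 1024, 1024), itemsize=8)
# Transforming over axes 1 and 2 puts the last axis first.
axes = order_axes_by_stride(strides, axes=(1, 2))
```

For the C-contiguous case the last axis has the smallest stride, so it is placed first, which matches the behaviour described in the news item.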

Status

The low-level interface is more or less complete; the high-level interface is not yet settled and is likely to change in the future. Features to come (not yet implemented in the high-level interface):

Work done

  • low level wrapper (mostly) completed
  • high level wrapper
  • complex-to-complex transform, in- and out-of-place
  • real-to-complex transform (out-of-place)
  • complex-to-real transform (out-of-place)
  • single precision
  • double precision
  • interleaved data
  • support injecting custom OpenCL code (pre and post callbacks)
  • accept pyopencl arrays with non-zero offsets (Syam Gadde)
  • heuristics for choosing the order of the transform axes for optimal performance, if none is given (Release 0.7.1)

Basic usage

Here we describe a simple example of performing a batch of 2D complex-to-complex FFTs on the GPU, using the high-level interface of gpyfft. The full source code of this example is contained in simple_example.py, which is the essence of benchmark.py. Note that for testing it is recommended to start simple_example.py from the command line, so that you can interactively choose an OpenCL context. Otherwise (e.g. when using IPython) you are not asked and might end up with a CPU device, which is prone to fail.

Imports:

import numpy as np
import pyopencl as cl
import pyopencl.array as cla
from gpyfft.fft import FFT

Initialize the GPU:

context = cl.create_some_context()
queue = cl.CommandQueue(context)

Initialize memory (on host and GPU). In this example we want to perform four 2D FFTs in parallel on 1024x1024 single-precision data.

data_host = np.zeros((4, 1024, 1024), dtype=np.complex64)
# data_host[:] = some_useful_data
data_gpu = cla.to_device(queue, data_host)

Create an FFT plan for a batched in-place 2D transform along the last two axes.

transform = FFT(context, queue, data_gpu, axes = (2, 1))

If you want an out-of-place transform, provide the output array as additional argument after the input data.

Start the work and wait until it is finished (note that enqueue() returns a tuple of events).

event, = transform.enqueue()
event.wait()

Read back the data from the GPU to the host:

result_host = data_gpu.get()
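
For the out-of-place variant mentioned above, the same steps can be wrapped in a small helper. This is a sketch under stated assumptions: the function name fft2_out_of_place is hypothetical, cla.empty_like is used here to allocate the output array, and the output array is passed as the argument directly after the input array, as the text describes. It needs a working OpenCL device and has not been checked against every gpyfft release.

```python
def fft2_out_of_place(data_host):
    """Batched out-of-place 2D FFT of a (batch, ny, nx) complex64 host array."""
    # Imported inside the function so that defining it does not require a GPU stack.
    import pyopencl as cl
    import pyopencl.array as cla
    from gpyfft.fft import FFT

    context = cl.create_some_context()
    queue = cl.CommandQueue(context)

    # Input on the GPU, plus a separate output array of the same shape and dtype.
    data_gpu = cla.to_device(queue, data_host)
    result_gpu = cla.empty_like(data_gpu)

    # The output array follows the input array; axes as in the in-place example.
    transform = FFT(context, queue, data_gpu, result_gpu, axes=(2, 1))

    # enqueue() returns a tuple of events; wait for the transform to finish.
    event, = transform.enqueue()
    event.wait()

    # The input stays untouched; the result is read from the output array.
    return result_gpu.get()
```

One way to sanity-check the result is to compare it on the host with np.fft.fft2(data_host, axes=(1, 2)), allowing for single-precision rounding differences.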

Benchmark

A simple benchmark is included as a submodule. You can run it from the command line with python -m gpyfft.benchmark, or from Python:

import gpyfft.benchmark
gpyfft.benchmark.run()

Note, you might want to set the PYOPENCL_CTX environment variable to select your OpenCL platform and device.
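
The variable can also be set from within Python, as long as this happens before the OpenCL context is created. The sketch below assumes that PYOPENCL_CTX holds the answers to PyOpenCL's interactive prompts, for example "0" for the first platform or "0:1" for platform 0, device 1. The helper name select_opencl_device is hypothetical, and the right indices depend on your machine.

```python
import os


def select_opencl_device(choice="0"):
    """Set PYOPENCL_CTX to `choice` unless it is already set.

    Returns the value that is in effect afterwards, so a setting made
    in the shell takes precedence over the default chosen here.
    """
    return os.environ.setdefault("PYOPENCL_CTX", choice)


# Call this before cl.create_some_context() or gpyfft.benchmark.run().
active_choice = select_opencl_device("0")
```

Using setdefault rather than a plain assignment keeps any value exported in the shell, so the script does not silently override the user's device selection.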
