canonizer / Halloc

Licence: other
A fast and highly scalable GPU dynamic memory allocator


Halloc GPU memory allocator, version 0.11

INTRO

Halloc is a high-throughput malloc/free-style dynamic memory allocator for NVIDIA Kepler GPUs. It represents free blocks with bit arrays and uses a hash function to search for free blocks quickly. Combined with careful slab management and performance tuning, this yields a very fast allocator: Halloc achieves more than 1.5 billion mallocs/s (more than 1 billion malloc/free pairs/s) on a K20X with 16-byte allocations, tens of thousands of GPU threads, and more than 100 MiB allocated. This is much higher than other state-of-the-art GPU allocators, and halloc's performance is also more stable. This makes halloc suitable for GPGPU applications that require fast dynamic memory management. Halloc is designed mainly for small allocation sizes, and delegates allocations larger than 3 KiB to the CUDA allocator.

REQUIREMENTS

Software: CUDA 5.0 or higher (tested with 6.5)
Hardware: Compute Capability 2.0 or higher (tested on CC 3.5 devices K20X and K40)

Note: libraries and tests are currently not compiled for compute_50/sm_50, i.e. Maxwell.

COMPILING

To compile the halloc library, run (in the project's top directory):

make

To run correctness tests (CAUTION: takes a lot of time!):

make test

To build correctness tests without running them:

make build-corr

To build performance tests without running them:

make build-perf

Performance tests are then located in the ./tst/perf/bin directory and can be invoked individually, e.g.:

./tst/perf/bin/throughput
./tst/perf/bin/phase-throughput -f0.95 -F0.05 -e0.91 -g5 -t128

To install, edit the PREFIX variable in the makefile to your desired installation directory (default: ~/usr) and run:

make install

To uninstall:

make uninstall

USING HALLOC

See samples/ directory for samples using Halloc.

Compiling Your Program

A GPU application using halloc must be compiled and linked against the halloc static library with separate device compilation and linking. Assuming the variable $PREFIX contains the installation prefix, and myprog.cu is the file being compiled, this can be done as follows:

nvcc -arch=sm_35 -O3 -I $PREFIX/include -dc myprog.cu -o myprog.o
nvcc -arch=sm_35 -O3 -L $PREFIX/lib -lhalloc -o myprog myprog.o
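The same two-step build can also be captured in a makefile. The fragment below is a minimal sketch, not part of the halloc distribution; it assumes PREFIX holds the installation prefix and that sm_35 matches your GPU:

```make
# minimal sketch; adjust -arch to your GPU
PREFIX ?= $(HOME)/usr
NVCC    = nvcc -arch=sm_35 -O3

# link step: device-link and combine with the halloc static library
myprog: myprog.o
	$(NVCC) -L $(PREFIX)/lib -o $@ $^ -lhalloc

# compile step: separate device compilation (-dc)
myprog.o: myprog.cu
	$(NVCC) -I $(PREFIX)/include -dc $< -o $@
```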

Halloc API

The functions defined by halloc are declared in the halloc.h header, which needs to be included in your code:

#include <halloc.h>

Before halloc can be used in device code, it must first be initialized with the ha_init() function:

void ha_init(halloc_opts_t opts = halloc_opts_t());

It can be passed a full halloc_opts_t structure to control fine-grained halloc parameters, such as the slab size or the fraction of used chunks at which a slab is considered "busy". It can also be called with just the amount of memory to allocate, or with no arguments at all to keep the defaults:

ha_init(512 * 1024 * 1024); // pass the amount of memory to allocate
ha_init();                  // use the default amount of memory

Halloc defines two functions: hamalloc to allocate memory and hafree to free it (the names malloc and free are taken by the CUDA allocator, so halloc uses different names). These functions can only be called from device code.

void *hamalloc(size_t nbytes);
void hafree(void *p);

Otherwise, these functions behave much like standard C malloc/free, e.g.:

// allocate an array
int *p = (int *)hamalloc(8 * sizeof(int));
p[0] = 0; p[1] = threadIdx.x; p[2] = 2;
// ...
// free the array
hafree(p);

// allocate a list
typedef struct list_ {
  int element;
  struct list_ *next;
} list;
// ...
list *l = (list *)hamalloc(sizeof(list));
l->element = 1;
l->next = (list *)hamalloc(sizeof(list));
l->next->element = 2;
l->next->next = NULL;

The functions can be used in much the same way as in C programs. hamalloc accepts the number of bytes to allocate and returns a pointer to the allocated memory, or NULL if the memory cannot be allocated. Similarly, hafree accepts either a pointer returned by hamalloc or NULL, and frees the previously allocated memory. Naturally, hamalloc and hafree are thread-safe, and can be called simultaneously by threads of the same or different kernels. hamalloc allocations persist across kernel invocations and can be used in later kernel calls. Pointers allocated by hamalloc can only be freed by hafree; they cannot be deallocated by, e.g., host or device cudaFree/free.

ha_shutdown() is intended to free resources used by halloc, but is currently a no-op.
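Putting the pieces together, a complete program might look as follows. This is a sketch, not code from the halloc distribution: the kernel name, launch configuration, and memory amount are illustrative, and it assumes ha_init() is called from host code before any kernel uses hamalloc:

```cuda
#include <halloc.h>

// Each thread allocates a small array, uses it, and frees it.
__global__ void alloc_test_kernel(int *results) {
  int *p = (int *)hamalloc(8 * sizeof(int));
  if (p != NULL) {
    p[0] = threadIdx.x;
    results[blockIdx.x * blockDim.x + threadIdx.x] = p[0];
    hafree(p);
  }
}

int main() {
  // initialize halloc with 512 MiB of GPU memory
  ha_init(512 * 1024 * 1024);

  int n = 128 * 1024;
  int *d_results;
  cudaMalloc(&d_results, n * sizeof(int));

  alloc_test_kernel<<<n / 128, 128>>>(d_results);
  cudaDeviceSynchronize();

  cudaFree(d_results);  // device memory from cudaMalloc, not hamalloc
  ha_shutdown();        // currently a no-op, but safe to call
  return 0;
}
```

Note that d_results is allocated with cudaMalloc on the host and is therefore freed with cudaFree; only pointers returned by hamalloc go through hafree.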

LIMITATIONS

There is currently no way to change parameters or allocate more memory after halloc has been initialized.

BUGS

Though the correctness tests pass, this of course proves nothing. Some bugs are most likely still there ;)
