canonizer / Halloc

Licence: other
A fast and highly scalable GPU dynamic memory allocator


Halloc GPU memory allocator, version 0.11

INTRO

Halloc is a high-throughput malloc/free-style dynamic memory allocator for NVIDIA Kepler GPUs. It represents free blocks with bit arrays and uses a hash function to search for free blocks quickly. Combined with careful slab management and performance tuning, this yields a very fast allocator: Halloc achieves more than 1.5 billion mallocs/s (more than 1 billion malloc/free pairs/s) on a K20X with 16-byte allocations, tens of thousands of GPU threads, and more than 100 MiB allocated. This is much higher than other state-of-the-art GPU allocators, and halloc's performance is also more stable. This makes halloc suitable for GPGPU applications that require fast dynamic memory management. Halloc is designed mainly for small allocation sizes, and delegates allocations larger than 3 KiB to the CUDA allocator.

REQUIREMENTS

Software: CUDA 5.0 or higher (tested with 6.5)
Hardware: Compute Capability 2.0 or higher (tested on CC 3.5 devices K20X and K40)

Note: libraries and tests are currently not compiled for compute_50/sm_50, i.e. Maxwell.

COMPILING

To compile the halloc library, run (in the project's top directory):

make

To run correctness tests (CAUTION: takes a lot of time!):

make test

To build correctness tests without running them:

make build-corr

To build performance tests without running them:

make build-perf

Performance tests are then located in the ./tst/perf/bin directory and can be invoked individually, e.g.:

./tst/perf/bin/throughput
./tst/perf/bin/phase-throughput -f0.95 -F0.05 -e0.91 -g5 -t128

To install, edit the PREFIX variable in the makefile to your desired installation directory (default: ~/usr) and run:

make install

To uninstall:

make uninstall

USING HALLOC

See samples/ directory for samples using Halloc.

Compiling Your Program

A GPU application using halloc must be compiled and linked against the halloc static library with separate device compilation and linking. Assuming the variable $PREFIX contains the installation prefix, and myprog.cu is the file being compiled, this can be done as follows:

nvcc -arch=sm_35 -O3 -I $PREFIX/include -dc myprog.cu -o myprog.o
nvcc -arch=sm_35 -O3 -L $PREFIX/lib -lhalloc -o myprog myprog.o
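The same two-step build can also be captured in a makefile. The fragment below is a minimal sketch, not part of the halloc distribution; it assumes PREFIX holds the installation prefix and that sm_35 matches your GPU:

```make
# minimal sketch; adjust -arch to your GPU
PREFIX ?= $(HOME)/usr
NVCC    = nvcc -arch=sm_35 -O3

# link step: device-link and combine with the halloc static library
myprog: myprog.o
	$(NVCC) -L $(PREFIX)/lib -o $@ $^ -lhalloc

# compile step: separate device compilation (-dc)
myprog.o: myprog.cu
	$(NVCC) -I $(PREFIX)/include -dc $< -o $@
```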

Halloc API

The functions defined by halloc are declared in the halloc.h header, which needs to be included in your code:

#include <halloc.h>

Before halloc can be used in device code, it must first be initialized with the ha_init() function:

void ha_init(halloc_opts_t opts = halloc_opts_t());

It can be passed a full halloc_opts_t structure to control fine-grained halloc parameters, such as the slab size or the fraction of used chunks at which a slab is considered "busy". It can also be called with just the amount of memory to allocate, or with no arguments at all to keep the defaults:

ha_init(512 * 1024 * 1024); // pass the amount of memory to allocate
ha_init();                  // use the default amount of memory

Halloc defines two functions: hamalloc to allocate memory and hafree to free it (the names malloc and free are taken by the CUDA allocator, so halloc uses different names). These functions can only be called from device code.

void *hamalloc(size_t nbytes);
void hafree(void *p);

Otherwise, these functions behave much like standard C malloc/free, e.g.:

// allocate an array
int *p = (int *)hamalloc(8 * sizeof(int));
p[0] = 0; p[1] = threadIdx.x; p[2] = 2;
// ...
// free the array
hafree(p);

// allocate a list
typedef struct list_ {
  int element;
  struct list_ *next;
} list;
// ...
list *l = (list *)hamalloc(sizeof(list));
l->element = 1;
l->next = (list *)hamalloc(sizeof(list));
l->next->element = 2;
l->next->next = NULL;

The functions can be used in much the same way as in C programs. hamalloc accepts the number of bytes to allocate and returns a pointer to the allocated memory, or NULL if the memory cannot be allocated. Similarly, hafree accepts either a pointer returned by hamalloc or NULL, and frees the previously allocated memory. Naturally, hamalloc and hafree are thread-safe, and can be called simultaneously by threads of the same or different kernels. hamalloc allocations persist across kernel invocations and can be used in later kernel calls. Pointers allocated by hamalloc can only be freed by hafree; they cannot be deallocated by, e.g., host or device cudaFree/free.

ha_shutdown() is intended to free resources used by halloc, but is currently a no-op.
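Putting the pieces together, a complete program might look as follows. This is a sketch, not code from the halloc distribution: the kernel name, launch configuration, and memory amount are illustrative, and it assumes ha_init() is called from host code before any kernel uses hamalloc:

```cuda
#include <halloc.h>

// Each thread allocates a small array, uses it, and frees it.
__global__ void alloc_test_kernel(int *results) {
  int *p = (int *)hamalloc(8 * sizeof(int));
  if (p != NULL) {
    p[0] = threadIdx.x;
    results[blockIdx.x * blockDim.x + threadIdx.x] = p[0];
    hafree(p);
  }
}

int main() {
  // initialize halloc with 512 MiB of GPU memory
  ha_init(512 * 1024 * 1024);

  int n = 128 * 1024;
  int *d_results;
  cudaMalloc(&d_results, n * sizeof(int));

  alloc_test_kernel<<<n / 128, 128>>>(d_results);
  cudaDeviceSynchronize();

  cudaFree(d_results);  // device memory from cudaMalloc, not hamalloc
  ha_shutdown();        // currently a no-op, but safe to call
  return 0;
}
```

Note that d_results is allocated with cudaMalloc on the host and is therefore freed with cudaFree; only pointers returned by hamalloc go through hafree.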

LIMITATIONS

There is currently no way to change parameters or allocate more memory after halloc has been initialized.

BUGS

Though the correctness tests pass, this of course proves nothing. Some bugs are most likely still there ;)
