
eyalroz / libgiddy

License: BSD-3-Clause
Giddy - A lightweight GPU decompression library

Programming Languages

CUDA, C++, CMake, C, Shell

Projects that are alternatives of or similar to libgiddy

Firenze
Adapter based JavaScript ORM for Node.js and the browser
Stars: ✭ 131 (+235.9%)
Mutual labels:  databases
Ormar
python async mini orm with fastapi in mind and pydantic validation
Stars: ✭ 155 (+297.44%)
Mutual labels:  databases
Laravel Database Encryption
A package for automatically encrypting and decrypting Eloquent attributes in Laravel 5.5+, based on configuration settings.
Stars: ✭ 238 (+510.26%)
Mutual labels:  databases
Nosqlmap
Automated NoSQL database enumeration and web application exploitation tool.
Stars: ✭ 1,928 (+4843.59%)
Mutual labels:  databases
Atdatabases
TypeScript clients for databases that prevent SQL Injection
Stars: ✭ 154 (+294.87%)
Mutual labels:  databases
Wither
An ODM for MongoDB built on the official MongoDB Rust driver.
Stars: ✭ 174 (+346.15%)
Mutual labels:  databases
Dbplot
Simplifies plotting of database and sparklyr data
Stars: ✭ 117 (+200%)
Mutual labels:  databases
Hyperspace
An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
Stars: ✭ 246 (+530.77%)
Mutual labels:  databases
Rom
Data mapping and persistence toolkit for Ruby
Stars: ✭ 1,959 (+4923.08%)
Mutual labels:  databases
Hera
High Efficiency Reliable Access to data stores
Stars: ✭ 213 (+446.15%)
Mutual labels:  databases
Noproto
Flexible, Fast & Compact Serialization with RPC
Stars: ✭ 138 (+253.85%)
Mutual labels:  databases
.codebits
📚 List of resources for Algorithms and Data Structures in Python & other CS topics @2017
Stars: ✭ 144 (+269.23%)
Mutual labels:  databases
Awesome Sqlalchemy
A curated list of awesome tools for SQLAlchemy
Stars: ✭ 2,316 (+5838.46%)
Mutual labels:  databases
Developer Handbook
An opinionated guide on how to become a professional Web/Mobile App Developer.
Stars: ✭ 1,830 (+4592.31%)
Mutual labels:  databases
Tds fdw
A PostgreSQL foreign data wrapper to connect to TDS databases (Sybase and Microsoft SQL Server)
Stars: ✭ 238 (+510.26%)
Mutual labels:  databases
Vanillacore
The core engine of VanillaDB
Stars: ✭ 119 (+205.13%)
Mutual labels:  databases
Manage Fastapi
🚀 CLI tool for FastAPI. Generating new FastAPI projects & boilerplates made easy.
Stars: ✭ 163 (+317.95%)
Mutual labels:  databases
sqllex
The most pythonic ORM (for SQLite and PostgreSQL). Seriously, try it out!
Stars: ✭ 80 (+105.13%)
Mutual labels:  databases
Prest
PostgreSQL ➕ REST, low-code, simplify and accelerate development, ⚡ instant, realtime, high-performance on any Postgres application, existing or new
Stars: ✭ 3,023 (+7651.28%)
Mutual labels:  databases
Migrate
Database migrations. CLI and Golang library.
Stars: ✭ 2,315 (+5835.9%)
Mutual labels:  databases

Giddy - A lightweight GPU decompression library

The code is now somewhat out-of-date. Contact me regarding an upcoming update.

(Originally presented in this mini-paper in the DaMoN 2017 workshop)

Table of contents
- Why lightweight compression for GPU work?
- What does this library comprise?
- Which compression schemes are supported?
- How to decompress data using Giddy?
- Performance
- Acknowledgements

For questions, requests or bug reports - either use the Issues page or email me.

Why lightweight compression for GPU work?

Discrete GPUs are powerful beasts, with numerous cores and high-bandwidth memory, often capable of 10x the data-crunching throughput achievable on a CPU. Perhaps the main obstacle to utilizing them, however, is that data usually resides in main system memory, close to the CPU - and to have the GPU process it, we must send it over a PCIe bus. Thus the CPU can potentially process in-memory data at (typically, in 2017) 30-35 GB/sec, while a discrete GPU can receive it at no more than 12 GB/sec.

One way of counteracting this handicap is compression. The GPU can afford to expend more effort decompressing data arriving over the bus than the CPU can; thus if the data is available a priori in system memory, and is amenable to compression, using compression may increase the GPU's effective bandwidth more than it would the CPU's. For example, a scheme achieving a 3:1 compression ratio would raise the GPU's effective ingest bandwidth from 12 GB/sec to 36 GB/sec - provided decompression keeps pace.

Compression schemes come in many shapes and sizes, but it is customary to distinguish "heavyweight" schemes (such as those based on Lempel-Ziv) from "lightweight" schemes, which involve only a small amount of computation per element and few accesses to the compressed data for decompressing any single element.
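To make "lightweight" concrete, here is a minimal CPU-side sketch of one such scheme, delta decoding; this is illustrative code, not part of Giddy:

```cpp
#include <cstdint>
#include <vector>

// Delta decoding: each element is reconstructed from a starting value plus
// a running sum of small stored differences - one read and one addition
// per element, with no backward searches into the compressed stream.
std::vector<int32_t> delta_decode(int32_t start, const std::vector<int8_t>& deltas)
{
    std::vector<int32_t> out;
    out.reserve(deltas.size() + 1);
    out.push_back(start);
    for (auto d : deltas) {
        out.push_back(out.back() + d);
    }
    return out;
}
```

Contrast this with a Lempel-Ziv-style decoder, which must chase variable-length back-references into already-decompressed output.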

Giddy enables the use of lightweight compressed data on the GPU by providing decompressor implementations for a plethora of compression schemes.

What does the library comprise?

Giddy comprises:

  • CUDA kernel source code for decompressing data using each of the compression schemes listed below. The kernels are templated, and one may instantiate them for a variety of combinations of types and some compression scheme parameters which it would not be efficient to pass at run-time.
    • ... and source code for auxiliary kernels required for decompression (e.g. for the scattering of patch data).
  • A uniform mechanism for configuring launches of these kernels (grid dimensions, block dimensions and dynamic shared memory size).
  • A kernel wrapper abstraction class --- general-purpose rather than specific to decompression work --- and individual kernel wrappers for each decompression scheme (templated similarly to the kernels themselves). Instead of dealing directly with the kernels at the lower level and making CUDA API calls yourself, you can use the associated wrapper.
  • Each kernel wrapper class also registers itself in a factory, which you can use to instantiate wrappers without having compiled against their code. The factory provides instances of a common base class - and their virtual methods are used to pass scheme-specific arguments.

If this sounds a bit confusing, scroll down to the examples section.

Supported compression schemes

The following compression schemes are currently included:

(Note: the Wiki pages for each of the schemes are just now being written.)

Additionally, two patching schemes are supported:

  • Naive patching
  • Compressed-Indices patching

As these are "a posteriori" patching schemes, you apply them by simply decompressing using some base scheme, then applying one of the two kernels data_layout::scatter or data_layout::compressed_indices_scatter to the initial decompression result. You will not find specific kernels, kernel wrappers or factory entries for the "combined" patched scheme - only for its components.
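As an illustration of the semantics only (not Giddy's actual kernels), naive patching boils down to a scatter of explicit (index, value) pairs over the base scheme's output; the function below is a hypothetical CPU-side sketch:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Naive patching, sketched on the CPU: the base scheme's decompression
// result is overwritten at the few "exceptional" positions which the
// base scheme could not represent exactly.
void scatter_patches(
    std::vector<int32_t>&           decompressed,
    const std::vector<std::size_t>& patch_indices,
    const std::vector<int32_t>&     patch_values)
{
    for (std::size_t i = 0; i < patch_indices.size(); i++) {
        decompressed[patch_indices[i]] = patch_values[i];
    }
}
```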

How to decompress data using Giddy?

Note: The examples use the C++'ish CUDA API wrappers, making the host-side code somewhat clearer and shorter.

Suppose we're presented with compressed data with the following characteristics, which for simplicity is already in GPU memory:

Parameter                          Value
Decompression scheme               Frame of Reference
Width of size/index type           32 bits
Uncompressed data type             int32_t
Type of offsets from FOR value     int16_t
Segment length                     (runtime variable)
Total length of compressed data    (runtime variable)

in other words, we want to implement the following function:

using size_type         = uint32_t; // assuming fewer than 2^32 elements
using uncompressed_type = int32_t;
using compressed_type   = int16_t;
// model_coefficients_type is the scheme's per-segment model coefficient type;
// for frame-of-reference with a constant model, each segment's single
// coefficient is its reference ("frame") value

void decompress_on_device(
	uncompressed_type*              __restrict__  decompressed,
	const compressed_type*          __restrict__  compressed,
	const model_coefficients_type*  __restrict__  segment_model_coefficients,
	size_type                                     length,
	size_type                                     segment_length);
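For orientation, the semantics we are after can be sketched on the CPU as follows - an illustrative reference implementation, not Giddy code - assuming a constant per-segment model, i.e. each segment's model coefficients are a single reference ("frame") value:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Frame-of-reference decompression with a constant per-segment model:
// each output element is its segment's reference value plus the stored
// narrow offset - one coefficient lookup and one addition per element.
std::vector<int32_t> decompress_for(
    const std::vector<int16_t>& compressed,
    const std::vector<int32_t>& segment_references,
    std::size_t                 segment_length)
{
    std::vector<int32_t> decompressed(compressed.size());
    for (std::size_t i = 0; i < compressed.size(); i++) {
        decompressed[i] = segment_references[i / segment_length] + compressed[i];
    }
    return decompressed;
}
```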

We can do this with Giddy in one of three ways.

Direct use of the kernel source code

The example code for this mode of use is found in examples/src/direct_use_of_kernel.cu.

In this mode, we

  • Include the kernel source file; we now have a pointer to the kernel's device-side function.
  • Include the launch config resolution mechanism header.
  • Instantiate a launch configuration resolution parameters object, with the parameters specific to our launch.
  • Call the resolve_launch_configuration() function with the object we instantiated, obtaining a launch_configuration_t struct.
  • Perform a CUDA kernel launch, either using the API wrapper (which takes the device function pointer and a launch_configuration_t) or the plain vanilla way, extracting the fields of the launch_configuration_t.

Instantiation of the specific kernel launch wrapper

The example code for this mode of use is found in examples/src/instantiation_of_wrapper.cu.

Each decompression kernel has a corresponding thin wrapper class. An instance of the wrapper class has no state - no data members; we only use it for its vtable - its virtual methods, specific to the decompression scheme. Thus, in this mode of use, we:

  • Include the kernel's wrapper class definition.
  • Instantiate the wrapper class cuda::kernels::decompression::frame_of_reference::kernel_t.
  • Call the wrapper's resolve_launch_configuration() method with the appropriate parameters, obtaining a launch_configuration_t structure.
  • Call the freestanding function cuda::kernel::enqueue_launch() with our wrapper instance, the launch configuration, and the arguments we need to pass to the kernel.

Use of factory-provided, type-erased wrapper

The example code for this mode of use is found in examples/src/factory_provided_type_erased_wrapper.cu.

The kernel wrappers are intended to allow a uniform interface for launching kernels. This uniformity is achieved by type-erasure: the virtual methods of the wrappers' common base class all take a map of boost::any objects, and it is up to the caller to pass the appropriate parameters in that map. Thus, in this mode, we:

  • Include just the common base class header for the kernel wrappers.
  • Use the cuda::registered::kernel_t class' static method produceSubclass() to instantiate the specific wrapper relevant to our scenario (named "decompression::frame_of_reference::kernel_t<4u, int, short, cuda::functors::unary::parametric_model::constant<4u, int> >"). What we actually hold is an std::unique_ptr to such an instance.
  • Prepare a type-erased map of parameters, and pass it to the resolve_launch_configuration() method of our instance, obtaining a launch_configuration_t structure.
  • Prepare a second type-erased map of parameters, and pass it to the enqueue_launch() method of our instance, along with the launch configuration structure we've just obtained.
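The type-erased map pattern itself is easy to demonstrate in isolation. The sketch below uses std::any where the library uses boost::any, and the "length" key is made up for the illustration:

```cpp
#include <any>
#include <cstddef>
#include <map>
#include <string>

// The caller packs heterogeneously-typed parameters under string keys;
// the callee casts each one back to the concrete type it expects.
// A wrong type or a missing key surfaces as an exception at run-time
// rather than a compile-time error - the price of type erasure.
std::size_t read_length(const std::map<std::string, std::any>& params)
{
    return std::any_cast<std::size_t>(params.at("length"));
}
```

A caller would fill the map, e.g. params["length"] = std::size_t{1000};, before passing it to the callee.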

No facility for compression!

No code is currently provided for compressing data - neither on the device nor on the host side. This is Issue #3.

Performance

Some of the decompressors are well-optimized, some need more work. The most recent (and only) performance analysis is in the mini-paper mentioned above. Step-by-step instructions for measuring performance (using well-known data sets) are forthcoming.

Acknowledgements

This endeavor was made possible with the help of:

  • CWI Amsterdam
  • Prof. Peter Boncz, co-author of the above-mentioned paper
  • The MonetDB DBMS project - which got me into DBMSes and GPUs in the first place, and which I (partially) use for performance testing