antonmks / Nvparse

Licence: apache-2.0
Fast, gpu-based CSV parser

nvParse

Parsing CSV files with GPU

Parsing delimiter-separated files is a common task in data processing. The usual way to extract the columns from a text file is to call the strtok function on each line:

char * p = strtok(line, "|");
while (p != NULL)
{
    printf ("%s\n",p);
    p = strtok (NULL, "|");
}

However, this method of parsing is limited by the CPU, because:

  • it runs on a single core and doesn't take advantage of the multiple cores of modern CPUs

  • it is constrained by the CPU's memory bandwidth

This is how the same task can be done on a GPU, using the Thrust library:

// _1 below is a Thrust placeholder; it requires: using namespace thrust::placeholders;
auto break_cnt = thrust::count(d_readbuff.begin(), d_readbuff.end(), '\n');
thrust::device_vector<int> dev_pos(break_cnt);
thrust::copy_if(thrust::make_counting_iterator(0),
                thrust::make_counting_iterator(bytes_read-1),
                d_readbuff.begin(), dev_pos.begin(), _1 == '\n');

The first statement counts the line breaks in the buffer (assuming that the file has been read into memory and copied to the GPU buffer d_readbuff). The second creates a vector in GPU memory that will hold the positions of the newline characters. The third compares each character in the buffer to the newline character and, if they match, copies the position of that character into the dev_pos vector.
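To make the result of these calls concrete, here is a minimal host-only C++ sketch of the same computation. It is not the project's code: newline_positions is a hypothetical helper, it runs sequentially on the CPU, and unlike the snippet above it scans every byte of the buffer rather than stopping at bytes_read-1.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical CPU reference for the count + copy_if step: returns the byte
// offset of every '\n' in buf, in ascending order.
std::vector<int> newline_positions(const std::string& buf) {
    // Same role as thrust::count: one output slot per line break.
    const auto break_cnt = std::count(buf.begin(), buf.end(), '\n');

    std::vector<int> pos;
    pos.reserve(static_cast<std::size_t>(break_cnt));

    // Same role as copy_if with a counting iterator as input and the buffer
    // as stencil: keep index i whenever buf[i] is a newline.
    for (std::size_t i = 0; i < buf.size(); ++i) {
        if (buf[i] == '\n') pos.push_back(static_cast<int>(i));
    }

    // The count and the filtered copy must agree on the number of breaks.
    assert(pos.size() == static_cast<std::size_t>(break_cnt));
    return pos;
}
```

For the buffer "a|b\ncd|e\n" this returns the offsets 3 and 8. Line i then spans from one past the previous offset (or from 0 for the first line) up to its own offset.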

Now that we know where every line in the buffer ends, and therefore where the next one starts, we can launch a GPU procedure that parses the lines using several thousand GPU cores:

thrust::counting_iterator<unsigned int> begin(0);
parse_functor ff(...); // example constructor arguments are in the test.cu file
thrust::for_each(begin, begin + break_cnt, ff);
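The real parse_functor lives in the project's sources. As a rough illustration of what a per-line functor can look like, here is a host-only C++ sketch. Everything in it is an assumption made for the example: the names split_line and parse_columns are hypothetical, each field is copied into a fixed-width slot, and over-long fields are truncated.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a per-line parsing functor (not the project's
// parse_functor). Each call touches exactly one line, so calls are independent.
struct split_line {
    const char* src;                    // whole file buffer
    const int* breaks;                  // offsets of '\n', ascending
    char sep;                           // field delimiter, e.g. '|'
    std::size_t width;                  // fixed slot width per field (assumption)
    std::vector<std::string>* columns;  // columns[c] stores row i at offset i * width

    void operator()(unsigned int i) const {
        // Line i starts right after the previous newline, or at offset 0.
        int p = (i == 0) ? 0 : breaks[i - 1] + 1;
        const int end = breaks[i];
        for (std::size_t c = 0; c < columns->size(); ++c) {
            char* dst = &(*columns)[c][i * width];
            std::size_t n = 0;
            while (p < end && src[p] != sep) {
                if (n < width) dst[n++] = src[p];  // truncate over-long fields
                ++p;
            }
            if (p < end) ++p;  // step over the separator
        }
    }
};

// Runs the functor once per line. The loop is sequential here; a parallel
// for_each over the same index range would issue the same calls.
std::vector<std::string> parse_columns(const std::string& buf,
                                       const std::vector<int>& breaks,
                                       char sep, std::size_t ncols,
                                       std::size_t width) {
    assert(width > 0);
    std::vector<std::string> columns(
        ncols, std::string(breaks.size() * width, '\0'));
    split_line f{buf.data(), breaks.data(), sep, width, &columns};
    for (unsigned int i = 0; i < breaks.size(); ++i) f(i);
    return columns;
}
```

Because no call reads or writes another line's slots, the order of the calls does not matter, which is what makes the work easy to spread across many cores. Unlike strtok, which treats a run of delimiters as one, this sketch keeps empty fields as empty slots.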

As a result, we get the needed columns in separate arrays in GPU memory. We can copy them to host memory as they are, or convert them to binary values using the relevant GPU procedures:

gpu_atoll atoll_ff(...);
thrust::for_each(begin, begin + break_cnt, atoll_ff);
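The conversion step can be pictured with a similar host-only C++ sketch. It is a hypothetical illustration, not the project's gpu_atoll: the names slot_to_ll and convert_column are made up, the input is assumed to be a column of fixed-width character slots padded with spaces or '\0', and there is no overflow checking.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a text-to-integer functor (not the project's
// gpu_atoll). Each call converts the slot of one row.
struct slot_to_ll {
    const char* src;    // fixed-width character slots, one per row
    long long* dst;     // one binary value per row
    std::size_t width;  // slot width in bytes (assumption)

    void operator()(unsigned int i) const {
        const char* s = src + i * width;
        std::size_t k = 0;
        bool neg = false;
        if (k < width && s[k] == '-') { neg = true; ++k; }
        long long v = 0;
        // Stop at the first non-digit, which covers space or '\0' padding.
        for (; k < width && s[k] >= '0' && s[k] <= '9'; ++k) {
            v = v * 10 + (s[k] - '0');
        }
        dst[i] = neg ? -v : v;
    }
};

// Converts a whole column; slots.size() must be a multiple of width.
std::vector<long long> convert_column(const std::string& slots,
                                      std::size_t width) {
    assert(width > 0 && slots.size() % width == 0);
    std::vector<long long> out(slots.size() / width);
    slot_to_ll f{slots.data(), out.data(), width};
    for (unsigned int i = 0; i < out.size(); ++i) f(i);
    return out;
}
```

With a slot width of 8, the column "155190  -42     " converts to the values 155190 and -42. As with parsing, every row is independent of the others, so the same functor shape fits a parallel for_each.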

Benchmarks

Hardware: PC with one Intel i3-4130, 16 GB of RAM, one 2 TB hard drive and a GTX Titan

File: 750 MB lineitem.tbl text file (6001215 lines)

Parsing 1 field using the CPU:

$ time cut -d "|" -f 6 lineitem.tbl > /dev/null

real 0m28.764s

Parsing 11 fields using a hand-written program with strtok (no threads, no memory-mapped file):

14.5s

Parsing 11 fields using the GPU:

$ time ./test

0.77s

The actual GPU parsing part is done in just 0.25 seconds.

P.S. Thanks to Nicolas Guillemot for the suggestion on memory-mapping files.
