SabanciParallelComputing / GOSH

License: MIT
An ultra-fast, GPU-based large-graph embedding algorithm utilizing a novel coarsening algorithm and requiring no more than a single GPU.

Programming Languages

CUDA
1817 projects
C++
36643 projects - #6 most used programming language
C
50402 projects - #5 most used programming language
Makefile
30231 projects

Projects that are alternatives to or similar to GOSH

rbcuda
CUDA bindings for Ruby
Stars: ✭ 57 (+375%)
Mutual labels:  cuda, high-performance-computing, gpu-computing
Claymore
Stars: ✭ 135 (+1025%)
Mutual labels:  cuda, high-performance-computing, gpu-computing
Neanderthal
Fast Clojure Matrix Library
Stars: ✭ 927 (+7625%)
Mutual labels:  cuda, high-performance-computing, gpu-computing
Bayadera
High-performance Bayesian Data Analysis on the GPU in Clojure
Stars: ✭ 342 (+2750%)
Mutual labels:  cuda, high-performance-computing, gpu-computing
Arraymancer
A fast, ergonomic and portable tensor library in Nim with a deep learning focus for CPU, GPU and embedded devices via OpenMP, Cuda and OpenCL backends
Stars: ✭ 793 (+6508.33%)
Mutual labels:  cuda, high-performance-computing, gpu-computing
Hipsycl
Implementation of SYCL for CPUs, AMD GPUs, NVIDIA GPUs
Stars: ✭ 377 (+3041.67%)
Mutual labels:  cuda, high-performance-computing, gpu-computing
Sixtyfour
How fast can we brute force a 64-bit comparison?
Stars: ✭ 41 (+241.67%)
Mutual labels:  cuda, gpu-computing
Heteroflow
Concurrent CPU-GPU Programming using Task Models
Stars: ✭ 57 (+375%)
Mutual labels:  cuda, gpu-computing
Autodock Gpu
AutoDock for GPUs and other accelerators
Stars: ✭ 65 (+441.67%)
Mutual labels:  cuda, gpu-computing
Deepnet
Deep.Net machine learning framework for F#
Stars: ✭ 99 (+725%)
Mutual labels:  cuda, gpu-computing
Dace
DaCe - Data Centric Parallel Programming
Stars: ✭ 106 (+783.33%)
Mutual labels:  cuda, high-performance-computing
Accelerate Llvm
LLVM backend for Accelerate
Stars: ✭ 134 (+1016.67%)
Mutual labels:  cuda, gpu-computing
Nvidia libs test
Tests and benchmarks for cudnn (and in the future, other nvidia libraries)
Stars: ✭ 36 (+200%)
Mutual labels:  cuda, gpu-computing
Pycuda
CUDA integration for Python, plus shiny features
Stars: ✭ 1,112 (+9166.67%)
Mutual labels:  cuda, gpu-computing
Relion
Image-processing software for cryo-electron microscopy
Stars: ✭ 219 (+1725%)
Mutual labels:  cuda, high-performance-computing
Accelerate
Embedded language for high-performance array computations
Stars: ✭ 751 (+6158.33%)
Mutual labels:  cuda, gpu-computing
Clojurecuda
Clojure library for CUDA development
Stars: ✭ 158 (+1216.67%)
Mutual labels:  cuda, gpu-computing
Alink
Alink is the Machine Learning algorithm platform based on Flink, developed by the PAI team of Alibaba computing platform.
Stars: ✭ 2,936 (+24366.67%)
Mutual labels:  graph-algorithms, graph-embedding
gardenia
GARDENIA: Graph Analytics Repository for Designing Efficient Next-generation Accelerators
Stars: ✭ 22 (+83.33%)
Mutual labels:  graph-algorithms, gpu-computing
Stdgpu
stdgpu: Efficient STL-like Data Structures on the GPU
Stars: ✭ 531 (+4325%)
Mutual labels:  cuda, gpu-computing

GOSH: Embedding Big Graphs on Small Hardware

GOSH is a GPU-based graph embedding tool that takes a graph and produces a d-dimensional vector for every node in the graph. The embeddings can then be used for a multitude of machine-learning tasks, including node classification, link prediction, graph visualization, and anomaly detection.

Embedding pipeline: GOSH employs a novel coarsening algorithm (MultiEdgeCollapse) to compress the graph into a series of smaller graphs and embeds these smaller graphs to produce very accurate embeddings very quickly. In addition, it uses a special scheduling algorithm to embed any graph using a single GPU, even if the memory requirement of the graph exceeds that of the GPU.
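
At a high level, the pipeline can be pictured with the following minimal C++ sketch, assuming (as the multilevel scheme implies) that each coarse graph's embedding is carried over to initialize the next finer level; all types and helpers here are illustrative stand-ins, not GOSH's actual API:

#include <vector>

// Conceptual sketch of the multilevel pipeline described above.
struct Graph { long long n; };             // placeholder graph type
struct Emb { std::vector<float> data; };   // placeholder embedding type

Graph coarsen(const Graph &g) { return Graph{g.n / 2}; }  // stand-in for MultiEdgeCollapse
Emb train(const Graph &, const Emb &e) { return e; }      // stand-in for GPU training at one level
Emb expand(const Emb &e, const Graph &) { return e; }     // stand-in for projecting to the finer graph

Emb embed(Graph g0, long long stop_threshold) {
    std::vector<Graph> levels = {g0};
    while (levels.back().n > stop_threshold)      // build G_1, G_2, ... (each smaller)
        levels.push_back(coarsen(levels.back()));
    Emb e;                                        // initialized at the coarsest level
    for (int i = (int)levels.size() - 1; i >= 0; --i) {
        e = train(levels[i], e);                  // run this level's epochs
        if (i > 0) e = expand(e, levels[i - 1]);  // carry the result to the finer graph
    }
    return e;
}

int main() { embed(Graph{1000000}, 100); return 0; }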

Getting Started

Requirements

We compiled this program with nvcc using CUDA 10.1 and ran it on Ubuntu (Linux kernel 4.4.0-159).

Compiling

You can build the executables of the project with a single command. Simply clone the repo, navigate to it and call the make command:

git clone https://github.com/SabanciParallelComputing/GOSH
cd GOSH
make clean && make

This will produce two executables:

execs/gosh.out # this executable takes inputs as flags and options. 
execs/gosh_nargparse.out # this executable takes all the parameters as a sequence of strings and none of the parameters are optional

Executing

The most basic execution of GOSH can be done as follows (a complete example follows the parameter descriptions below):

execs/gosh.out --input-graph $string --output-embedding $string --directed $number --epochs $number
  • --input-graph $string: An edge-list file with the vertex IDs as zero-indexed integers, i.e., the file should be of the form:
i j
k l
...
a b

Where i, j, k, l, a, b are vertex IDs from the graph. Please note that all vertices get embeddings, even those without any edges.

  • --output-embedding $string: The file to which the embeddings are printed. The embeddings can be printed either as an ASCII-formatted text file or as a binary file (triggered with the --binary-output flag). The output format in ASCII text is as follows:
num_vertices dimension
0 e_1 e_2 e_3 ... e_dimension-1
1 e_1 e_2 e_3 ... e_dimension-1
...
num_vertices-1 e_1 e_2 e_3 ... e_dimension-1

Where num_vertices and dimension are the number of vertices and the embedding dimensionality, respectively, and every line afterward corresponds to a vertex from the graph. A line starts with the ID of the vertex and is followed by its embeddings. All elements within a line are space-separated.

  • --epochs $number: The number of epochs to run on the entirety of the graph. Running an epoch on a graph Gi, which has been coarsened i times, corresponds to running |Vi| positive samples on that graph.
    • Note: the strategy by which the epochs are distributed across different levels of coarsening can be tuned using the options --epoch-strategy and --smoothing-ratio, discussed further below.
  • --directed $number: Whether the graph is directed (1), undirected (0), or given in a binary Compressed Sparse Row (CSR) format (2).
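
For example, assuming an undirected graph stored as an edge list in graphs/toy_graph.txt (hypothetical paths), a 100-epoch run would be:

execs/gosh.out --input-graph graphs/toy_graph.txt --output-embedding toy_graph.emb --directed 0 --epochs 100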

Optional Parameters

Many optional parameters can be used to fine-tune the embedding:

Global parameters

  • -d --dimension $number: an integer with the dimensionality of the embedding.
  • -s --negative_samples $number: number of negative samples used with every positive update.
  • --negative-weight $float: a scaling factor applied to the gradients used when updating embeddings during negative updates.
  • --device-id $number: the ID of the GPU device to be used during the embedding.
  • --binary-output: print the outputs in binary format for compactness and ease of processing in memory (a minimal reader is sketched after this list). The format of the binary output is as follows:
    • The number of vertices as a signed integer
    • The embedding dimension as a signed integer
    • The embeddings of all the vertices, printed sequentially as C++ single-precision floats.
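
As an illustration of this layout, a minimal C++ reader might look as follows (the file name is hypothetical, and the sketch assumes the file was written on a machine with the same endianness):

#include <cstdio>
#include <vector>

int main() {
    // "embedding.bin" is a hypothetical file produced with --binary-output.
    FILE *f = fopen("embedding.bin", "rb");
    if (!f) { perror("fopen"); return 1; }

    // Header: number of vertices and embedding dimension, both signed ints.
    int num_vertices = 0, dimension = 0;
    if (fread(&num_vertices, sizeof(int), 1, f) != 1 ||
        fread(&dimension, sizeof(int), 1, f) != 1) return 1;

    // Body: num_vertices * dimension single-precision floats, vertex by vertex.
    std::vector<float> emb((size_t)num_vertices * dimension);
    if (fread(emb.data(), sizeof(float), emb.size(), f) != emb.size()) return 1;
    fclose(f);

    // Example: print the embedding of vertex 0.
    for (int j = 0; j < dimension; ++j) printf("%f ", emb[(size_t)j]);
    printf("\n");
    return 0;
}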

Sampling parameters

  • --sampling-algorithm $number: the method used to create positive samples during training. Two sampling strategies are currently implemented:
    • --sampling-algorithm 0: 1-hop neighbor sampling or PPR sampling as described in VERSE, depending on the value of --alpha:
      • --alpha 0: positive samples for a node are sampled from its direct neighbors.
      • --alpha > 0 && --alpha < 1: positive samples for a node are the nodes reached by a Personalized PageRank random walk, with --alpha being the damping factor as defined in VERSE.
    • --sampling-algorithm 1: random-walk-based sampling. With this method, random walks are generated on the CPU, and samples are extracted from them and sent to the GPU. It is controlled by three parameters:
      • --walk-length $number: the length of each random walk.
      • --augmentation-distance $number: within a walk, all pairs of nodes in each window of $number consecutive nodes are used as positive samples (see the sketch after this list).
      • --sample-pool-size: the number of samples to be added to the pool on the CPU, which is then copied to the GPU.
  • -a --alpha $number: controls the positive sampling strategy of the VERSE-based model. alpha = 0 uses an adjacency-similarity positive sampling approach, while 0 < alpha < 100 uses PPR with alpha/100 as its damping factor.
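
To make --augmentation-distance concrete, here is a hedged C++ sketch (not GOSH's actual code) of extracting positive samples from one walk: every pair of nodes at most that many positions apart becomes a sample:

#include <cstdio>
#include <utility>
#include <vector>

// All pairs of nodes at most aug_dist positions apart in a walk become
// positive samples (illustrative sketch of --augmentation-distance).
std::vector<std::pair<int, int>> pairs_from_walk(const std::vector<int> &walk,
                                                 size_t aug_dist) {
    std::vector<std::pair<int, int>> samples;
    for (size_t i = 0; i < walk.size(); ++i)
        for (size_t j = i + 1; j < walk.size() && j - i <= aug_dist; ++j)
            samples.push_back({walk[i], walk[j]});
    return samples;
}

int main() {
    // A hypothetical walk of length 5 with --augmentation-distance 2.
    for (auto &p : pairs_from_walk({4, 7, 1, 7, 3}, 2))
        printf("%d %d\n", p.first, p.second);
    return 0;
}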

Learning Rate Parameters

  • -l --learning_rate $float: The global learning rate of the model.

  • --learning-rate-decay-strategy $num: The strategy used to decay the learning rate within a level and between different levels. There are four strategies (0/1/2/3); their differences are shown below, followed by a short sketch:

    • 0, 1: At every level, the learning rate decays linearly from the initial learning rate, starting at the first epoch and ending at the last epoch, based on the following equation:
    current_learning_rate = max(1 - current_epoch/total_epochs, 1e-4) * initial_learning_rate

    where current_epoch and total_epochs are the current and total epochs for the current coarsening level.

    • 2, 3: No decay; the learning rate at the end of a level is the same as at its beginning.
    • 1, 3: the initial learning rate for every coarsening level differs based on the following heuristic:
    lr_i = lr                            if |V_i| < LR_REV_NV
    lr_i = lr / sqrt(|V_i| / LR_REV_NV)  otherwise

    where:
    lr = the input (global) learning rate
    lr_i = the initial learning rate at coarsening level i
    LR_REV_NV = a tunable hyperparameter in src/main.cu

    • 0, 2: initial learning rate at each level is the same as the original learning rate given as input
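
Putting the four strategies together, the schedule can be sketched as follows (an illustrative C++ sketch; LR_REV_NV stands for the hyperparameter in src/main.cu, and the values in main are hypothetical):

#include <algorithm>
#include <cmath>
#include <cstdio>

// Initial learning rate for coarsening level i: strategies 1 and 3 shrink
// it for large levels, strategies 0 and 2 keep the global rate lr.
float level_lr(float lr, long long Vi, long long LR_REV_NV, int strategy) {
    if ((strategy == 1 || strategy == 3) && Vi >= LR_REV_NV)
        return lr / std::sqrt((float)Vi / LR_REV_NV);
    return lr;
}

// Learning rate at a given epoch of a level: strategies 0 and 1 decay
// linearly with a floor of 1e-4, strategies 2 and 3 do not decay.
float epoch_lr(float initial_lr, int epoch, int total_epochs, int strategy) {
    if (strategy == 0 || strategy == 1)
        return std::max(1.0f - (float)epoch / total_epochs, 1e-4f) * initial_lr;
    return initial_lr;
}

int main() {
    float lr0 = level_lr(0.045f, 1 << 20, 1 << 16, 1);  // hypothetical values
    printf("%f\n", epoch_lr(lr0, 50, 100, 1));
    return 0;
}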

Coarsening Parameters

  • --no-coarsening: will not apply any coarsening and will run the embedding directly on the original graph.
  • --coarsening-stopping-threshold $num: the number of vertices at which to stop coarsening, i.e., when a graph Gi is generated with |Vi| < $num, it is added to the coarsened set, but coarsening does not continue.
  • --coarsening-stopping-precision $float: the acceptable shrinkage of a graph during coarsening, i.e., if graph Gi is coarsened into graph Gi+1 and |Vi+1| > |Vi| * $float, graph Gi+1 is not added to the coarsened set and coarsening does not continue.
  • --coarsening-matching-threshold-ratio $num: controls the total number of matches allowed per vertex. Given a graph Gi coarsened i times, a vertex in Gi is not allowed to match more than i * i * ($num / |Vi|) vertices.
  • --coarsening-min-vertices-in-graph $num: the minimum number of vertices acceptable in a graph for it to be added to the coarsened set, i.e., if a graph Gi is coarsened into a graph Gi+1 and |Vi+1| < $num, the graph Gi+1 is not added to the coarsened set. These rules are summarized in the sketch below.
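
Taken together, these stopping parameters imply a rule along the following lines (an illustrative C++ sketch, not the exact implementation; parameter names mirror the flags above):

#include <cstdio>

// Decide whether a newly coarsened graph G_{i+1} with V_next vertices,
// obtained from G_i with V_curr vertices, is kept in the coarsened set.
bool keep_level(long long V_next, long long V_curr, long long min_vertices,
                double stopping_precision) {
    if (V_next < min_vertices) return false;                       // too small to keep
    if ((double)V_next > V_curr * stopping_precision) return false; // shrank too little
    return true;
}

// Coarsening stops once the new graph falls below the vertex threshold.
bool continue_coarsening(long long V_next, long long stopping_threshold) {
    return V_next >= stopping_threshold;
}

int main() {
    // Hypothetical values: G_i has 10,000 vertices, G_{i+1} has 6,000.
    printf("%d %d\n", keep_level(6000, 10000, 100, 0.8),
           continue_coarsening(6000, 1000));
    return 0;
}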

Epoch Distribution Parameters

  • --epoch-strategy: the strategy used to distribute epochs across coarsening levels. Multiple strategies are available (the fast rule is sketched after this section):
    • fast: the smallest graph is given half of the total epochs, the next level half of that, and so on.
    • s-fast: COARSE_SMOOTHING_RATIO * total_epochs epochs are distributed equally across levels, and the remainder is distributed according to the fast rule.
    • normal: epochs are distributed equally across levels.
    • accurate: the opposite of fast; the biggest graph is given half of the total epochs, the next (smaller) level half of that, and so on.
    • s-accurate: COARSE_SMOOTHING_RATIO * total_epochs epochs are distributed equally across levels, and the remainder is distributed according to the accurate rule.

NOTE: in all of the aforementioned rules, a level is allocated a minimum of 1 epoch.

  • --smoothing-ratio: the smoothing ratio used when distributing epochs with the smooth strategies, i.e., s-fast and s-accurate.
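
As an illustration, the fast rule could be implemented roughly as follows (a C++ sketch under the stated rules; the exact rounding and remainder handling are assumptions). With 100 epochs and 4 levels it yields 50, 25, 12, and 13 epochs from coarsest to finest:

#include <algorithm>
#include <cstdio>
#include <vector>

// "fast" rule sketch: the smallest (coarsest) graph gets half the epochs,
// the next level half of the rest, and so on; every level gets at least 1,
// and the finest level takes whatever remains (remainder handling assumed).
std::vector<int> fast_epochs(int total, int levels) {
    std::vector<int> e(levels);
    int remaining = total;
    for (int i = 0; i < levels; ++i) {
        int share = (i == levels - 1) ? remaining : remaining / 2;
        e[i] = std::max(1, share);
        remaining -= e[i];
    }
    return e;  // e[0] is the coarsest level
}

int main() {
    for (int x : fast_epochs(100, 4)) printf("%d ", x);  // 50 25 12 13
    printf("\n");
    return 0;
}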

Large Graph Parameters

  • --epoch-batch-size $num: the number of epochs to run per large-graph execution round, where a single round consists of a full rotation over all the embedding part pairs (pictured in the sketch after this list).
  • --num-parts $num: the number of embedding parts to store concurrently on the GPU.
  • --num-pools $num: the number of sample pools to store concurrently on the GPU.
  • --sampling-threads $num: the number of threads to work on sampling in parallel.
  • --concurrent-samplers $num: the number of sample pools to be sampled into concurrently, where a single pool can be sampled into by a maximum of sampling-threads / concurrent-samplers threads.
  • --task-queue-threads $num: the number of threads to execute tasks from the task queue.
  • --num-sample-pool-sets $num: number of sample pool sets.
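
A single round over the part pairs can be pictured with this simplified C++ sketch (illustrative only; the real scheduler pipelines host-to-device copies, sampling, and kernel execution through the task queue):

#include <cstdio>

// One large-graph round: visit every unordered pair of embedding parts
// (i, j) and run updates for samples whose endpoints fall in those parts.
void run_round(int num_parts) {
    for (int i = 0; i < num_parts; ++i) {
        for (int j = i; j < num_parts; ++j) {
            // In GOSH, parts i and j would be made resident on the GPU here
            // and the corresponding sample pool consumed by the kernels.
            printf("processing part pair (%d, %d)\n", i, j);
        }
    }
}

int main() { run_round(3); return 0; }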

Authors

Amro Alabsi Aljundi, Taha Atahan Akyildiz, and Kamer Kaya.

Acknowledgments

Citation

If you find our code useful for your research, please cite us:

@ARTICLE{9623416,
  author = {Alabsi Aljundi, Amro and Akyildiz, Taha Atahan and Kaya, Kamer},
  journal = {IEEE Transactions on Parallel and Distributed Systems}, 
  title = {Boosting Graph Embedding on a Single GPU}, 
  year = {2021},
  volume = {},
  number = {},
  pages = {1-1},
  doi = {10.1109/TPDS.2021.3129617}
}
@INPROCEEDINGS{9377898,
  author = {Akyildiz, Taha Atahan and Alabsi Aljundi, Amro and Kaya, Kamer},
  booktitle = {2020 IEEE International Conference on Big Data (Big Data)}, 
  title = {Understanding Coarsening for Embedding Large-Scale Graphs}, 
  year = {2020},
  volume = {},
  number = {},
  pages = {2937-2946},
  doi = {10.1109/BigData50022.2020.9377898}
}

@INPROCEEDINGS{10.1145/3404397.3404456,
  author = {Akyildiz, Taha Atahan and Aljundi, Amro Alabsi and Kaya, Kamer},
  title = {GOSH: Embedding Big Graphs on Small Hardware},
  year = {2020},
  isbn = {9781450388160},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3404397.3404456},
  doi = {10.1145/3404397.3404456},
  booktitle = {49th International Conference on Parallel Processing - ICPP},
  articleno = {4},
  numpages = {11},
  keywords = {GPU, parallel graph algorithms, link prediction, Graph embedding, graph coarsening},
  location = {Edmonton, AB, Canada},
  series = {ICPP '20},
  note = {Nominated for the best-paper award.}
}