ChrisCummins / Programl

Licence: other

Graph-based Program Representation & Models for Deep Learning

ProGraML: Program Graphs for Machine Learning

OS: GNU/Linux, macOS ≥ 10.15
Python Versions: 3.6, 3.7, 3.8

Overview
Getting Started
Installation
Constructing the ProGraML Representation
Usage
- End-to-end C++ flow
- Dataflow experiments
Contributing
Acknowledgements

Overview

ProGraML is a representation for programs as input to a machine learning model.

Key features are:

Expressiveness: We represent programs as graphs, capturing all of the control, data, and call relations. Each node in the graph represents an instruction, variable, or constant, and edges are positional such that non-commutative operations can be differentiated.
Portability: ProGraML is derived from compiler IRs, making it independent of the source language (e.g. we have trained models to reason across five different source languages at a time). It is easy to target new IRs (we currently support LLVM and XLA).
Extensibility: Features and labels can easily be added at the whole-program level, per-instruction level, or for individual relations.
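To make the role of positional edges concrete, here is a minimal sketch of a graph with typed nodes and positioned edges. The class and field names (ToyGraph, Edge, position, and so on) are hypothetical and chosen for illustration; they are not the ProGraML schema. The sketch builds the same three nodes twice and swaps only the edge positions, which is enough to tell a - b from b - a.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class NodeType(Enum):
    INSTRUCTION = "instruction"
    VARIABLE = "variable"
    CONSTANT = "constant"


class Flow(Enum):
    CONTROL = "control"
    DATA = "data"
    CALL = "call"


@dataclass(frozen=True)
class Edge:
    source: int    # index of the source node
    target: int    # index of the target node
    flow: Flow     # which relation this edge describes
    position: int  # operand (or branch) position


@dataclass
class ToyGraph:
    """Hypothetical stand-in for a program graph, for illustration only."""
    nodes: List[Tuple[NodeType, str]] = field(default_factory=list)
    edges: List[Edge] = field(default_factory=list)

    def add_node(self, node_type: NodeType, text: str) -> int:
        self.nodes.append((node_type, text))
        return len(self.nodes) - 1

    def add_edge(self, source: int, target: int, flow: Flow, position: int = 0) -> None:
        self.edges.append(Edge(source, target, flow, position))

    def operands(self, instruction: int) -> List[str]:
        """Return the operand texts of an instruction, ordered by edge position."""
        incoming = [e for e in self.edges
                    if e.target == instruction and e.flow is Flow.DATA]
        return [self.nodes[e.source][1]
                for e in sorted(incoming, key=lambda e: e.position)]


def build_sub(a_position: int, b_position: int):
    """Build a 'sub' instruction whose operands a and b sit at the given positions."""
    g = ToyGraph()
    sub = g.add_node(NodeType.INSTRUCTION, "sub")
    a = g.add_node(NodeType.VARIABLE, "a")
    b = g.add_node(NodeType.VARIABLE, "b")
    g.add_edge(a, sub, Flow.DATA, a_position)
    g.add_edge(b, sub, Flow.DATA, b_position)
    return g, sub


a_minus_b, sub1 = build_sub(0, 1)
b_minus_a, sub2 = build_sub(1, 0)
print(a_minus_b.operands(sub1))
print(b_minus_a.operands(sub2))
```

Without the position field the two graphs would have identical nodes and edges, so a model could not distinguish the two subtractions.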

Getting Started

To get stuck in and play around with our graph representation, visit:

Or if papers are more your ☕, have a read of ours:

Installation

Command-line tools

Download the latest macOS or Linux release archive from the releases page.
Unpack the release archive to ~/.local/opt/programl (or a directory of your choice) using:

mkdir -p ~/.local/opt/programl
tar xjvf ~/Downloads/programl-*.tar.bz2 -C ~/.local/opt/programl

Add the installed files to your paths. You may want to add this to your ~/.bashrc:

export PATH=$HOME/.local/opt/programl/bin:$PATH
export LD_LIBRARY_PATH=$HOME/.local/opt/programl/lib:$LD_LIBRARY_PATH
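A quick way to confirm that the PATH change took effect is to look the tools up from Python. This is a small sketch using only the standard library; the tool names are the ones mentioned elsewhere in this README, and the helper names find_tools and missing_tools are made up for this example.

```python
import shutil

# Command-line tools named in this README.
TOOLS = ("clang2graph", "llvm2graph", "graph2dot")


def find_tools(names=TOOLS):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names=TOOLS):
    """Return the tool names that could not be found on PATH."""
    return [name for name, path in find_tools(names).items() if path is None]


for name, path in find_tools().items():
    print(f"{name}: {path or 'NOT FOUND'}")
```

If a tool is reported as NOT FOUND, open a new shell (or source your ~/.bashrc) and check that the export lines point at the directory you unpacked the archive into.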

Building from source

Requirements:

macOS ≥ 10.15 or GNU / Linux (we recommend Ubuntu Linux ≥ 18.04).
bazel ≥ 2.0 (we recommend using bazelisk to automatically download and use the correct bazel version).
Python ≥ 3.6

Install the python dependencies using:

$ python -m pip install -r requirements.txt

Once you have the above requirements installed, test that everything is working by building and running the full test suite:

$ bazel test //...

Build and install the command line tools to ~/.local (or a directory of your choice) using:

$ bazel run -c opt //:install -- ~/.local

Then to use them, append the following to your ~/.bashrc:

export PATH=~/.local/opt/programl/bin:$PATH
export LD_LIBRARY_PATH=~/.local/opt/programl/lib:$LD_LIBRARY_PATH

Datasets

Please see this doc for download links for our publicly available datasets of LLVM-IRs, ProGraML graphs, and data flow analysis labels.

Using this project as a dependency

If you are using bazel you can add ProGraML as an external dependency. Add to your WORKSPACE file:

load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name="programl",
    strip_prefix="ProGraML-<stable-commit>",
    urls=["https://github.com/ChrisCummins/ProGraML/archive/<stable-commit>.tar.gz"],
)

# ----------------- Begin ProGraML dependencies -----------------
<WORKSPACE dependencies>
# ----------------- End ProGraML dependencies -----------------

Where <WORKSPACE dependencies> is the block of delimited code in @programl//:WORKSPACE (this is an unfortunately clumsy workaround for recursive workspaces).

Then in your BUILD file:

cc_library(
    name = "mylib",
    srcs = ["mylib.cc"],
    deps = [
        "@programl//programl/ir/llvm",
    ],
)

py_binary(
    name = "myscript",
    srcs = ["myscript.py"],
    deps = [
        "@programl//programl/ir/llvm/py:llvm",
    ],
)

Constructing the ProGraML Representation

The ProGraML representation is constructed in multiple stages. Here we describe the process for a simple recursive Fibonacci implementation in C. For instructions on how to run this process, see Usage below.

Step 1: Compiler IR

We start by lowering the program to a compiler IR. In this case, we'll use LLVM-IR. This can be done using: clang -emit-llvm -S -O3 fib.c.

Step 2: Control-flow

We begin building a graph by constructing a full-flow graph of the program. In a full-flow graph, every instruction is a node and the edges are control-flow. Note that edges are positional so that we can differentiate the branching control flow in that switch instruction.

Step 3: Data-flow

Then we add a graph node for every variable and constant. In the drawing above, the diamonds are constants and the variables are ovals. We add data-flow edges to describe the relations between constants and the instructions that use them, and between variables and the instructions which define/use them. Like control edges, data edges have positions. In the case of data edges, the position encodes the order of a data element in the list of instruction operands.

Step 4: Call graph

Finally, we add call edges (green) from callsites to the function entry instruction, and return edges from function exits to the callsite. Since this is a graph of a recursive function, the callsites refer back to the entry of the function (the switch). The external node is used to represent a call from an external site.

The process described above can be run locally using our clang2graph and graph2dot tools: clang2graph -O3 fib.c | graph2dot
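The three graph-building steps can also be mimicked by hand on a toy example. The sketch below is not the output of clang2graph and does not reproduce the real Fibonacci graph; the instruction, variable, and constant names are invented, and edges are plain (source, target, flow, position) tuples. It only shows how the control, data, and call relations are layered onto one node list.

```python
CONTROL, DATA, CALL = "control", "data", "call"


def build_toy_graph():
    """Assemble a hand-written graph for a hypothetical recursive function."""
    nodes = []  # (kind, text)
    edges = []  # (source, target, flow, position)

    def node(kind, text):
        nodes.append((kind, text))
        return len(nodes) - 1

    # The external node stands for a call from outside the program.
    external = node("instruction", "[external]")

    # Step 2: one node per instruction, linked by positional control edges.
    switch = node("instruction", "switch")
    call = node("instruction", "call")
    ret = node("instruction", "ret")
    edges.append((switch, ret, CONTROL, 0))   # first branch target
    edges.append((switch, call, CONTROL, 1))  # second branch target
    edges.append((call, ret, CONTROL, 0))

    # Step 3: variable and constant nodes, linked by positional data edges.
    n = node("variable", "n")
    one = node("constant", "1")
    result = node("variable", "result")
    edges.append((n, switch, DATA, 0))     # operand 0 of the switch
    edges.append((one, switch, DATA, 1))   # operand 1 of the switch
    edges.append((n, call, DATA, 0))       # argument of the call
    edges.append((call, result, DATA, 0))  # the call defines 'result'
    edges.append((result, ret, DATA, 0))   # the return uses 'result'

    # Step 4: call edges to the function entry, return edges to the callsite.
    edges.append((external, switch, CALL, 0))
    edges.append((ret, external, CALL, 0))
    edges.append((call, switch, CALL, 0))  # recursive callsite back to the entry
    edges.append((ret, call, CALL, 0))
    return nodes, edges


nodes, edges = build_toy_graph()
for flow in (CONTROL, DATA, CALL):
    print(flow, sum(1 for e in edges if e[2] == flow))
```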

Usage

End-to-end C++ flow

In the manner of Unix Zen, creating and manipulating ProGraML graphs is done using command-line tools which act as filters, reading in graphs from stdin and emitting graphs to stdout. The structure for graphs is described through a series of protocol buffers.
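Because the tools are plain filters, a script of your own can sit in the same pipeline. The sketch below is a hypothetical summarizer, not part of ProGraML: it assumes the graph arrives in protocol buffer text format and that nodes and edges appear as top-level blocks named node and edge. Treat both as assumptions and check them against a real graph file.

```python
import re
import sys


def summarize(pbtxt: str) -> dict:
    """Count top-level 'node {' and 'edge {' blocks in a text-format graph.

    Assumes (unverified) that the text format places each node and edge in an
    unindented block with these field names.
    """
    return {
        "nodes": len(re.findall(r"^node\s*\{", pbtxt, flags=re.MULTILINE)),
        "edges": len(re.findall(r"^edge\s*\{", pbtxt, flags=re.MULTILINE)),
    }


def main(stream_in=sys.stdin, stream_out=sys.stdout) -> None:
    """Read a graph from stream_in and write a one-line summary to stream_out."""
    counts = summarize(stream_in.read())
    stream_out.write(f"nodes: {counts['nodes']}, edges: {counts['edges']}\n")
```

Saved as a hypothetical summarize.py with a main() call at the bottom, it could be placed after any tool that emits a text-format graph on stdout.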

This section provides an example step-by-step guide for generating a program graph for a C++ application.

Install LLVM-10 and the ProGraML command line tools.
Compile your C++ code to LLVM-IR. The way to do this is to modify your build system so that clang is passed the -emit-llvm -S flags. For a single-source application, the command line invocation would be:

$ clang-10 -emit-llvm -S -c my_app.cpp -o my_app.ll

For a multi-source application, you can compile each file to LLVM-IR separately and then link the results. For example:

$ clang-10 -emit-llvm -S -c foo.cpp -o foo.ll
$ clang-10 -emit-llvm -S -c bar.cpp -o bar.ll
$ llvm-link foo.ll bar.ll -S -o my_app.ll

Generate a ProGraML graph protocol buffer from the LLVM-IR using the llvm2graph command:

$ llvm2graph < my_app.ll > my_app.pbtxt

The generated file my_app.pbtxt uses a human-readable ProgramGraph format which you can inspect using a text editor. In this case, we will render it to an image file using Graphviz.

Generate a Graphviz dotfile from the ProGraML graph using graph2dot:

$ graph2dot < my_app.pbtxt > my_app.dot

Render the dotfile to a PNG image using Graphviz:

$ dot -Tpng my_app.dot -o my_app.png
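The steps above can be scripted. The following is a minimal sketch that wraps the same commands with Python's subprocess module; the function names and the my_app default are illustrative, and it assumes clang-10, llvm-link, llvm2graph, graph2dot, and dot are all on your PATH.

```python
import subprocess
from pathlib import Path


def compile_commands(sources, output="my_app.ll", clang="clang-10"):
    """Return the commands that lower C++ sources to a single LLVM-IR file."""
    sources = [Path(s) for s in sources]
    if len(sources) == 1:
        return [[clang, "-emit-llvm", "-S", "-c", str(sources[0]), "-o", output]]
    commands, ir_files = [], []
    for src in sources:
        # foo.cpp -> foo.ll; pick an output name that differs from these.
        ir = str(src.with_suffix(".ll"))
        commands.append([clang, "-emit-llvm", "-S", "-c", str(src), "-o", ir])
        ir_files.append(ir)
    commands.append(["llvm-link", *ir_files, "-S", "-o", output])
    return commands


def run_filter(tool, src, dst):
    """Run a stdin-to-stdout tool, reading from src and writing to dst."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        subprocess.run([tool], stdin=fin, stdout=fout, check=True)


def build_graph_image(sources, stem="my_app"):
    """Compile, build the graph, convert it to dot, and render a PNG."""
    for cmd in compile_commands(sources, f"{stem}.ll"):
        subprocess.run(cmd, check=True)
    run_filter("llvm2graph", f"{stem}.ll", f"{stem}.pbtxt")
    run_filter("graph2dot", f"{stem}.pbtxt", f"{stem}.dot")
    subprocess.run(["dot", "-Tpng", f"{stem}.dot", "-o", f"{stem}.png"], check=True)


# Show the commands for a two-file application without running them.
for cmd in compile_commands(["foo.cpp", "bar.cpp"]):
    print(" ".join(cmd))
```

Calling build_graph_image(["foo.cpp", "bar.cpp"]) would run the whole flow and leave my_app.png next to the intermediate files; check=True makes the script stop at the first failing step.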

Dataflow experiments

Follow the instructions for building from source
Download and unpack our dataflow dataset
Train and evaluate a graph neural network model using:

bazel run -c opt //tasks/dataflow:train_ggnn -- \
    --analysis reachability \
    --path=$HOME/programl

where --analysis is the name of the analysis you want to evaluate, and --path is the root of the unpacked dataset. There are many options that control the behavior of the experiment; see --helpfull for a full list. Some useful ones include:

--batch_size controls the number of nodes in each batch of graphs.
--layer_timesteps defines the layers of the GGNN model, and the number of timesteps used for each.
--learning_rate sets the initial learning rate of the optimizer.
--lr_decay_rate sets the rate at which the learning rate decays.
--lr_decay_steps sets the number of gradient steps until the learning rate is decayed.
--train_graph_counts lists the number of graphs to train on between runs of the validation set.
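When sweeping over these options it can help to assemble the command line programmatically. This is a small sketch, not part of the project: the helper name train_ggnn_command is made up, the option values are arbitrary placeholders, and it assumes every option accepts the --name=value form used for --path above.

```python
import shlex


def train_ggnn_command(analysis, path, **options):
    """Build the bazel invocation for the dataflow training job as an argv list."""
    cmd = [
        "bazel", "run", "-c", "opt", "//tasks/dataflow:train_ggnn", "--",
        f"--analysis={analysis}",
        f"--path={path}",
    ]
    # Extra options are appended in sorted order so the command is reproducible.
    for name, value in sorted(options.items()):
        cmd.append(f"--{name}={value}")
    return cmd


# Placeholder values, for illustration only.
cmd = train_ggnn_command(
    "reachability", "/path/to/dataset", batch_size=10000, learning_rate=0.001
)
print(shlex.join(cmd))
```

The returned list can be passed directly to subprocess.run, and shlex.join gives a copy-pasteable shell form for logging.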

🏗️ Under construction: We are in the process of refactoring the dataflow experiments with a revamped API. There are currently bugs in the data loader which may affect training jobs; see #147.

Contributing

Patches, bug reports, and feature requests are welcome! Please use the issue tracker to file a bug report or question. If you would like to help out with the code, please read this document.

Acknowledgements

Made with ❤️️ by Chris Cummins and Zach Fisches, with help from folks at the University of Edinburgh and ETH Zurich: Tal Ben-Nun, Torsten Hoefler, Hugh Leather, and Michael O'Boyle.

Funding sources: HiPEAC Travel Grant.
