IITH-Compilers / IR2Vec

Licence: other
Implementation of IR2Vec, published in ACM TACO

IR2Vec

IR2Vec is an LLVM IR based framework that generates distributed representations of source code in an unsupervised manner. These representations can be used as input to machine learning tasks that operate on programs.

This repo contains the source code and relevant information described in the paper (arXiv). Please see here for more details.

IR2Vec: LLVM IR Based Scalable Program Embeddings, S. VenkataKeerthy, Rohit Aggarwal, Shalini Jain, Maunendra Sankar Desarkar, Ramakrishna Upadrasta, and Y. N. Srikant

Requirements

(Experiments are done on an Ubuntu 18.04 machine)

Binaries and Libraries - Artifacts

Binaries and libraries (.a and .so) are generated automatically for every relevant check-in using GitHub Actions. These artifacts are attached to the successful runs of the Publish workflow and can be found here.

Building from source

  1. mkdir build && cd build
  2. IR2Vec uses the Eigen library. If your system already has Eigen (3.3.7) set up, you can skip this step.
    1. Download and extract the released version.
      • wget https://gitlab.com/libeigen/eigen/-/archive/3.3.7/eigen-3.3.7.tar.gz
      • tar -xvzf eigen-3.3.7.tar.gz
    2. mkdir eigen-build && cd eigen-build
    3. cmake ../eigen-3.3.7 && make
    4. cd ../
  3. cmake -DLT_LLVM_INSTALL_DIR=<path_to_LLVM_build_dir> -DEigen3_DIR=<path_to_eigen_build_dir> [-DCMAKE_INSTALL_PREFIX=<install_dir>] ../src
  4. make [&& make install]

This process generates the ir2vec binary under the build/bin directory, and libIR2Vec.a and libIR2Vec.so under the build/lib directory.

To verify correctness, run make verify-all

Generating program representations

IR2Vec can be used either as a standalone tool through the binary, or integrated with third-party tools through the libraries. Usage instructions for both are given below.

Using Binary

ir2vec -<mode> -vocab <seedEmbedding-file-path> -o <output-file> -level <p|f> -class <class-number> <input-ll-file>

Command-Line options

  • mode - can be one of sym/fa
    • sym denotes Symbolic representation
    • fa denotes Flow-Aware representation
  • vocab - the path to the seed embeddings file
  • o - file to which the embeddings are appended (Note: if the file doesn’t exist, a new file is created; otherwise the embeddings are appended to it)
  • level - can be one of the characters p/f.
    • p denotes program level encoding
    • f denotes function level encoding
  • class - the only optional argument. Specifies the class label for classification tasks (to be used with level p). Defaults to -1. When not equal to -1, the pass prints class-number followed by the corresponding embeddings

Please use --help for further details.

Format of the output embeddings in output_file

  • If the level is p:
<class-number> <Embeddings>

class-number is printed only if it is not -1

  • If the level is f:
<function-name> = <Embeddings>
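
The two line formats above can be read back with a few lines of standard C++. The following is a minimal sketch, not part of IR2Vec: the helper names readValues, parseFunctionLine and parseProgramLine are hypothetical. It assumes that embedding values are separated by whitespace, that the last '=' on a function-level line is the separator (a function name could itself contain '='), and that the caller knows the embedding dimension, which is needed to tell whether a program-level line starts with a class number.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helpers for reading IR2Vec output files; they are not part of
// the IR2Vec library.

// Reads whitespace-separated floating-point values from `text`.
static std::vector<double> readValues(const std::string &text) {
  std::istringstream in(text);
  std::vector<double> values;
  double v;
  while (in >> v)
    values.push_back(v);
  return values;
}

// Parses a function-level line: "<function-name> = <Embeddings>".
// Assumption: the last '=' on the line is the separator, because the numeric
// embedding cannot contain '=' but a function name might.
bool parseFunctionLine(const std::string &line, std::string &name,
                       std::vector<double> &embedding) {
  const std::size_t eq = line.rfind('=');
  if (eq == std::string::npos)
    return false;

  // Trim surrounding whitespace from the function name.
  const std::string rawName = line.substr(0, eq);
  const char *ws = " \t\r\n";
  const std::size_t first = rawName.find_first_not_of(ws);
  if (first == std::string::npos)
    return false;
  const std::size_t last = rawName.find_last_not_of(ws);
  name = rawName.substr(first, last - first + 1);

  embedding = readValues(line.substr(eq + 1));
  return !embedding.empty();
}

// Parses a program-level line: "<class-number> <Embeddings>", where the class
// number is absent when -class was left at its default of -1.
// Assumption: the caller supplies the embedding dimension `dim`; a line with
// dim + 1 values is taken to start with a class number. classLabel is set to
// -1 when no class number is present.
bool parseProgramLine(const std::string &line, std::size_t dim,
                      int &classLabel, std::vector<double> &embedding) {
  std::vector<double> values = readValues(line);
  if (values.size() == dim + 1) {
    classLabel = static_cast<int>(values.front());
    values.erase(values.begin());
  } else if (values.size() == dim) {
    classLabel = -1;
  } else {
    return false; // Unexpected number of values for this dimension.
  }
  embedding = values;
  return true;
}
```

Both functions return false for lines that do not match the expected shape, so a caller can skip malformed or empty lines when reading an output file with std::getline.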

Flow-Aware Embeddings

  • ir2vec -fa -vocab vocabulary/seedEmbeddingVocab-300-llvm12.txt -o <output_file> -level <p|f> -class <class-number> <input_ll_file>

Symbolic Embeddings

  • ir2vec -sym -vocab vocabulary/seedEmbeddingVocab-300-llvm12.txt -o <output_file> -level <p|f> -class <class-number> <input_ll_file>

Using Libraries

The libraries can be installed by passing the installation location to the CMAKE_INSTALL_PREFIX flag during cmake, followed by make install. The interfaces are declared in IR2Vec.h. External projects can use IR2Vec through these interfaces by including IR2Vec.h from the installed location and linking against the library statically or dynamically.

  • If the project does not use LLVM, LLVM dependencies have to be linked and included separately.
  • Please ensure that the IR2Vec libraries are compiled with a compatible LLVM version.
    • If you are getting errors, please recompile IR2Vec by passing the current LLVM install directory path to LT_LLVM_INSTALL_DIR during cmake.

The following template can be used to link the IR2Vec libraries in a CMake-based project.

set(IR2VEC_INSTALL_DIR "" CACHE PATH "IR2Vec installation directory")
include_directories("${IR2VEC_INSTALL_DIR}/include")
target_link_libraries(<your_executable_or_library> PUBLIC ${IR2VEC_INSTALL_DIR}/lib/<libIR2Vec.a or libIR2Vec.so>)

Then pass the location of IR2Vec's install prefix via -DIR2VEC_INSTALL_DIR=<install_dir> during cmake.

The following example snippet shows how to query the exposed vector representations.

#include "IR2Vec.h"
#include "llvm/Support/raw_ostream.h"

using llvm::outs;

// Creating object to generate FlowAware representation
auto ir2vec =
    IR2Vec::Embeddings(<LLVM Module>, IR2Vec::IR2VecMode::FlowAware,
                       "./vocabulary/seedEmbeddingVocab-300-llvm12.txt");

// Getting Instruction vectors corresponding to the instructions in <LLVM Module>
auto instVecMap = ir2vec.getInstVecMap();
// Access the generated vectors (key: instruction, value: its vector)
for (const auto &instVec : instVecMap) {
  outs() << "Instruction : ";
  instVec.first->print(outs());
  outs() << ": ";

  for (auto val : instVec.second)
    outs() << val << "\t";
}

// Getting vectors corresponding to the functions in <LLVM Module>
auto funcVecMap = ir2vec.getFunctionVecMap();
// Access the generated vectors (key: function, value: its vector)
for (const auto &funcVec : funcVecMap) {
  outs() << "Function : " << funcVec.first->getName() << "\n";
  for (auto val : funcVec.second)
    outs() << val << "\t";
}

// Getting the program vector
auto pgmVec = ir2vec.getProgramVector();
// Access the generated vector
for (auto val : pgmVec)
  outs() << val << "\t";

Experiments

Note

The results reported in the experiment scripts and in the published version have not been updated for this branch, so results obtained with this branch will differ from the published ones. For comparison, use the release corresponding to v0.1.0.

Citation

@article{VenkataKeerthy-2020-IR2Vec,
author = {VenkataKeerthy, S. and Aggarwal, Rohit and Jain, Shalini and Desarkar, Maunendra Sankar and Upadrasta, Ramakrishna and Srikant, Y. N.},
title = {{IR2Vec: LLVM IR Based Scalable Program Embeddings}},
year = {2020},
issue_date = {December 2020},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {17},
number = {4},
issn = {1544-3566},
url = {https://doi.org/10.1145/3418463},
doi = {10.1145/3418463},
journal = {ACM Trans. Archit. Code Optim.},
month = dec,
articleno = {32},
numpages = {27},
keywords = {heterogeneous systems, representation learning, compiler optimizations, LLVM, intermediate representations}
}

Contributions

Please feel free to raise issues to file a bug, to pose a question, or to initiate any related discussions. Pull requests are welcome :)

License

IR2Vec is released under a BSD 4-Clause License. See the LICENSE file for more details.
