validating-binary-decompilation

The project is about validating the correctness of "binary lifters", which translate binary code into an IR format. Specifically, this work focuses on checking the correctness of McSema translating x86-64 binary code into LLVM IR. Given a formal semantics and symbolic execution engine for x86-64 code and LLVM IR, this work uses McSema to translate individual x86-64 instructions into LLVM code and then compares the symbolic summaries obtained using those symbolic execution engines: if these summaries are not logically equivalent, a bug is detected in McSema. After applying this check to x86-64 instructions, this work detected 29 new bugs in McSema, which are all reported and acknowledged (lifting-bits/remill#376).

In addition to the above Single-Instruction Validation (SIV), the work also describes a technique to validate translations of sequences of instructions, called the Program-Level Validation (PLV). First, they use a compositional lifter based on the validated translations to generate LLVM IR, and compare that the McSema lifted IR. The two versions are normalized using multiple standard LLVM passes and then compared by translating each into a graph and checking whether the graphs match.

Documentation
Evaluation w/o Installation
Directory Structure & Tool Information
Checkout Repository
Installation
- Relevant for Single Instruction Validation
- Relevant for Program Level Validation
Evaluation/Testing
- Running SIV Pipeline
- Running PLV Pipeline
Experimental Matcher
- Build Instructions
- Run Instructions

Documentation

A Scalable Validation of Binary Lifters, PLDI 2020 by Sandeep Dasgupta, Sushant Dinesh, Deepan Venkatesh, Vikram S. Adve & Christopher W. Fletcher., accepted at Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ‘20), June 15–20, 2020, London, UK, PDF, BIB

⮭

Evaluation without Installation

Peer Evaluated Artifacts

⮭

Directory Structure

source: Source Code :- The above directory hosts the tools & libraries relevant for validating the decompilation performed by McSema. The important tools are:
- Compositional-Decompiler This is the compositional decompiler used for program level validation. For a given x86-64 program, the tool disassembles, using Objdump, and decompiles, using McSema, each x86-64 instruction and composes the corresponding lifted LLVM IR to create LLVM IR program.
  - Associated Library Path: compositional-decompiler
- Matcher The matcher is responsible for doing "graph isomorphic matching" on the LLVM data-dependence graphs derived from (1) McSema decompiled LLVM IR, and (2) the LLVM IR generated by the above mentioned compositional decompiler.
  - Associated Library Path: llvm-graph-matching
- spec-to-smt An important component of the Single-Instruction-Validation (SIV), which converts the symbolic summary (specified in K-AST) to SMTLIB queries.
  - Associated Library Path:smt-generator
tests/single_instruction_translation_validation: Single Instruction Validation
tests/program_translation_validation: Program Level Validation

⮭

Checkout Repository

expand

# Set REPO_PATH to the parent directory of your 'validating-binary-decompilation' repo;
# Could be different from the one set below
export REPO_PATH=${HOME}/Github/

cd $REPO_PATH/
git clone --recursive https://github.com/sdasgup3/validating-binary-decompilation.git
git clone https://github.com/sdasgup3/compd_cache.git # required only if you wish to
                                                      # evaluate the project.

⮭

Installation

Relevant for Single Instruction Validation

The tools and environments relevant to achieve SIV include (1) the symbolic execution engine to analyze binary program, (2) the symbolic execution engine to analyze LLVM IR program, (3) spec-to-smt to convert symbolic K-AST summary to SMTLIB verification condition, and (4) Enabling python binding for z3 as the verification conditions are encoded in the python z3 binding.

In order to install (1), (2), and (4), the followings need to be done.

Build K

expand

  cd $REPO_PATH
  git clone --recursive [email protected]:sdasgup3/k.git
  cd k
  git checkout bin-decomp-val

  git submodule update --init --recursive
  sudo apt-get install build-essential m4 openjdk-8-jdk libgmp-dev libmpfr-dev pkg-config flex z3 libz3-dev maven opam python3 cmake zlib1g-dev libboost-test-dev libyaml-dev libjemalloc-dev
  curl -sSL https://get.haskellstack.org/ | sh

  sudo apt-get install openjdk-8-jdk
  sudo apt-get install maven
  mvn package -DskipTests   -DskipKTest -Dllvm.backend.skip -DskipDocs -Dhaskell.backend.skip

Build X86 Semantics

expand

  cd $REPO_PATH
  git clone [email protected]:kframework/X86-64-semantics.git
  cd X86-semantics/semantics
  ../scripts/kompile.pl --backend java
  # To validate the working of the executable semantics
  ../scripts/run-single-c-file.sh ../tests/program-tests/bubblesort/test.c java |& tee /tmp/run.log

Build LLVM Semantics

expand

  cd $REPO_PATH
  git clone [email protected]:sdasgup3/llvm-verified-backend.git
  git clone [email protected]:sdasgup3/llvm-verified-backend.git llvm-verified-backend-mem

  cd llvm-verified-backend
  # git checkout working // Takes insane time for krove run
  git checkout single-instruction-val

  ./scripts/kompile-all.sh --llvm
  ./scripts/build-parser.sh

  cd llvm-verified-backend-mem
  git checkout single-instruction-val-mem

  ./scripts/kompile-all.sh --llvm
  ./scripts/build-parser.sh

Python Binding for z3

expand

  sudo apt install python-pip
  sudo pip install setuptools
  git clone [email protected]:Z3Prover/z3.git
  cd z3
  python scripts/mk_make.py --python
  cd build
  make -j 8
  sudo make install

⮭

Relevant for Program Level Validation

The installation goal is to build tools like Compositional-Decompiler, Matcher, and spec-to-smt. Except The Matcher, which is purely a LLVM based tool, the other tools are developed using the libraries of Stoke and LLVM, to be installed as pre-requisites.

Installing Pre-requisites

Install Essentials

expand

sudo apt-get  install \
  cmake \
  z3 \
  parallel

# Set REPO_PATH to the parent directory of your 'validating-binary-decompilation' repo;
# Could be different from the one set below
export REPO_PATH=${HOME}/Github/

cd $REPO_PATH
git clone [email protected]:sdasgup3/scripts-n-docs.git

Install GCC/Clang

expand

[Install gcc (> ver. 6)](https://gist.github.com/zuyu/7d5682a5c75282c596449758d21db5ed)
sudo update-alternatives --config gcc

sudo apt-get install clang-6.0

Install LLVM-4.0.0

expand

  mkdir -p ~/Install/llvm
  cd ~/Install/llvm
  $REPO_PATH/scripts-n-docs/scripts/bash/download-llvm.sh 4.0.0  ./ fast
  rm -rf *.xz
  mkdir llvm.4.0.0.install llvm.4.0.0.obj
  cd llvm.4.0.0.obj
  INSTALL_PREFIX=~/Install/llvm/llvm.4.0.0.install
  # cmake -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++  -DCMAKE_INSTALL_PREFIX=${INSTALL_PREFIX} -DLLVM_ENABLE_ASSERTIONS=ON -DCMAKE_BUILD_TYPE="RelWithDebInfo" -DLLVM_TARGETS_TO_BUILD="host" ../llvm-4.0.0.src/
  cmake -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++  -DCMAKE_INSTALL_PREFIX=${INSTALL_PREFIX} -DLLVM_ENABLE_ASSERTIONS=ON -DCMAKE_BUILD_TYPE="Release" -DLLVM_TARGETS_TO_BUILD="host" ../llvm-4.0.0.src/
  make -j64
  sudo make install
  # Keep the llvm-config path set to the version of llvm we want to use using $PATH
  # Make sure to export PATH like: export PATH=<llvm/install/dir/>:$PATH

  For `signal stack problems`, apply the [patch](https://github.com/sdasgup3/validating-binary-decompilation/tree/master/docs/patches) using
  cd ${HOME}/Install/llvm/llvm-4.0.0.src
  patch -p6 < <patch file>

  Note: the patch is borrowed from
  https://github.com/google/sanitizers/issues/822
  https://github.com/llvm-mirror/compiler-rt/commit/8a5e425a68de4d2c80ff00a97bbcb3722a4716da

Install Stoke

expand

  sudo apt-get install bison ccache cmake doxygen exuberant-ctags flex  g++-multilib  ghc git libantlr3c-dev libboost-dev libboost-filesystem-dev libboost-thread-dev libcln-dev libghc-regex-compat-dev libghc-regex-tdfa-dev libghc-split-dev libjsoncpp-dev python subversion libiml-dev libgmp-dev libboost-regex-dev autoconf libtool antlr pccts pkg-config
  cd $REPO_PATH
  git clone --recursive   [email protected]:sdasgup3/stoke.git stoke-develop
  
  cd  stoke-develop/src/ext/z3
  git checkout master
  git checkout 4192c81fae01e90fd8110a00b14172be818f028b
  
  cd $REPO_PATH/stoke-develop/src/ext/x64asm
  git remote add upstream [email protected]:sdasgup3/x64asm.git
  git fetch upstream
  git checkout working
  
  cd $REPO_PATH/stoke-develop/
  ./configure.sh -d --no-cvc4
  cd $REPO_PATH/stoke-develop/src/ext/x64asm
  make -j64 debug
  
  cd $REPO_PATH/stoke-develop
  mkdir lib
  make -j64 debug

Install mcsema

expand

  sudo apt-get install      git      curl      cmake      python2.7 python-pip \ 
  python-virtualenv      wget      build-essential  gcc-multilib g++-multilib  \
      libtinfo-dev      lsb-release            zlib1g-dev
  sudo dpkg --add-architecture i386
  sudo apt-get install zip zlib1g-dev:i386
  
  git clone  [email protected]:sdasgup3/mcsema.git
  cd mcsema; git checkout  develop; cd -

  git clone https://github.com/sdasgup3/remill.git
  cd remill; git checkout  develop_ae;
  mv ../mcsema tools

  ./scripts/build.sh
  cd remill-build
  sudo make install -j64

Install IDA

Installation of licensed copy

expand

  mkdir -p $REPO_PATH
  cd !$
  git clone https://gitlab.engr.illinois.edu/llvm/IDA.git
  Install Ida_software/*.run
  echo "[email protected]" > ~/.flexlmrc
  sudo apt-get install libc6-i686:i386 libexpat1:i386 libffi6:i386 libfontconfig1:i386 libfreetype6:i386 libgcc1:i386 libglib2.0-0:i386 libice6:i386 libpcre3:i386  libsm6:i386 libstdc++6:i386 libuuid1:i386 libx11-6:i386 libxau6:i386 libxcb1:i386 libxdmcp6:i386 libxext6:i386 libxrender1:i386 zlib1g:i386 libx11-xcb1:i386 libdbus-1-3:i386 libxi6:i386 libsm6:i386 libcurl3:i386
  ~/ida-6.95/idal64
  ## If the ida run cannot find the google protobuf imports, add [patch](https://github.com/sdasgup3/validating-binary-decompilation/blob/master/docs/patches/mcsema_ida_import_protobuf_fix.patch) to mcsema.

Troubleshoot

TVision error: Can not load libcurses.so

ImportError: No module named google.protobuf

expand

  ln -s /usr/local/lib/python2.7/dist-packages/protobuf-3.2.0-py2.7.egg/google /usr/local/lib/python2.7/dist-packages/google

Tools Build Instructions

Checkout Repository

expand

  # Set REPO_PATH to the parent directory of your 'validating-binary-decompilation' repo;
  # Could be different from the one set below
  export REPO_PATH=${HOME}/Github/

  cd $REPO_PATH/
  git clone --recursive https://github.com/sdasgup3/validating-binary-decompilation.git

Install

expand

  mkdir -p ${REPO_PATH}/validating-binary-decompilation/source/build
  cd !$

  cmake .. -DLLVM_ROOT=~/Install/llvm/llvm.4.0.0.install/  -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++  -DCMAKE_BUILD_TYPE="Debug" -DLLVM_ENABLE_ASSERTIONS=ON
  make -j8

  make check-format
  make update-format

Recommended Build Practice (applicable only if there is an update in any stoke repo)

expand

## Update/Build x64asm
  cd $REPO_PATH/stoke-develop/src/ext/x64asm
  git pull upstream working
  make clean; make -j64 debug

  ## Update/Build stoke
  cd $REPO_PATH/stoke-develop
  git pull origin working
  make -j64 debug

  ## Update/Build Matcher/compD
  cd $REPO_PATH/validating-binary-decompilation/source/build
  make clean; make -j64

⮭

Testing

Running SIV Pipeline

An example run

expand

Running SIV on an isolated example instruction addq_r64_r64 involves the following step

# Set REPO_PATH to the parent directory of your 'validating-binary-decompilation' repo;
# Could be different from the one set below
export REPO_PATH=${HOME}/Github/

cd $REPO_PATH/validating-binary-decompilation/tests/single_instruction_translation_validation/mcsema/
echo register-variants/addq_r64_r64 > /tmp/sample.txt
# Or try one of the following
#     echo immediate-variants/addq_r64_imm32 > /tmp/sample.txt        # Try an immediate variant
#     echo memory-variants/addq_r64_m64 > /tmp/sample.txt             # Try a memory variant
#     sort -R docs/AE_docs/non-bugs.txt | head -n 1 > /tmp/sample.txt # Try any random instruction from a list
                                                                      # of passing cases
../../scripts/run_batch_siv.sh /tmp/sample.txt |& tee /tmp/log

# cat ../../scripts/run_batch_siv.sh
## LIST=$1
##
## ## Number of jobs to issue in parallel
## P=$2
## 
## if [ -z "$P" ]; then
##   P=1
## fi
## 
## if [ -z "$REPO_PATH" ]; then
##   REPO_PATH="~/Github/"
## fi
##
## TESTARENA="$REPO_PATH/validating-binary-decompilation/tests/single_instruction_translation_validation/mcsema/" 
## 
## echo
## echo "Cleaning Stale instr semantics definitions"
## cd $REPO_PATH/X86-64-semantics/semantics
## rm -rf underTestInstructions/*
## cd -
## 
## echo
## echo "Collecting instructions from binary"
## cat $LIST | parallel "cd $TESTARENA/{}; make collect; cd -"
## 
## echo
## echo "Compiling the collected X86 semantics to create a sym-ex"
## cd $REPO_PATH/X86-64-semantics/semantics
## ../scripts/kompile.pl --backend java
## cd -
## 
## echo
## echo "Batch Run Begin using $P jobs in parallel"
## 
## cat $LIST | parallel -j $P "echo ; echo {}; echo ======; cd ${TESTARENA}/{}; \
##       echo; echo \"Generating symbolic summary for binary instruction\" \
##       make genxspec; make xprove; \
##       echo; echo \"Generating symbolic summary for lifted LLVM IR\" \
##       make declutter; make kli; make genlspec; make lprove; \
##       echo; echo \"Generating verification conditions\" \
##       make genz3; \
##       echo; echo \"Prove verification conditions\" \
##       make provez3; \
##       cd -"
## 
## echo "Batch Run End"

Note

The Make target xprove will exit with non-zero status and it is a known issue. These errors are because of missing KAST (internal K AST) to SMT translations in the sym-ex engine backend. We do not need the translations as we have a separate tool spec-to-smt to achieve the translation from the summary that the target xprove produces. However, please note that all these error messages can be safely ignored as it does NOT affect the soundness of the summary generation process.
The entire execution will take up-to 4-5 mins.

Batch runs

The goal of all the make targets except make provez3 (as mentioned above) is to reproduce the file verification-query file Output/test-z3.py from scratch. Here is an example test-z3.py file encoding the verification conditions, which is then dispatched (using Make target provez3) to z3 for verification.

The python file has the following format:

expand

 s = z3.solver()
 For each E in {registers, flags, memory values}: s.push()
  s.push()
  lvar = symbolic summary corresponding to E, obtained by sym-execution of the LLVM IR (generated by McSema by lifting the binary instruction)
  xvar = symbolic summary corresponding to E, obtained by sym-execution the binary instruction.

  s.add(lvar != xvar)

  solve s for unsat/sat/timeout

  s.pop()

Upon dispatch of the verification queries, using make provez3, for each register/flag/memory values, the output Test-Pass is generated if all the queries results in unsat. Conversely, the output says Test-Fail if any of query results in sat or timeout.

In order to run this final prover step on a random sample of non-buggy instructions, do the following:

expand

cd $REPO_PATH/validating-binary-decompilation/tests/single_instruction_translation_validation/mcsema/
sort -R docs/AE_docs/non-bugs.txt | head -n 50 | parallel "echo ; echo {}; echo ===; cd {}; make provez3; cd -" |& tee /tmp/log

Misc Information

⮭

Running PLV Pipeline

An Example Run (w/o autotuning)

Here we will elaborate the process of running PLV on an isolated example function Queens/Doit/. We use shell variable NORM to specify which set of optimization passes to use for normalization. For example, the value CUSTOM of NORM enables the use of a set of 17 LLVM opt passes for normalization. Also, there is an option for NORM which enables AutoTuner for pass selection (to be described later).

Running PLV on it involves the following steps

expand

# Set REPO_PATH to the parent directory of your 'validating-binary-decompilation' repo;
# Could be different from the one set below
export REPO_PATH=${HOME}/Github/

export NORM=CUSTOM
cd $REPO_PATH/validating-binary-decompilation/tests/program_translation_validation/single-source-benchmark/Queens/
make binary
make reloc_binary
make mcsema   ## This step is responsible for running Mcsema to lift the binary "binary/test"
              ## and creating  "../binary/test.mcsema.inline.ll"
# One may skip the above steps if the binaries and McSema generated files are pre-populated

cd Doit/

## Running Compositional lifter (Compd) on the corresponding binary ../binary/test.reloc
## Creates mcsema/test.proposed.inline.ll
make compd # One may skip the step if the Compd generated files are pre-populated

## Running the Matcher on the candidate LLVM IR programs after normalization
make match # expect "Match Pass" upon execution

Side Note Behind the scene, the make target match includes other targets like compd_opt & mcsema_opt, responsible for normalization. These targets can be run independently as well. Following shows the details of those targets

## Normalizing mcsema/test.proposed.inline.ll
## Creates mcsema/test.proposed.opt.ll
make compd_opt

## Normalizing ../binary/test.mcsema.inline.ll (Already populated using make mcsema)
## Creates mcsema/test.mcsema.opt.ll
make mcsema_opt

An Example Run (w/ autotuning)

In order to prove that two functions F & F ′ are semantically equivalent, they need to be reduced to isomorphic graphs via normalization. For normalization, we initially used a custom sequence of 17 LLVM optimization passes, discovered manually by pruning the LLVM -O3 search space. Later experimentation on normalizer revealed that (1) changing the order of passes improves the number of functions reducing to isomorphic graphs (the pass ordering problem) and thus reducing false negatives, and (2) not all of the 17 passes are needed for every pair of functions under equivalence check. Such observations intrigues us to frame the problem of selecting the best normalizing pass sequence as an application of program autotuning. We used the OpenTuner framework for the purpose. The OpenTuner framework requires the user to specify a space to search for, which in our case includes the passes from opt -O3 sequence. The framework then uses various machine learning techniques to find the best configuration which can minimize an objective function within a given resource budget.

Next, we will show how the auto-tuner helps discover a pass sequence effective for normalization and subsequent matching.

expand

# Set REPO_PATH to the parent directory of your 'validating-binary-decompilation' repo;
# Could be different from the one set below
export REPO_PATH=${HOME}/Github/

cd $REPO_PATH/validating-binary-decompilation/tests/program_translation_validation/single-source-benchmark/Queens/
make binary
make reloc_binary
make mcsema   ## This step is responsible for running Mcsema to lift the binary "binary/test"
              ## and creating  "../binary/test.mcsema.inline.ll"
# One may skip the above steps if the binaries and McSema generated files are pre-populated

cd Queens/

## Running Compositional lifter (Compd) on the corresponding binary ../binary/test.reloc
## Creates mcsema/test.proposed.inline.ll
make compd # One may skip the step as the Compd generated files are populated

make tuner # Invoke the Opentuner and generate a file mcsema/normalizer_final_config.json
           # containing all the candidate pass sequences which meet the  objective function.
           # runtime: ~1 min.

unset NORM # Select the candidate pass sequences from  mcsema/normalizer_final_config.json file
make match # Invoke the matcher on each of the above candidate pass sequences till the matcher succeeds (declare Pass) or
           # all the candidate pass sequences are exhausted (declare Fail).

Batch run (Recommended)

Running a batch PLV run on toy-examples

w/o autotuner

expand

  cd $REPO_PATH/validating-binary-decompilation/tests/program_translation_validation/toy-examples

  ## Create Binaries from toy-example C functions
  cat docs/filelist.txt | parallel   " echo; echo {}; cd {}; make binary; cd .." |& tee /tmp/log

  ## Create Binaries with relocation information
  cat docs/filelist.txt | parallel   " echo; echo {}; cd {}; make reloc_binary; cd .." |& tee /tmp/log

  ## Create McSema  generated IR programs
  cat docs/filelist.txt | parallel   " echo; echo {}; cd {}; make mcsema; cd .." |& tee /tmp/log

  ## Create Compositional Lifter generated IR programs
  cat docs/functionList.log | parallel  " echo; echo {}; make -C {} compd" |& tee docs/compd.log

  ## Following run the matcher after normalization
  export NORM=CUSTOM
  cat docs/functionList.log | parallel  " echo; echo {}; make -C {} match" |& tee /tmp/match_1.log

  ## Triage the log file into passing and failing cases
  ../../scripts/triage.sh Pass  /tmp/match_1.log toy-examples &> /tmp/matchPass_1.log
  ../../scripts/triage.sh Fail  /tmp/match_1.log toy-examples &> /tmp/matchFail_1.log

  ## Note: the golden triaged log files can be found at docs/matchPass_1.log and docs/matchFail_1.log

w/ autotuner

expand

  cd $REPO_PATH/validating-binary-decompilation/tests/program_translation_validation/toy-examples

  ## Create Binaries from toy-example C functions
  cat docs/filelist.txt | parallel   " echo; echo {}; cd {}; make binary; cd .." |& tee /tmp/log

  ## Create Binaries with relocation information
  cat docs/filelist.txt | parallel   " echo; echo {}; cd {}; make reloc_binary; cd .." |& tee /tmp/log

  ## Create McSema  generated IR programs
  cat docs/filelist.txt | parallel   " echo; echo {}; cd {}; make mcsema; cd .." |& tee /tmp/log

  ## Create Compositional Lifter generated IR programs
  cat docs/functionList.log | parallel  " echo; echo {}; make -C {} compd" |& tee docs/compd.log

  ## Run autotuner to select an effective pass sequence
  cat docs/functionList.log | parallel  " echo; echo {}; make -C {} tuner" |& tee docs/tuner.log

  ## Following run the matcher after normalization
  unset NORM
  cat docs/functionList.log | parallel  " echo; echo {}; make -C {} match" |& tee /tmp/match_2.log

  ## Triage the log file into passing and failing cases
  ../../scripts/triage.sh Pass  /tmp/match_2.log toy-examples &> /tmp/matchPass_2.log
  ../../scripts/triage.sh Fail  /tmp/match_2.log toy-examples &> /tmp/matchFail_2.log

  ## Note: the golden triaged log files can be found at docs/matchPass_2.log and docs/matchFail_2.log

Running a batch PLV run on single-source-benchmark

w/o autotuner

expand

  cd $REPO_PATH/validating-binary-decompilation/tests/program_translation_validation/single-source-benchmark

  ## Create Binaries from toy-example C functions
  cat docs/filelist.txt | parallel   " echo; echo {}; cd {}; make binary; cd .." |& tee /tmp/log

  ## Create Binaries with relocation information
  cat docs/filelist.txt | parallel   " echo; echo {}; cd {}; make reloc_binary; cd .." |& tee /tmp/log

  ## Create McSema  generated IR programs
  cat docs/filelist.txt | parallel   " echo; echo {}; cd {}; make mcsema; cd .." |& tee /tmp/log

  ## Create Compositional Lifter generated IR programs
  cat docs/reported_stats/1_2_3_4.log | parallel  " echo; echo {}; make -C {} compd" |& tee docs/compd.log

  ## Following run the matcher after normalization
  export NORM=CUSTOM
  cat docs/reported_stats/1_2_3_4.log | parallel  " echo; echo {}; make -C {} match" |& tee /tmp/match_1.log

  ## Triage the log file into passing and failing cases
  ../../scripts/triage.sh Pass  /tmp/match_1.log  single-source-benchmark &> /tmp/matchPass_1.log
  ../../scripts/triage.sh Fail  /tmp/match_1.log  single-source-benchmark &> /tmp/matchFail_1.log

  ## Note: the golden triaged log files can be found at docs/matchPass_1.log and docs/matchFail_1.log

w/ autotuner

expand

  cd $REPO_PATH/validating-binary-decompilation/tests/program_translation_validation/single-source-benchmark

  ## Create Binaries from toy-example C functions
  cat docs/filelist.txt | parallel   " echo; echo {}; cd {}; make binary; cd .." |& tee /tmp/log

  ## Create Binaries with relocation information
  cat docs/filelist.txt | parallel   " echo; echo {}; cd {}; make reloc_binary; cd .." |& tee /tmp/log

  ## Create McSema  generated IR programs
  cat docs/filelist.txt | parallel   " echo; echo {}; cd {}; make mcsema; cd .." |& tee /tmp/log

  ## Create Compositional Lifter generated IR programs
  cat docs/functionList.log | parallel  " echo; echo {}; make -C {} compd" |& tee docs/compd.log

  ## Run autotuner to select an effective pass sequence
  cat docs/functionList.log | parallel  " echo; echo {}; make -C {} tuner" |& tee docs/tuner.log

  ## Following run the matcher after normalization
  unset NORM
  cat docs/reported_stats/1_2_3_4.log | parallel  " echo; echo {}; make -C {} match" |& tee /tmp/match_2.log

  ../../scripts/triage.sh Pass  /tmp/match_2.log  single-source-benchmark &> /tmp/matchPass_2.log
  ../../scripts/triage.sh Fail  /tmp/match_2.log  single-source-benchmark &> /tmp/matchFail_2.log

  ## Note: the golden triaged log files can be found at docs/matchPass_2.log and docs/matchFail_2.log

Misc Information

⮭

Experimental Matcher

Iterative Pruning based Matcher (IPM): Iteratively prunes the matched sub- graphs and look for more isomorphic matches after transforming (using LLVM optimizations) the residual graph. This enhancement of the matcher is based on the insight that the residual graphs will be much straight- forward, in terms of the aliasing relations among instructions, for the optimization passes to optimize effectively.

Build Instructions

Checkout Repository

expand

  # Set REPO_PATH to the parent directory of your 'validating-binary-decompilation' repo;
  # Could be different from the one set below
  export REPO_PATH=${HOME}/Github/
  cd $REPO_PATH/
  git clone --recursive https://github.com/sdasgup3/validating-binary-decompilation.git

Install

expand

  mkdir -p ${REPO_PATH}/validating-binary-decompilation/source/build
  cd !$

  cmake .. -DLLVM_ROOT=~/Install/llvm/llvm.4.0.0.install/  -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++  -DCMAKE_BUILD_TYPE="Debug" -DLLVM_ENABLE_ASSERTIONS=ON
  make

Run Instructions

expand

  # Set REPO_PATH to the parent directory of your 'validating-binary-decompilation' repo;
  # Could be different from the one set below
  export REPO_PATH=${HOME}/Github/
  cd $REPO_PATH/

  cd $REPO_PATH/validating-binary-decompilation/tests/simantic-similarity-matching/mini-era
  cat docs/funcList.txt | parallel  " echo; echo {}; make -C {} imatch" |& tee docs/imatch.log

[Developers Only] Running a batch PLV run on toy-examples using iterative based pruning

The main use of this run is to test IPM with cases where the "iterative" part of the this strategy is not applicable as all the toy-example cases should reduce to iso-morphic graphs after optimization. The rationale behind the test is to check if the IPM implementation is not breaking the "basic" matching functionality.

expand

  cd $REPO_PATH/validating-binary-decompilation/tests/program_translation_validation/toy-examples

  ## Create Binaries from toy-example C functions
  cat docs/filelist.txt | parallel   " echo; echo {}; cd {}; make binary; cd .." |& tee /tmp/log

  ## Create Binaries with relocation information
  cat docs/filelist.txt | parallel   " echo; echo {}; cd {}; make reloc_binary; cd .." |& tee /tmp/log

  ## Create McSema  generated IR programs
  cat docs/filelist.txt | parallel   " echo; echo {}; cd {}; make mcsema; cd .." |& tee /tmp/log

  ## Create Compositional Lifter generated IR programs
  cat docs/functionList.log | parallel  " echo; echo {}; make -C {} compd" |& tee docs/compd.log

  ## Following run the matcher after normalization
  unset NORM
  cat docs/functionList.log | parallel  " echo; echo {}; make -C {} imatch" |& tee /tmp/match_3.log

  ## Triage the log file into passing and failing cases
  ../../scripts/triage.sh Pass  /tmp/match_3.log toy-examples
  ../../scripts/triage.sh Fail  /tmp/match_3.log toy-examples

[Developers Only] Codegen for esp backend

The codegen process consists of two phases:

Indentifying functions of mini-era application matching with the target spec of fft and viterbi
Replacing the application code call sites for the identified function with relevant hardware specific APIs assisting code generation.

The code is available as a LLVM module pass. and invoked as

TOOLDIR=${REPO}/validating-binary-decompilation/source/build/bin/
opt -S -load ${TOOLDIR}../lib/LLVMesp_codegen.so -esp_codegen --targetspec <target-spec-file>.ll:<target-func-name> <application>.ll --disable-output

Here are the steps to test on mini-era code

expand

  cd $REPO_PATH/validating-binary-decompilation/tests/simantic-similarity-matching/mini-era
  cat docs/functionList.log | parallel  " echo; echo {}; make -C {} imatch" |& tee /tmp/match_3.log

Misc Information

⮭

Cheap and reliable Node.js hosting starts at $3/month, and $1/month static HTML hosting

sdasgup3 / validating-binary-decompilation

Programming Languages

Labels

Projects that are alternatives of or similar to validating-binary-decompilation

validating-binary-decompilation

Table of Contents

Documentation

Evaluation without Installation

Directory Structure

Checkout Repository

Installation

Relevant for Single Instruction Validation

Relevant for Program Level Validation

Installing Pre-requisites

Tools Build Instructions

Testing

Running SIV Pipeline

An example run

Batch runs

Misc Information

Running PLV Pipeline

An Example Run (w/o autotuning)

An Example Run (w/ autotuning)

Batch run (Recommended)

Running a batch PLV run on toy-examples

Running a batch PLV run on single-source-benchmark

Misc Information

Experimental Matcher

Build Instructions

Run Instructions

[Developers Only] Running a batch PLV run on toy-examples using iterative based pruning

[Developers Only] Codegen for esp backend

Misc Information