
SoftSec-KAIST / BinKit

License: MIT
Binary Code Similarity Analysis (BCSA) Benchmark

Programming Languages

  • shell
  • python

Projects that are alternatives of or similar to BinKit

minhash-lsh
Minhash LSH in Golang
Stars: ✭ 20 (-62.96%)
Mutual labels:  benchmark
Turbo-Histogram
Fastest Histogram Construction
Stars: ✭ 44 (-18.52%)
Mutual labels:  benchmark
micro bench
⏰ Dead simple, non intrusive, realtime benchmarks
Stars: ✭ 13 (-75.93%)
Mutual labels:  benchmark
jmeter-grpc-plugin
A JMeter plugin supports load test gRPC
Stars: ✭ 36 (-33.33%)
Mutual labels:  benchmark
clj-perf-tips
Clojure performance tips
Stars: ✭ 14 (-74.07%)
Mutual labels:  benchmark
p3arsec
Parallel Patterns Implementation of PARSEC Benchmark Applications
Stars: ✭ 12 (-77.78%)
Mutual labels:  benchmark
Unchase.FluentPerformanceMeter
🔨 Make the exact performance measurements of the public methods for public classes using this NuGet Package with fluent interface. Requires .Net Standard 2.0+. It is an Open Source project under Apache-2.0 License.
Stars: ✭ 33 (-38.89%)
Mutual labels:  benchmark
best
🏆 Delightful Benchmarking & Performance Testing
Stars: ✭ 73 (+35.19%)
Mutual labels:  benchmark
mqtt-mock
An MQTT load-testing tool. Supports subscribe and publish load-testing modes, and can simulate a configurable number of client connections.
Stars: ✭ 78 (+44.44%)
Mutual labels:  benchmark
instrumentation
Assorted pintools
Stars: ✭ 24 (-55.56%)
Mutual labels:  binary-analysis
go-plugin-benchmark
Benchmark comparing the go plugin package to other plugin implementations
Stars: ✭ 18 (-66.67%)
Mutual labels:  benchmark
binary-decompilation
Extracting high level semantic information from binary code
Stars: ✭ 55 (+1.85%)
Mutual labels:  binary-analysis
python-performance
Performance benchmarks of Python, Numpy, etc. vs. other languages such as Matlab, Julia, Fortran.
Stars: ✭ 24 (-55.56%)
Mutual labels:  benchmark
inspec-gke-cis-benchmark
GKE CIS 1.1.0 Benchmark InSpec Profile
Stars: ✭ 27 (-50%)
Mutual labels:  benchmark
2020a SSH mapping NATL60
A challenge on the mapping of satellite altimeter sea surface height data organised by MEOM@IGE, Ocean-Next and CLS.
Stars: ✭ 17 (-68.52%)
Mutual labels:  benchmark
serializer-benchmark
A PHP benchmark application to compare PHP serializer libraries
Stars: ✭ 14 (-74.07%)
Mutual labels:  benchmark
node-red-contrib-actionflows
Provides a set of nodes to enable an extendable design pattern for flows.
Stars: ✭ 38 (-29.63%)
Mutual labels:  benchmark
KLUE
📖 Korean NLU Benchmark
Stars: ✭ 420 (+677.78%)
Mutual labels:  benchmark
Rel
Binsec/Rel is an extension of Binsec that implements relational symbolic execution for constant-time verification and secret-erasure at binary-level.
Stars: ✭ 27 (-50%)
Mutual labels:  binary-analysis
kubernetes-iperf3
Simple wrapper around iperf3 to measure network bandwidth from all nodes of a Kubernetes cluster
Stars: ✭ 80 (+48.15%)
Mutual labels:  benchmark

Description

BinKit is a binary code similarity analysis (BCSA) benchmark. BinKit provides scripts for building a cross-compiling environment, as well as the compiled dataset. The original dataset includes 1,352 distinct combinations of compiler options, covering 8 architectures, 5 optimization levels, and 13 compilers. We have currently tested this code on Ubuntu 16.04.

For more details, please check our paper.

BCSA tool and Ground Truth Building

For our BCSA tool and ground truth building, please check TikNib.

Pre-compiled dataset and toolchain

You can download our dataset and toolchain from the links below. The links will be changed to git-lfs soon.

Dataset

The data below is used only for our evaluation.

Toolchain

Currently supported compile options

Architecture

  • x86_32
  • x86_64
  • arm_32 (little endian)
  • arm_64 (little endian)
  • mips_32 (little endian)
  • mips_64 (little endian)
  • mipseb_32 (big endian)
  • mipseb_64 (big endian)

Optimization

  • O0
  • O1
  • O2
  • O3
  • Os

Compilers

  • gcc-4.9.4
  • gcc-5.5.0
  • gcc-6.4.0
  • gcc-7.3.0
  • gcc-8.2.0
  • clang-4.0
  • clang-5.0
  • clang-6.0
  • clang-7.0
  • clang-8.0
  • clang-9.0
  • clang-obfus-fla (Obfuscator-LLVM - FLA)
  • clang-obfus-sub (Obfuscator-LLVM - SUB)
  • clang-obfus-bcf (Obfuscator-LLVM - BCF)
  • clang-obfus-all (Obfuscator-LLVM - FLA + SUB + BCF)

How to use

1. Configure the environment in scripts/env.sh

  • NUM_JOBS: the number of jobs used for make, parallel, and Python multiprocessing
  • MAX_JOBS: the maximum number of jobs for make (see the example below)
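
For instance, on a 16-core machine the two variables could be set as follows. This is only a sketch: the values are illustrative and the exact contents of scripts/env.sh may differ, so check the file itself before editing it.

# in scripts/env.sh (illustrative values, not the shipped defaults)
NUM_JOBS=16   # jobs used by make, parallel, and Python multiprocessing
MAX_JOBS=16   # upper bound on the number of jobs for make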

2. Build the cross-compiling environment (takes a long time)

We build the crosstool-ng and clang environments here. If you downloaded the pre-compiled toolchain, please skip this step.

$ source scripts/env.sh
# We may have missed some packages here ... please check
$ scripts/install_default_deps.sh # install default packages for dataset compilation
$ scripts/setup_ctng.sh       # setup crosstool-ng binaries
$ scripts/setup_gcc.sh        # build ct-ng environment. Takes a lot of time
$ scripts/cleanup_ctng.sh     # cleaning up ctng leftovers
$ scripts/setup_clang.sh      # setup clang and llvm-obfuscator
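
After these scripts finish, a quick sanity check might look like the following. Whether ct-ng and clang end up on your PATH depends on scripts/env.sh and the setup scripts, so treat these commands as an assumption rather than a documented step.

$ ct-ng version     # crosstool-ng front-end set up by setup_ctng.sh
$ clang --version   # clang installed by setup_clang.sh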

3. Link toolchains

$ scripts/link_toolchains.sh  # link base toolchain

To undo the linking, please check scripts/unlink_toolchains.sh

4. Build dataset

Please configure the variables in compile_packages.sh and run the commands below. The script automatically downloads the source code of the GNU packages and compiles them to build the full dataset. However, creating all of them may take a very long time.

  • NOTE that it takes SIGNIFICANT time.
  • NOTE that some packages may not compile under certain compiler options.
$ scripts/install_gnu_deps.sh # install default packages for dataset compilation
$ ./compile_packages.sh
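
Because the full build takes significant time, you may want to run it detached from your terminal and keep a log. This is a generic shell pattern, not something compile_packages.sh requires:

$ nohup ./compile_packages.sh > compile.log 2>&1 &
$ tail -f compile.log         # follow the progress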

4-1. Build dataset (manual)

You can download the source code of the GNU packages of your interest as shown below.

  • Please check step 1 before running the command.
  • You must give an ABSOLUTE PATH for --base_dir.
$ source scripts/env.sh
$ python gnu_compile_script.py \
    --base_dir "/home/dongkwan/binkit/dataset/gnu" \
    --num_jobs 8 \
    --whitelist "config/whitelist.txt" \
    --download
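
The whitelist file selects which GNU packages are processed. As an assumption about its format (the authoritative version is config/whitelist.txt in the repository), it would list one package name per line, for example:

# hypothetical whitelist restricting the build to two packages
binutils
coreutils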

You can compile only the packages or compiler options of your interest as shown below.

$ source scripts/env.sh
$ python gnu_compile_script.py \
    --base_dir "/home/dongkwan/binkit/dataset/gnu" \
    --num_jobs 8 \
    --config "config/normal.yml" \
    --whitelist "config/whitelist.txt"

You can check the compiled binaries as shown below.

$ source scripts/env.sh
$ python compile_checker.py \
    --base_dir "/home/dongkwan/binkit/dataset/gnu" \
    --num_jobs 8 \
    --config "config/normal.yml"

For more details, please check compile_packages.sh

4-2. Build dataset with customized options

To build datasets with customized options, create your own configuration file (.yml) and select the target compiler options. You can check the format in the existing sample files in the config/ directory. Please make sure that the name of your config file is not included in the blacklist in the compilation script.
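
For example, one possible workflow is to start from an existing sample file and trim it down to the options you need. The file name config/my_options.yml below is hypothetical; the keys inside it should follow the sample files in config/:

$ cp config/normal.yml config/my_options.yml
# edit config/my_options.yml and keep only the architectures, compilers,
# and optimization levels you want to build
$ python gnu_compile_script.py \
    --base_dir "/home/dongkwan/binkit/dataset/gnu" \
    --num_jobs 8 \
    --config "config/my_options.yml" \
    --whitelist "config/whitelist.txt"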

Issues

Tested environment

We ran all our experiments on a server equipped with four Intel Xeon E7-8867v4 2.40 GHz CPUs (144 cores in total), 896 GB of DDR4 RAM, and a 4 TB SSD. We set up Ubuntu 16.04 on the server.

Tested Python version

  • Python 3.8.0

Running example

Running the script below took 7 hours on our machine.

$ python gnu_compile_script.py \
    --base_dir "/home/dongkwan/binkit/dataset/gnu" \
    --num_jobs 72 \
    --config "config/normal.yml" \
    --whitelist "config/whitelist.txt"

Compilation failure

If compilation fails, you may have to adjust the number of jobs for parallel processing configured in step 1; the right value is machine-dependent.
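
As a concrete, illustrative recovery step, lower NUM_JOBS and MAX_JOBS in scripts/env.sh (for example, to 4) and re-run the build:

# after reducing NUM_JOBS / MAX_JOBS in scripts/env.sh
$ source scripts/env.sh
$ ./compile_packages.sh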

Authors

This project has been conducted by the authors below at KAIST.

Citation

We would appreciate it if you consider citing our paper when using BinKit.

@article{kim:2020:binkit,
  author        = {Dongkwan Kim and Eunsoo Kim and Sang Kil Cha and Sooel Son and Yongdae Kim},
  title         = {Revisiting Binary Code Similarity Analysis using Interpretable Feature Engineering and Lessons Learned},
  eprint        = {2011.10749},
  archivePrefix = {arXiv},
  primaryClass  = {cs.SE},
  year          = {2020},
}