gadiluna / Safe

Licence: other
SAFE: Self-Attentive Function Embeddings for binary similarity

SAFE: Self-Attentive Function Embeddings

Paper

This software is the outcome of our academic research. See our arXiv paper: arxiv

If you use this code, please cite our academic paper as:

@inproceedings{massarelli2018safe,
  title={SAFE: Self-Attentive Function Embeddings for Binary Similarity},
  author={Massarelli, Luca and Di Luna, Giuseppe Antonio and Petroni, Fabio and Querzoni, Leonardo and Baldoni, Roberto},
  booktitle={Proceedings of 16th Conference on Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA)},
  year={2019}
}

What you need

You need radare2 installed on your system.

Quickstart

To create the embedding of a function:

git clone https://github.com/gadiluna/SAFE.git
cd SAFE
pip install -r requirements
chmod +x download_model.sh
./download_model.sh
python safe.py -m data/safe.pb -i helloworld.o -a 100000F30

What to do with an embedding?

Once you have two embeddings, embedding_x and embedding_y, you can compute the similarity of the corresponding functions as:

from sklearn.metrics.pairwise import cosine_similarity

sim = cosine_similarity(embedding_x, embedding_y)
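
scikit-learn's cosine_similarity expects 2-D arrays of shape (n_samples, n_features), so an embedding stored as a flat vector has to be reshaped before the call. The sketch below computes the same quantity with NumPy only and accepts either shape. The helper name embedding_similarity and the toy vectors are illustrative assumptions, not part of SAFE.

```python
import numpy as np


def embedding_similarity(embedding_x, embedding_y):
    """Cosine similarity of two embeddings given as flat or (1, n) arrays."""
    x = np.asarray(embedding_x, dtype=float).ravel()
    y = np.asarray(embedding_y, dtype=float).ravel()
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    if denom == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return float(np.dot(x, y) / denom)


# Toy vectors standing in for real SAFE embeddings (hypothetical values).
embedding_x = np.array([[1.0, 2.0, 3.0]])
embedding_y = np.array([[2.0, 4.0, 6.0]])     # same direction as embedding_x
embedding_z = np.array([[-1.0, -2.0, -3.0]])  # opposite direction

print(embedding_similarity(embedding_x, embedding_y))
print(embedding_similarity(embedding_x, embedding_z))
```

Values close to 1 indicate embeddings pointing in the same direction, and values close to -1 indicate opposite directions.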
 

Data Needed

SAFE needs only a few pieces of data to work. Two are essential: a model that tells SAFE how to convert assembly instructions into vectors (the i2v model), and a model that tells SAFE how to convert a binary function into a vector. Both models can be downloaded with the command:

./download_model.sh

The downloader fetches both models and places them in the data directory. After the download, the directory tree should look like this:

safe/-- githubcode
     \
      \--data/-----safe.pb
               \
                \---i2v/
            

The safe.pb file contains the SAFE model used to convert binary functions into vectors. The i2v folder contains the i2v model.

Hardcore Details

This section contains details that are needed to replicate our experiments. If you are only a user of SAFE, you can skip it.

Safe.pb

This is the frozen, trained TensorFlow model for the AMD64 architecture. You can import it in your project using:

import tensorflow as tf

# Read the serialized graph definition from the frozen model file.
with tf.gfile.GFile("safe.pb", "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

# Import the definition into a fresh graph.
with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def)

# Open a session on that graph (TensorFlow 1.x API).
sess = tf.Session(graph=graph)

See the file neural_network/SAFEEmbedder.py.

i2v

The i2v folder contains two files: a matrix in which each row is the embedding of an asm instruction, and a JSON file containing a dictionary that maps asm instructions to row numbers of that matrix. See the file asm_embedding/InstructionsConverter.py.
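
The lookup described above can be sketched as follows. This is only an illustration, under the assumption that the dictionary and matrix are already loaded into memory. The instruction strings, the two-dimensional vectors, the helper name embed_instructions, and the choice to map unknown instructions to a zero vector are all hypothetical; the actual conversion lives in asm_embedding/InstructionsConverter.py.

```python
import json

import numpy as np

# Hypothetical stand-ins for the two i2v files: a JSON dictionary and a matrix.
instruction_to_row = json.loads('{"push rbp": 0, "mov rbp, rsp": 1, "ret": 2}')
embedding_matrix = np.array([
    [0.1, 0.2],
    [0.3, 0.4],
    [0.5, 0.6],
])


def embed_instructions(instructions, mapping, matrix):
    """Return one matrix row per instruction; unknown instructions map to zeros."""
    rows = []
    for ins in instructions:
        idx = mapping.get(ins)
        if idx is None:
            # Assumption for this sketch: unseen instructions get a zero vector.
            rows.append(np.zeros(matrix.shape[1]))
        else:
            rows.append(matrix[idx])
    return np.vstack(rows)


vectors = embed_instructions(
    ["push rbp", "ret", "nop"], instruction_to_row, embedding_matrix
)
print(vectors.shape)
```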

Train the model

To train the model on our datasets, first run:

 python3 downloader.py -td

This downloads the datasets into the data folder. The datasets are compressed, so you have to decompress them yourself. Once decompressed, each dataset is an SQLite database. To start training, run neural_network/train.sh; the database to train on can be selected by changing the corresponding parameter in train.sh. For details on the datasets, see our paper.
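
Before training, it can help to check that a decompressed database opens correctly. The sketch below lists the tables of an SQLite file using Python's standard sqlite3 module. It makes no assumption about the schema of the SAFE datasets; the helper name list_tables, the file demo.db, and the functions table are hypothetical and only serve the demonstration.

```python
import os
import sqlite3
import tempfile


def list_tables(db_path):
    """Return the names of all tables in an SQLite database, sorted by name."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


# Demonstration on a throwaway database with a made-up table.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.db")
    conn = sqlite3.connect(demo_path)
    conn.execute("CREATE TABLE functions (id INTEGER PRIMARY KEY, name TEXT)")
    conn.commit()
    conn.close()
    tables = list_tables(demo_path)

print(tables)
```

To inspect a real dataset, call list_tables with the path of the decompressed database file.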

Create your own dataset

To create your own dataset, use the ExperimentUtil script in the dataset creation folder.

Create a functions knowledge base

To use the SAFE binary code search engine, first create the knowledge base with the ExperimentUtil script. You can then search through it with the script in function_search.
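
Conceptually, searching a knowledge base of embeddings means ranking its entries by similarity to a query embedding. The sketch below shows a brute-force top-k search by cosine similarity with NumPy. It is not the implementation found in function_search; the helper name top_k_similar and the toy two-dimensional data are assumptions made for illustration.

```python
import numpy as np


def top_k_similar(query, knowledge_base, k=3):
    """Return (row index, cosine similarity) for the k rows closest to query."""
    kb = np.asarray(knowledge_base, dtype=float)
    q = np.asarray(query, dtype=float).ravel()
    # Normalize rows and query so that a dot product equals cosine similarity.
    kb_unit = kb / np.linalg.norm(kb, axis=1, keepdims=True)
    q_unit = q / np.linalg.norm(q)
    scores = kb_unit @ q_unit
    # Sort by descending score and keep the first k indices.
    order = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in order]


# Toy knowledge base: one hypothetical embedding per row.
knowledge_base = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
])
query = np.array([2.0, 0.1])

results = top_k_similar(query, knowledge_base, k=2)
print(results)
```

Brute-force search is linear in the number of stored functions, which is fine for small knowledge bases; larger ones typically call for an approximate nearest-neighbor index.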

Related Projects

Thanks

In our code we use godown to download data from Google Drive. We thank circulosmeos, the creator of godown.

We thank Davide Italiano for the useful discussions.
