Lancern / asm2vec

Licence: other
An unofficial implementation of asm2vec as a standalone python package

Programming Languages

python

Projects that are alternatives of or similar to asm2vec

grad-cam-text
Implementation of Grad-CAM for text.
Stars: ✭ 37 (-70.87%)
Mutual labels:  word2vec
binary-auditing-solutions
Learn the fundamentals of Binary Auditing. Know how HLL mapping works, get more inner file understanding than ever.
Stars: ✭ 61 (-51.97%)
Mutual labels:  binary-analysis
skip-gram-Chinese
skip-gram for Chinese word2vec based on TensorFlow
Stars: ✭ 20 (-84.25%)
Mutual labels:  word2vec
word-embeddings-from-scratch
Creating word embeddings from scratch and visualize them on TensorBoard. Using trained embeddings in Keras.
Stars: ✭ 22 (-82.68%)
Mutual labels:  word2vec
Word2Vec-iOS
Word2Vec iOS port
Stars: ✭ 23 (-81.89%)
Mutual labels:  word2vec
kar98k public
pwn & ctf tools for windows
Stars: ✭ 24 (-81.1%)
Mutual labels:  binary-analysis
russe
RUSSE: Russian Semantic Evaluation.
Stars: ✭ 11 (-91.34%)
Mutual labels:  word2vec
py3cw
Unofficial wrapper for the 3Commas API written in Python
Stars: ✭ 88 (-30.71%)
Mutual labels:  unofficial
Recommendation-based-on-sequence-
Recommendation based on sequence
Stars: ✭ 23 (-81.89%)
Mutual labels:  word2vec
binary viewer
A binary visualization tool to aid with reverse engineering and malware detection similar to Cantor.Dust
Stars: ✭ 55 (-56.69%)
Mutual labels:  binary-analysis
Vaaku2Vec
Language Modeling and Text Classification in Malayalam Language using ULMFiT
Stars: ✭ 68 (-46.46%)
Mutual labels:  word2vec
Word2VecAndTsne
Scripts demo-ing how to train a Word2Vec model and reduce its vector space
Stars: ✭ 45 (-64.57%)
Mutual labels:  word2vec
two-stream-cnn
A two-stream convolutional neural network for learning arbitrary similarity functions over two sets of training data
Stars: ✭ 24 (-81.1%)
Mutual labels:  word2vec
figma-plus-advanced-rename-plugin
A better and more powerful batch rename plugin for Figma with a dozen options
Stars: ✭ 28 (-77.95%)
Mutual labels:  unofficial
GE-FSG
Graph Embedding via Frequent Subgraphs
Stars: ✭ 39 (-69.29%)
Mutual labels:  word2vec
Simple-Sentence-Similarity
Exploring the simple sentence similarity measurements using word embeddings
Stars: ✭ 99 (-22.05%)
Mutual labels:  word2vec
sigkit
Function signature matching and signature generation plugin for Binary Ninja
Stars: ✭ 38 (-70.08%)
Mutual labels:  binary-analysis
word2vec-movies
Bag of words meets bags of popcorn in Python 3 (Chinese tutorial)
Stars: ✭ 54 (-57.48%)
Mutual labels:  word2vec
hyperstar
Hyperstar: Negative Sampling Improves Hypernymy Extraction Based on Projection Learning.
Stars: ✭ 24 (-81.1%)
Mutual labels:  word2vec
UnofficialCrusaderPatch
Unofficial balancing patch installer for Stronghold Crusader 1
Stars: ✭ 373 (+193.7%)
Mutual labels:  unofficial

asm2vec

This is an unofficial implementation of the asm2vec model as a standalone Python package. The details of the model can be found in the original paper (S&P '19): Asm2Vec: Boosting Static Representation Robustness for Binary Clone Search against Code Obfuscation and Compiler Optimization.

Requirements

This implementation is written in Python 3.7, and Python 3.7+ is recommended. The only dependency of this package is numpy, which can be installed as follows:

python3 -m pip install numpy

How to use

Import

To install the package, first clone the repository:

git clone https://github.com/lancern/asm2vec.git

Add the following line to your .bashrc file to add asm2vec to your Python interpreter's search path for external packages:

export PYTHONPATH="path/to/asm2vec:$PYTHONPATH"

Replace path/to/asm2vec with the directory you cloned asm2vec into, then run the following command to update PYTHONPATH:

source ~/.bashrc

Alternatively, you can add the following snippet to any Python source file that uses asm2vec so that the interpreter can find the package:

import sys
sys.path.append('path/to/asm2vec')

In your Python code, use the following import statement to import this package:

import asm2vec.<module-name>
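
For example, the modules used in the examples below can be imported like this:

import asm2vec.asm     # basic blocks, functions, instruction parsing
import asm2vec.parse   # building CFGs from assembly source files
import asm2vec.model   # the Asm2Vec model itself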

Define CFGs And Training

There are two ways to define the binary program that will be fed to the asm2vec model. The first approach is to build the CFG manually, as shown below:

from asm2vec.asm import BasicBlock
from asm2vec.asm import Function
from asm2vec.asm import parse_instruction

block1 = BasicBlock()
block1.add_instruction(parse_instruction('mov eax, ebx'))
block1.add_instruction(parse_instruction('jmp _loc'))

block2 = BasicBlock()
block2.add_instruction(parse_instruction('xor eax, eax'))
block2.add_instruction(parse_instruction('ret'))

block1.add_successor(block2)

block3 = BasicBlock()
block3.add_instruction(parse_instruction('sub eax, [ebp]'))

f1 = Function(block1, 'some_func')
f2 = Function(block3, 'another_func')

# The definition of block4 is omitted here for brevity
f3 = Function(block4, 'estimate_func')

And then you can train a model with the following code:

from asm2vec.model import Asm2Vec

model = Asm2Vec(d=200)
train_repo = model.make_function_repo([f1, f2, f3])
model.train(train_repo)

The second approach is to use the parse module provided by asm2vec to build CFGs automatically from an assembly source file:

from asm2vec.parse import parse_fp

with open('source.asm', 'r') as fp:
    funcs = parse_fp(fp)

And then you can train a model with the following code:

from asm2vec.model import Asm2Vec

model = Asm2Vec(d=200)
train_repo = model.make_function_repo(funcs)
model.train(train_repo)

Estimation

You can use the asm2vec.model.Asm2Vec.to_vec method to convert a function into its vector representation.
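
A minimal sketch of what this might look like, assuming to_vec takes a single Function (such as one produced by parse_fp or built manually) and returns a numpy array; the cosine-similarity comparison at the end is purely illustrative and not part of the package's API:

import numpy as np

from asm2vec.model import Asm2Vec
from asm2vec.parse import parse_fp

# 'source.asm' is a placeholder file name, as in the parsing example above.
with open('source.asm', 'r') as fp:
    funcs = parse_fp(fp)

model = Asm2Vec(d=200)
train_repo = model.make_function_repo(funcs)
model.train(train_repo)

# Estimate vector representations for two functions.
v1 = model.to_vec(funcs[0])
v2 = model.to_vec(funcs[1])

# Compare the two function vectors with cosine similarity (illustrative only).
similarity = float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))
print(similarity)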

Serialization

The implementation supports serialization of many of its internal data structures, so you can save the internal state of a trained model to disk for future use.

You can serialize two data structures to primitive data: the function repository and the model memento.

To be finished.
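
Until the serialization API is documented here, one generic fallback is Python's pickle module; this sketch assumes the trained model object is picklable and is not the package's own serialization mechanism:

import pickle

# model is a trained Asm2Vec instance, as in the training examples above.
# Save the trained model (assumes the model object is picklable).
with open('asm2vec_model.pkl', 'wb') as f:
    pickle.dump(model, f)

# Load it back later.
with open('asm2vec_model.pkl', 'rb') as f:
    model = pickle.load(f)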

Hyper Parameters

The constructor of the asm2vec.model.Asm2Vec class accepts keyword arguments as hyper parameters of the model. The following table lists all the available hyper parameters:

Parameter Name        | Type  | Meaning                                                                             | Default Value
d                     | int   | The dimension of the vectors for tokens.                                            | 200
initial_alpha         | float | The initial learning rate.                                                          | 0.05
alpha_update_interval | int   | The number of tokens processed before the learning rate is updated.                 | 10000
rnd_walks             | int   | The number of random walks performed to sequentialize a function.                   | 3
neg_samples           | int   | The number of samples drawn during negative sampling.                               | 25
iteration             | int   | The number of training iterations. (Reserved for future use; not yet implemented.)  | 1
jobs                  | int   | The number of tasks executed concurrently during training.                          | 4
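
For example, the hyper parameters can be passed to the constructor as keyword arguments; the parameter names below follow the table above, and the values are illustrative only, not recommendations:

from asm2vec.model import Asm2Vec

model = Asm2Vec(
    d=200,
    initial_alpha=0.05,
    alpha_update_interval=10000,
    rnd_walks=3,
    neg_samples=25,
    jobs=4,
)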

Notes

For simplicity, Selective Callee Expansion is not implemented in this early version. You have to perform it manually before feeding the CFG into asm2vec.
