
AxelGoetz / Website Fingerprinting

Licence: apache-2.0
Automatic Feature Generation for Website Fingerprinting

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Website Fingerprinting

Neural sp
End-to-end ASR/LM implementation with PyTorch
Stars: ✭ 408 (+1940%)
Mutual labels:  sequence-to-sequence
Nmt Keras
Neural Machine Translation with Keras
Stars: ✭ 501 (+2405%)
Mutual labels:  sequence-to-sequence
Nematus
Open-Source Neural Machine Translation in Tensorflow
Stars: ✭ 730 (+3550%)
Mutual labels:  sequence-to-sequence
Autoencoders
Torch implementations of various types of autoencoders
Stars: ✭ 421 (+2005%)
Mutual labels:  autoencoder
Dancenet
DanceNet -💃💃Dance generator using Autoencoder, LSTM and Mixture Density Network. (Keras)
Stars: ✭ 469 (+2245%)
Mutual labels:  autoencoder
Athena
an open-source implementation of sequence-to-sequence based speech processing engine
Stars: ✭ 542 (+2610%)
Mutual labels:  sequence-to-sequence
Neuralmonkey
An open-source tool for sequence learning in NLP built on TensorFlow.
Stars: ✭ 400 (+1900%)
Mutual labels:  sequence-to-sequence
Advanced Deep Learning With Keras
Advanced Deep Learning with Keras, published by Packt
Stars: ✭ 917 (+4485%)
Mutual labels:  autoencoder
Tensorflow Book
Accompanying source code for Machine Learning with TensorFlow. Refer to the book for step-by-step explanations.
Stars: ✭ 4,448 (+22140%)
Mutual labels:  autoencoder
Keras Idiomatic Programmer
Books, Presentations, Workshops, Notebook Labs, and Model Zoo for Software Engineers and Data Scientists wanting to learn the TF.Keras Machine Learning framework
Stars: ✭ 720 (+3500%)
Mutual labels:  autoencoder
Tensorflow Mnist Vae
Tensorflow implementation of variational auto-encoder for MNIST
Stars: ✭ 422 (+2010%)
Mutual labels:  autoencoder
Pyod
A Python Toolbox for Scalable Outlier Detection (Anomaly Detection)
Stars: ✭ 5,083 (+25315%)
Mutual labels:  autoencoder
Ad examples
A collection of anomaly detection methods (iid/point-based, graph and time series) including active learning for anomaly detection/discovery, bayesian rule-mining, description for diversity/explanation/interpretability. Analysis of incorporating label feedback with ensemble and tree-based detectors. Includes adversarial attacks with Graph Convolutional Network.
Stars: ✭ 641 (+3105%)
Mutual labels:  autoencoder
Tensorflow Tutorial
Tensorflow tutorial from basic to hard, 莫烦Python 中文AI教学
Stars: ✭ 4,122 (+20510%)
Mutual labels:  autoencoder
Neurec
Next RecSys Library
Stars: ✭ 731 (+3555%)
Mutual labels:  autoencoder
Deepsvg
[NeurIPS 2020] Official code for the paper "DeepSVG: A Hierarchical Generative Network for Vector Graphics Animation". Includes a PyTorch library for deep learning with SVG data.
Stars: ✭ 403 (+1915%)
Mutual labels:  autoencoder
Chatlearner
A chatbot implemented in TensorFlow based on the seq2seq model, with certain rules integrated.
Stars: ✭ 528 (+2540%)
Mutual labels:  sequence-to-sequence
Concise Ipython Notebooks For Deep Learning
Ipython Notebooks for solving problems like classification, segmentation, generation using latest Deep learning algorithms on different publicly available text and image data-sets.
Stars: ✭ 23 (+15%)
Mutual labels:  autoencoder
Tensorflow Tutorial
TensorFlow and Deep Learning Tutorials
Stars: ✭ 748 (+3640%)
Mutual labels:  autoencoder
Cluener2020
CLUENER2020 中文细粒度命名实体识别 Fine Grained Named Entity Recognition
Stars: ✭ 689 (+3345%)
Mutual labels:  sequence-to-sequence

Website Fingerprinting

There has been a large variety of website fingerprinting attacks. However, most of them require a very tedious feature-selection process.

There have been some attempts to use autoencoders to solve this problem. However, those neural networks require a fixed-length input, so you still need an initial feature-selection step.

This project therefore examines the use of RNNs to generate features automatically, since they can be unrolled to a custom length for each trace.

Sequence-to-Sequence Model

Essentially, this project implements a sequence-to-sequence model, which has previously been used mainly for natural language processing (NLP) tasks such as machine translation.

[Figure: sequence-to-sequence model]

The model consists of two different RNNs: an encoder and a decoder. First, the encoder runs over the input and returns a thought vector, which can be thought of as our features. Next, the decoder uses the thought vector as its initial state and uses it to construct a new output sequence, as can be seen in the image above.

So if we train the model on a copy task, where it tries to reconstruct the original trace from the thought vector, it learns to construct a fixed-length representation of a trace that contains all of the information needed to represent it.

Therefore the thought vector can be used as features for other machine learning solutions.
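
For illustration, a minimal sketch of such a copy-task autoencoder in tf.keras is shown below. This is not the project's actual implementation (that lives in feature_generation/): the layer sizes are hypothetical, and it zero-pads traces to a common length with masking rather than unrolling the RNN to a custom length per trace.

```python
# Minimal, illustrative sequence-to-sequence autoencoder on traces (tf.keras).
# Hypothetical sizes; traces are zero-padded to max_len, whereas the project
# itself unrolls the RNN to a custom length for each trace.
import tensorflow as tf
from tensorflow.keras import layers, Model

max_len, n_features, latent_dim = 200, 1, 64   # hypothetical dimensions

inputs = layers.Input(shape=(max_len, n_features))
masked = layers.Masking(mask_value=0.0)(inputs)             # ignore zero padding
_, state_h, state_c = layers.LSTM(latent_dim, return_state=True)(masked)
thought_vector = layers.Concatenate()([state_h, state_c])   # fixed-length features

# Decoder: reconstruct the original trace from the thought vector (copy task).
x = layers.RepeatVector(max_len)(thought_vector)
x = layers.LSTM(latent_dim, return_sequences=True)(x)
outputs = layers.TimeDistributed(layers.Dense(n_features))(x)

autoencoder = Model(inputs, outputs)
encoder = Model(inputs, thought_vector)   # reused later to extract features
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(padded_traces, padded_traces, epochs=10)
# features = encoder.predict(padded_traces)
```

Training the autoencoder on the copy task and then calling encoder.predict yields one fixed-length feature vector per trace.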

Testing the Outcome

To test whether our feature-generation process has been effective, we need to compare it against the accuracy achieved with existing hand-picked features. We will do this by training existing website fingerprinting attack models on both the hand-picked features and the automatically generated ones.

Then we will compare both using a wide variety of metrics.
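
As a rough illustration of that comparison (not the actual run_models/run_model.py script), the same classifier can be cross-validated on both feature sets; the sketch below uses a Random Forest, as in the k-fingerprinting attack [1], with placeholder feature matrices and labels.

```python
# Illustrative comparison of two feature sets with the same classifier.
# hand_picked and generated are (n_traces, n_features) arrays, labels is
# (n_traces,); the real experiments are driven by run_models/run_model.py.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def compare_feature_sets(hand_picked, generated, labels, cv=5):
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    for name, X in [("hand-picked", hand_picked), ("generated", generated)]:
        scores = cross_val_score(clf, X, labels, cv=cv, scoring="accuracy")
        print("{}: {:.3f} +/- {:.3f}".format(name, scores.mean(), scores.std()))
```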

Given time constraints, we will only test our automatically generated features on a small set of (influential) existing models:

  • k-fingerprinting attack (Random Forest) [1]
  • Website Fingerprinting in Onion Routing Based Anonymization Networks (SVC) [2]
  • Website Fingerprinting at Internet Scale (SVM with RBF kernel) [3]
  • Effective Attacks and Provable Defenses for Website Fingerprinting (kNN) [4]

More information on the hand-picked features can be found here.

Also, some unit tests have been written for the data preprocessing code. All of them can be run using:

python -m unittest discover

Next, to generate a new coverage report, we need to install coverage and run:

pip install coverage # Outside of your virtual environment
coverage run --omit="/usr/local/*" -m unittest discover # Inside the virtual environment

Running the Code

Since some of the source files contain Unicode characters, you need to run all of the code with Python 3.

The seq2seq model can be run by using:

python feature_generation/run_model.py

To extract all of the hand-picked features from the data, first update the relative path to the data in the feature_extraction.py file.

Next, run:

python feature_extraction/feature_extraction.py

This script will create a new directory for every model within your data directory, with the features stored in {webpage_index}-{sample_number}.cellf files.
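
If you want to load those features yourself (for example, to feed them into your own classifier), a small helper along the following lines should work. It assumes, purely for illustration, that each .cellf file stores the feature values as whitespace-separated numbers; check the output of feature_extraction.py for the exact format.

```python
# Hypothetical loader for {webpage_index}-{sample_number}.cellf files,
# assuming whitespace-separated numeric feature values in each file.
import glob
import os
import re

def load_cellf_features(model_dir):
    """Return (features, labels); the label is the webpage index."""
    features, labels = [], []
    for path in sorted(glob.glob(os.path.join(model_dir, "*.cellf"))):
        match = re.match(r"(\d+)-(\d+)\.cellf$", os.path.basename(path))
        if match is None:
            continue
        with open(path) as f:
            features.append([float(x) for x in f.read().split()])
        labels.append(int(match.group(1)))
    return features, labels
```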

Finally, to run all of the models, you can run the script:

python run_models/run_model.py

with the appropriate parameters.

To see how to run the system, check out the user_manual.md file.

Installation

The seq2seq model mainly relies on TensorFlow, whilst we use scikit-learn for the more traditional machine learning tasks.

These are a set of simple instructions to get your environment up and running.

First, you will need to install the Python virtual environment tool:

pip install virtualenv

Make sure you are then in the main directory of this project and run:

virtualenv venv
source venv/bin/activate

to activate the virtual environment. Once you are in this environment, you will need to install the appropriate packages by running:

pip install -r requirements.txt

Some of the code is also written in Go, which requires a separate installation. How you install it depends on your system, but if you're running macOS, I recommend using Homebrew:

brew install go

GPU Setup

If you plan to use GPU support, you will also need to follow some additional instructions, all of which can be found here, and you will need to install the GPU-enabled TensorFlow instead.

Data

All of the data used for our experiments are from two datasets, called GRESCHBACH and WANG14.

File Structure

The project is structured as follows:

.
├── attacks - The source code for the existing attacks
├── data
│   └── cells - Contains all of the raw traces. Each trace is a list of pairs (packetSize, 1 if outgoing else -1); see the reader sketch below the tree
├── feature_extraction - All of the source code to extract features for different models from the raw traces
├── feature_generation - Used to automatically extract features from the raw traces
├── report - Several different reports; the most important one is the final report.
├── tests - Contains all of the unit tests
├── static - Any static resources used for either the README or the report.
├── .gitignore
├── .travis.yml
├── README.md
└── requirements.txt
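
For reference, the raw traces in data/cells can be read with a few lines of Python. The exact on-disk format is not documented in this README, so the sketch below assumes one whitespace-separated packetSize direction pair per line; adjust it to match your copy of the data.

```python
# Hypothetical reader for a raw trace file in data/cells, assuming one
# "packetSize direction" pair per line (direction is 1 if outgoing else -1).
def read_trace(path):
    trace = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 2:
                continue
            trace.append((float(parts[0]), int(float(parts[1]))))
    return trace
```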

References

[1] Hayes, Jamie, and George Danezis. "k-fingerprinting: A robust scalable website fingerprinting technique." arXiv preprint arXiv:1509.00789 (2016).

[2] Panchenko, Andriy, Lukas Niessen, Andreas Zinnen, and Thomas Engel. "Website fingerprinting in onion routing based anonymization networks." In Proceedings of the 10th annual ACM workshop on Privacy in the electronic society, pp. 103-114. ACM, 2011.

[3] Panchenko, Andriy, Fabian Lanze, Andreas Zinnen, Martin Henze, Jan Pennekamp, Klaus Wehrle, and Thomas Engel. "Website fingerprinting at internet scale." In Network & Distributed System Security Symposium (NDSS). IEEE Computer Society. 2016.

[4] Wang, Tao, Xiang Cai, Rishab Nithyanand, Rob Johnson, and Ian Goldberg. "Effective Attacks and Provable Defenses for Website Fingerprinting." In USENIX Security, pp. 143-157. 2014.
