
sanjit-bhat / Var-CNN

License: MIT
Code for the paper "Var-CNN: A Data-Efficient Website Fingerprinting Attack Based on Deep Learning" (PETS 2019)

Programming Languages

python

Projects that are alternatives of or similar to Var-CNN

karbowanec
Karbo (Karbovanets) - Digital Exchange Medium - cryptocurrency made in Ukraine, CryptoNote protocol implementation.
Stars: ✭ 102 (+251.72%)
Mutual labels:  anonymity
dystopia
Anonymity on the Internet in a transparent way.
Stars: ✭ 97 (+234.48%)
Mutual labels:  anonymity
prox
🙈 Share anonymous confessions in Slack
Stars: ✭ 28 (-3.45%)
Mutual labels:  anonymity
i2pd-android
i2pd for Android
Stars: ✭ 66 (+127.59%)
Mutual labels:  anonymity
alternative-frontends
🔐🌐 Privacy-respecting web frontends for popular services
Stars: ✭ 821 (+2731.03%)
Mutual labels:  anonymity
zen archived
TLS integration and more!
Stars: ✭ 133 (+358.62%)
Mutual labels:  anonymity
i2pchat
🌀 i2pchat's old repo. This repo is deprecated in favor of https://github.com/i2pchat/i2pchat which is now the main repo.
Stars: ✭ 23 (-20.69%)
Mutual labels:  anonymity
youtube rss
A YouTube client for managing subscriptions and watching videos anonymously over Tor, without a Google account.
Stars: ✭ 49 (+68.97%)
Mutual labels:  anonymity
soxy-driver
A Docker networking driver that transparently tunnels the TCP traffic of Docker containers through a proxy.
Stars: ✭ 25 (-13.79%)
Mutual labels:  anonymity
orjail
A more secure way to force programs to exclusively use the Tor network.
Stars: ✭ 136 (+368.97%)
Mutual labels:  anonymity
kali-whoami
Whoami provides enhanced privacy and anonymity for Debian- and Arch-based Linux distributions.
Stars: ✭ 1,424 (+4810.34%)
Mutual labels:  anonymity
tordam
A library for peer discovery inside the Tor network
Stars: ✭ 13 (-55.17%)
Mutual labels:  anonymity
privacy-preserving-primitives
Primitives and protocols for implementing privacy-preserving networks.
Stars: ✭ 14 (-51.72%)
Mutual labels:  anonymity
awesome-secure-messaging
A curated collection of links for secure messaging.
Stars: ✭ 29 (+0%)
Mutual labels:  anonymity
icebreaker
Web app that allows students to ask real-time, anonymous questions during class
Stars: ✭ 16 (-44.83%)
Mutual labels:  anonymity
tor-ip-changer
Requests a new identity every X seconds using the Tor client.
Stars: ✭ 233 (+703.45%)
Mutual labels:  anonymity
Pseudonymity-Guide
How to securely create and operate a pseudonymous identity.
Stars: ✭ 64 (+120.69%)
Mutual labels:  anonymity
darknet.py
darknet.py is a network application with no dependencies other than Python and Tor, useful for anonymizing the traffic of Linux servers and workstations.
Stars: ✭ 71 (+144.83%)
Mutual labels:  anonymity
anon.land
An open-source imageboard, just like Voxed.net was.
Stars: ✭ 16 (-44.83%)
Mutual labels:  anonymity
libanonvpn
Library for TUN and TAP devices over I2P in Go applications.
Stars: ✭ 35 (+20.69%)
Mutual labels:  anonymity

Var-CNN

This repository contains code for the following paper:

Var-CNN: A Data-Efficient Website Fingerprinting Attack Based on Deep Learning (PETS 2019, link to presentation)

Sanjit Bhat, David Lu, Albert Kwon, and Srini Devadas.

Dependencies

  1. Ensure that you have a machine with an NVIDIA GPU; the model will take significantly longer to run on a CPU.
  2. Make sure you have the TensorFlow/Keras deep learning stack installed. For detailed instructions, see this link under the "Software Setup" section. For our experiments, we used Ubuntu 16.04 LTS, CUDA 8.0, CuDNN v6, and TensorFlow 1.3.0 as a backend for Keras 2.0.8.
  3. To install all required Python packages, simply issue the following command: pip install -r requirements.txt.
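
As a quick sanity check that the deep learning stack is working, you can verify that TensorFlow sees your GPU. The snippet below is a minimal sketch, assuming the TensorFlow 1.x stack listed above:

```python
# Minimal sanity check (assumes TensorFlow 1.x, as used in our experiments).
import tensorflow as tf
from tensorflow.python.client import device_lib

print(tf.__version__)  # e.g., 1.3.0
# Look for a GPU entry in this list; otherwise training falls back to the CPU.
print([d.name for d in device_lib.list_local_devices()])
```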

Control Flow

The first step in running our model is to place a sufficient number of raw packet sequences in the data_dir folder. Each monitored website needs at least num_mon_inst_train + num_mon_inst_test instances, and there need to be at least num_unmon_sites_train + num_unmon_sites_test unmonitored sites.
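
As an illustration, the following sketch checks those counts under the Wang et al. file-naming scheme described below (monitored traces named "site-instance", unmonitored traces named with a bare index). The parameter names come from config.json, but this helper itself is hypothetical and not part of the codebase:

```python
# Hypothetical sanity check for data_dir, assuming Wang et al. file naming:
# "site-instance" (e.g., "0-0", "0-1") for monitored traces and a bare
# index (e.g., "0", "1") for unmonitored traces.
import collections
import json
import os

config = json.load(open('config.json'))
need_mon = config['num_mon_inst_train'] + config['num_mon_inst_test']
need_unmon = config['num_unmon_sites_train'] + config['num_unmon_sites_test']

mon_counts = collections.Counter()
num_unmon = 0
for name in os.listdir(config['data_dir']):
    if '-' in name:
        mon_counts[name.split('-')[0]] += 1  # one more instance of this site
    else:
        num_unmon += 1  # each unmonitored site has a single instance

assert len(mon_counts) >= config['num_mon_sites'], 'too few monitored sites'
assert all(c >= need_mon for c in mon_counts.values()), 'too few instances'
assert num_unmon >= need_unmon, 'too few unmonitored sites'
```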

If you use the Wang et al. data format (i.e., each line represents a new packet, with the relative time and direction separated by a space), wang_to_varcnn.py supports it out of the box. Otherwise, you will need to modify wang_to_varcnn.py or write your own glue code to convert your data to the Wang et al. format.
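
For reference, a trace in the Wang et al. format is just a sequence of "<relative time> <direction>" lines. Minimal glue code for reading one trace might look like the sketch below (illustrative only, not the parsing that wang_to_varcnn.py actually performs):

```python
# Sketch of reading a single trace in the Wang et al. format, where each
# line is "<relative time> <direction>" (direction is +1 or -1).
def read_wang_trace(path):
    times, directions = [], []
    with open(path) as f:
        for line in f:
            time, direction = line.split()
            times.append(float(time))
            directions.append(int(float(direction)))
    return times, directions

# e.g., times, dirs = read_wang_trace('data_dir/0-0')
```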

After setting up the data and specifying the parameters in config.json, you can run all parts of our code simply by issuing python run_model.py. Our programs are then called in the following sequence:

  1. wang_to_varcnn.py: This parses the data_dir folder; extracts direction, time, metadata, and labels; and stores all the monitored and unmonitored traces in all_closed_world.npz and all_open_world.npz, respectively, in the data_dir folder.
  2. preprocess_data.py: This uses the data in all_closed_world.npz to randomly pick num_mon_inst_train training and num_mon_inst_test test instances of each of the num_mon_sites monitored sites. It also performs a similar random split for the unmonitored sites (using the all_open_world.npz file) and preprocesses all of these traces to scale the metadata, change to inter-packet timing, etc. Finally, it saves the direction data, time data, metadata, and labels to .h5 files to conserve RAM during the training process.
  3. run_model.py: This is the main file that first calls the prior two files. Next, it loads the model architectures from either var_cnn.py or df.py, trains the models, saves their predictions, and calls evaluate.py for evaluation.
  4. During training, data_generator.py generates new batches of data in parallel. Since large datasets can contain hundreds of thousands of traces, data_generator.py uses .h5 files to access the traces for one batch without loading the entire dataset into memory.
  5. evaluate.py: This first calculates metrics for each of the in-training combinations specified in mixture. Then, it averages each of their predictions together and reports metrics for the overall out-of-training ensemble. It saves all metrics to the job_result.json file.
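
To make step 5 concrete, the out-of-training ensemble amounts to averaging the saved prediction arrays of the in-training combinations. Here is a minimal sketch (the file names are hypothetical, not the ones evaluate.py actually writes):

```python
# Hedged sketch of the out-of-training ensemble: element-wise average of the
# predictions saved for each in-training combination.
import numpy as np

preds = [np.load('predictions/dir_metadata.npy'),   # hypothetical file names
         np.load('predictions/time_metadata.npy')]
ensemble = np.mean(preds, axis=0)        # average the predicted probabilities
predicted_labels = ensemble.argmax(axis=1)
```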

Parameters

config.json provides the configuration settings to all the other programs. We describe its parameters in further detail below:

  1. data_dir: This relative path provides the location of the "raw" packet sequences (e.g., the "0", "1", "0-0", "0-1" files in Wang et al.'s dataset). Also, it later stores the all_closed_world.npz and all_open_world.npz files generated by wang_to_varcnn.py and the .h5 data files generated by preprocess_data.py.
  2. predictions_dir: After training the model, run_model.py generates predictions for the test set and stores them in this directory. evaluate.py later uses them to calculate test metrics.
  3. num_mon_sites: The number of monitored websites. Each of the num_mon_sites sites in data_dir must have at least num_mon_inst_train + num_mon_inst_test instances.
  4. num_mon_inst_train: The number of monitored instances used for training.
  5. num_mon_inst_test: The number of monitored instances used for testing.
  6. num_unmon_sites_train: The number of unmonitored sites used for training. Each site has one instance.
  7. num_unmon_sites_test: The number of unmonitored sites used for testing. Each site has one instance, and these unmonitored websites are different from those used for training.
  8. model_name: The model name. Either "var-cnn" or "df".
  9. batch_size: The batch size used during training. For Var-CNN, we found that a batch size of 50 works well. The recommended batch size for DF is 128.
  10. mixture: The mixture of ensembles used during training and evaluation. Each of the inner arrays represents models combined in-training. run_model will save the predictions for every such in-training combination. Subsequently, evaluate_ensemble will report metrics for these individual models as well as for the overall out-of-training ensemble (i.e., the average of the individual predictions). Note: this functionality only works with Var-CNN (in fact, DF will automatically default to using [["dir"]]). Also, do not use two in-training combinations with the same components, as their prediction files will be overwritten. Default: [["dir", "metadata"], ["time", "metadata"]] for Var-CNN.
  11. seq_length: The length of the input sequence fed into the CNN (default: 5000). We use this parameter right from the start when scraping the raw data.
  12. df_epochs: The number of epochs used to train DF (default: 30).
  13. var_cnn_max_epochs: The maximum number of epochs used to train Var-CNN (default: 150). The EarlyStopping callback often cuts off training much sooner, as soon as validation accuracy stops improving.
  14. var_cnn_base_patience: The "patience" (i.e., number of epochs of no validation accuracy improvement) until we decrease the learning rate of Var-CNN and stop training (default: 5). We implement this functionality in the ReduceLROnPlateau and EarlyStopping callbacks inside var_cnn.py.
  15. dir_dilations: Whether to use dilations with the direction ResNet (default: true).
  16. time_dilations: Whether to use dilations with the time ResNet (default: true).
  17. inter_time: Whether to use the inter-packet time (i.e., time between two packets) or the relative time (i.e., time from the first packet) for timing data (default: true, i.e., we do use inter-packet time).
  18. scale_metadata: Whether to scale metadata to zero mean and unit variance (default: true).
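
Putting these together, a config.json might look like the following. This is an illustrative example only: the defaults above are used where stated, while the paths and dataset sizes are placeholders you should replace with your own:

```json
{
  "data_dir": "data_dir/",
  "predictions_dir": "predictions/",
  "num_mon_sites": 100,
  "num_mon_inst_train": 90,
  "num_mon_inst_test": 10,
  "num_unmon_sites_train": 9000,
  "num_unmon_sites_test": 1000,
  "model_name": "var-cnn",
  "batch_size": 50,
  "mixture": [["dir", "metadata"], ["time", "metadata"]],
  "seq_length": 5000,
  "df_epochs": 30,
  "var_cnn_max_epochs": 150,
  "var_cnn_base_patience": 5,
  "dir_dilations": true,
  "time_dilations": true,
  "inter_time": true,
  "scale_metadata": true
}
```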

Citation

If you find Var-CNN useful in your research, please consider citing:

@article{bhat19,
  title={{Var-CNN: A Data-Efficient Website Fingerprinting Attack Based on Deep Learning}},
  author={Bhat, Sanjit and Lu, David and Kwon, Albert and Devadas, Srinivas},
  journal={Proceedings on Privacy Enhancing Technologies},
  volume={2019},
  number={4},
  pages={292--310},
  year={2019}
}

Contact

sanjit.bhat (at) gmail.com

davidboxboro (at) gmail.com

kwonal (at) mit.edu

devadas (at) mit.edu

Any discussions, suggestions, and questions are welcome!
