wisepythagoras / website-fingerprinting

Licence: MIT license
Deanonymizing Tor or VPN users with website fingerprinting and machine learning.

Programming Languages

python
javascript
shell

Website Fingerprinting

Website fingerprinting is a method of inspecting Tor or VPN traffic that collects enough features from individual sessions to help identify the activity of anonymized users.

Introduction

This experiment requires Tor, plus the lynx text browser used during data collection. Install them with one of the following commands:

# For Debian or Ubuntu
sudo apt install tor lynx

# For Fedora
sudo yum install tor lynx

# For Arch Linux
sudo pacman -S tor torsocks lynx

Tor is accompanied by a program called torsocks (a separate package on some distributions, as in the Arch Linux command above), which redirects the traffic of common programs through the Tor network. For example:

# SSH through Tor.
torsocks ssh [email protected]

# curl through Tor.
torsocks curl -L http://httpbin.org/ip

# Etc...

Required Python 3 Modules

pip install scikit-learn dpkt joblib

Data Collection

Data collection is fairly manual and requires two terminal windows, ideally side by side. It is also advisable to collect the fingerprints in a VM, to avoid capturing any unintended traffic. The capture.sh script listens for traffic; run it in one of the terminals:

./pcaps/capture.sh duckduckgo.com

Once the listener is capturing traffic, run the following in the other terminal:

torsocks lynx https://duckduckgo.com

Once the website has finished loading, stop the capture process and quit the browser session (by pressing the q key twice). Repeat the process several times for each web page so that there is enough data.
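
The manual loop above could be scripted along the following lines. This is only a sketch under several assumptions that the README does not confirm: that capture.sh can be stopped safely with a termination signal (any child process it starts may need to be stopped separately), that repeated runs do not overwrite earlier captures, and that running lynx with its `-dump` option (which prints the rendered page and exits) produces traffic comparable to an interactive session. The function names and the `runs` and `settle` parameters are hypothetical.

```python
import subprocess
import time


def build_commands(site):
    """Return the (capture, browse) command lists for one site."""
    capture = ["./pcaps/capture.sh", site]
    # Assumption: `lynx -dump` prints the page and exits, so no `q` keypress
    # is needed to end the browser session.
    browse = ["torsocks", "lynx", "-dump", f"https://{site}"]
    return capture, browse


def collect(site, runs=5, settle=2.0):
    """Capture `runs` page loads of `site`, one capture process per load."""
    capture_cmd, browse_cmd = build_commands(site)
    for _ in range(runs):
        capture = subprocess.Popen(capture_cmd)
        time.sleep(settle)  # give the listener time to start
        try:
            subprocess.run(browse_cmd, stdout=subprocess.DEVNULL, check=False)
        finally:
            # Assumption: terminating the script is enough to end the capture.
            capture.terminate()
            capture.wait()
```

Calling `collect("duckduckgo.com")` would then stand in for five rounds of the two-terminal procedure; check the resulting pcap files by hand before relying on them.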

Machine Learning

scikit-learn was used to write a k Nearest Neighbors classifier that reads the pcap files listed in the config.json file. Edit config.json to match the web pages targeted for training. The training script is gather_and_train.py.
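
To make the idea behind a k Nearest Neighbors classifier concrete, here is a dependency-free sketch. It is not the code in gather_and_train.py, which relies on scikit-learn; the `knn_predict` function, the toy feature values, and the `example.org` label are all made up for illustration.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, sample, k=3):
    """Label `sample` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(x, sample), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy feature vectors (invented values, not real captures).
train_x = [
    [74, -1500, -1500],
    [74, -1480, -1500],
    [74, -600, 120],
    [74, -580, 130],
]
train_y = ["duckduckgo.com", "duckduckgo.com", "example.org", "example.org"]

print(knn_predict(train_x, train_y, [74, -1490, -1495]))  # prints duckduckgo.com
```

Each captured session becomes one point, and an unknown session takes the label held by most of its closest neighbours. With few samples per site, a small `k` keeps a single site's points from being outvoted.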

Scikit Learn kNN

Classifying Unknown Traffic

# python predict.py [pcap file to classify]
python predict.py xyz.pcap

Once training is done and classifier-nb.dmp has been created, run predict.py with the pcap file as its sole argument. The script loads the classifier and attempts to identify which web page the traffic originated from.

Note that only the first 40 packets of each sample are used, both to train the model and to classify new traffic.
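
The 40-packet limit can be illustrated with a short sketch that turns a capture into a fixed-length feature vector. The project reads pcap files with dpkt; this version instead parses the classic pcap layout by hand with `struct` so that it has no dependencies. Using packet lengths as the features is an assumption, since the README does not say which features the scripts extract, and `first_packet_sizes` is a hypothetical name. pcapng files and the nanosecond-timestamp pcap variant are not handled.

```python
import io
import struct

MAX_PACKETS = 40  # the per-sample packet limit stated in the README


def first_packet_sizes(stream, limit=MAX_PACKETS):
    """Return the lengths of the first `limit` packets, zero-padded."""
    header = stream.read(24)  # pcap global header
    if len(header) < 24:
        raise ValueError("truncated pcap global header")
    magic = header[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written little-endian
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written big-endian
    else:
        raise ValueError("not a classic pcap file")

    sizes = []
    while len(sizes) < limit:
        record = stream.read(16)  # per-packet record header
        if len(record) < 16:
            break  # end of file
        _sec, _usec, incl_len, orig_len = struct.unpack(endian + "IIII", record)
        stream.read(incl_len)  # skip the captured bytes
        sizes.append(orig_len)
    # Pad short captures so every sample has the same number of features.
    return sizes + [0] * (limit - len(sizes))


# Build a tiny in-memory capture with three packets to exercise the function.
demo = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
for length in (74, 1500, 60):
    demo += struct.pack("<IIII", 0, 0, length, length) + bytes(length)

print(first_packet_sizes(io.BytesIO(demo))[:4])  # prints [74, 1500, 60, 0]
```

Truncating and padding to the same length matters because a nearest-neighbour classifier compares vectors position by position, so every sample needs the same number of features.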

Visualizing the patterns

In the screenshot above, the packet patterns of each website are clearly distinguishable when plotted in 3D. The classifier treats the data in a similar way: it assigns an unknown sample to the website whose training samples lie closest to it.

An interactive version of this graph can be found in the graphs folder.

Limitations and Disclaimers

This setup was created in order to research the topic of website fingerprinting and how easy it is to attempt to deanonymize users over Tor or VPNs. Traffic was captured and identified in a private setting and for purely academic purposes; the use of this source code is intended for those reasons only.

Real traffic is never "clean", although this research assumed it was for simplicity. However, an entity with enough resources can isolate the desired anonymized traffic and feed it into this simple classifier. This means that it is entirely possible to use a method like this to compromise anonymized users.

References

  1. Wang, T. and Goldberg, I. (2017). Website Fingerprinting. [online] Cse.ust.hk. Available at: https://www.cse.ust.hk/~taow/wf/.
  2. Wang, T. and Goldberg, I. (2017). Improved Website Fingerprinting on Tor. Cheriton School of Computer Science. Available at: http://www.cypherpunks.ca/~iang/pubs/webfingerprint-wpes.pdf
  3. Wang, T. (2015). Website Fingerprinting: Attacks and Defenses. University of Waterloo. Available at: https://uwspace.uwaterloo.ca/bitstream/handle/10012/10123/Wang_Tao.pdf